This is a follow-up post about Search Agents. Here I want to talk a bit more about a possible architecture.
The diagram above shows a Search Agent with its peers. To the left, in red, is the browser through which the user submits search queries to the Agent, whose components are shown in green. On the basis of these queries, the web client backend contacts sources on the Internet, such as various search engines, search boxes on social sites and important sources such as Wikipedia. The pages behind the URLs found this way are then fetched from the Web as well.
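To make the polling step a bit more concrete, here is a minimal sketch of how the web client could contact several sources in parallel. The names `fetch_url` and `poll_sources`, the timeout and the worker count are my own assumptions for illustration and not part of any existing design; only the Python standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
import urllib.request


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download one URL and return its body as text (UTF-8 is assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def poll_sources(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch all URLs in parallel and return a mapping of URL to body.

    Sources that raise an error are skipped, so one broken source
    does not prevent the others from being used.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per URL and remember which future belongs to which URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                continue  # hypothetical policy: silently drop failed sources
    return results
```

The `fetch` parameter is injectable so that the same function can be tried out without network access, for example by passing a dictionary lookup instead of a real HTTP client.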
Once most sources have been polled, the middle layer of the Search Agent takes over. A collection of algorithms, drawing on Machine Learning (ML) and Natural Language Processing (NLP), is applied to the data in order to find the information most relevant to the user’s request. The Agent’s frontend, an App server, then returns the results for display in the browser.
The bidirectionality of the arrows indicates that queries flow one way and responses the other way through the Agent. The whole scheme is also an iterative process where more data can be requested by the user or the algorithms at any time.
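The flow through the layers, including the iteration, could be sketched roughly as follows. This is only an illustration under my own assumptions: the class `SearchAgent`, its three callables `fetch`, `rank` and `refine`, and the stopping rule based on `min_results` are hypothetical placeholders for the web client, the algorithm layer and whatever logic decides that more data is needed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchAgent:
    # Web client layer: turns a query into raw documents.
    fetch: Callable[[str], List[str]]
    # Algorithm layer: selects the relevant documents for the original query.
    rank: Callable[[str, List[str]], List[str]]
    # Produces a follow-up query when the algorithms want more data.
    refine: Callable[[str], str]
    max_rounds: int = 3
    min_results: int = 2

    def search(self, query: str) -> List[str]:
        """Run fetch and rank repeatedly until enough results are found."""
        documents: List[str] = []
        results: List[str] = []
        current_query = query
        for _ in range(self.max_rounds):
            documents.extend(self.fetch(current_query))
            results = self.rank(query, documents)
            if len(results) >= self.min_results:
                break  # enough material to answer the user
            current_query = self.refine(current_query)
        return results


# Toy usage with an in-memory "Web" instead of real sources.
corpus = {
    "jaguar": ["jaguar car review"],
    "jaguar animal": ["jaguar habitat", "jaguar diet"],
}
agent = SearchAgent(
    fetch=lambda q: corpus.get(q, []),
    rank=lambda q, docs: [d for d in docs if "car" not in d],
    refine=lambda q: q + " animal",
)
print(agent.search("jaguar"))  # ['jaguar habitat', 'jaguar diet']
```

In the toy run the first round yields nothing relevant, so the agent requests more data with a refined query and stops after the second round. In a real Agent the user could trigger the same loop from the browser.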
The table below breaks the functional parts down by modules and components. This is just a first draft of a Search Agent, which would have to be fleshed out during development of the actual thing.
Module | Component | Function |
---|---|---|
Web client | https client | access web URLs |
Web client | HTML parser | parse crawled pages |
Web client | XML/JSON parser | parse XML/JSON snippets from APIs |
Web client | PDF parser | parse PDF documents |
Web client | multi-threading solution | do all of this in parallel |
App server | Web server | host client-facing content |
App server | JSON library | en- and decode JSON for website |
App server | Session management | keep track of multiple users |
App server | Email solution | optionally return results per Email |
Algorithms | Text Clustering | Don’t return multiple documents of similar content from multiple sources |
Algorithms | Word-sense Disambiguation | Help the user choose more precise meaning of search terms |
Algorithms | Named-entity Extraction | Identify relevant entities in resulting documents |
Algorithms | Question Answering | Optional component for question answering interface |
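
As an illustration of the Text Clustering row, here is a very simple sketch that filters out documents of similar content. It is not a real clustering algorithm but a stand-in: documents are compared by the Jaccard similarity of their word sets, and a document is dropped if it is too close to one that was already kept. The function names and the threshold of 0.8 are assumptions chosen for the example.

```python
import re
from typing import List, Set


def word_set(text: str) -> Set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # two empty documents count as identical
    return len(a & b) / len(a | b)


def drop_near_duplicates(documents: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a document only if it is not too similar to any kept document."""
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for doc in documents:
        words = word_set(doc)
        if all(jaccard(words, seen) < threshold for seen in kept_sets):
            kept.append(doc)
            kept_sets.append(words)
    return kept


docs = [
    "The cat sat on the mat.",
    "the cat sat on the mat",   # same words, different source
    "Dogs chase cars.",
]
print(drop_near_duplicates(docs))  # ['The cat sat on the mat.', 'Dogs chase cars.']
```

Comparing every document against every kept one is quadratic, which is fine for a few dozen search results; a proper clustering component would be needed for larger result sets.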