Architecture of a Search Agent

This is a follow-up post about Search Agents. Here I want to talk a bit more about a possible architecture.

The diagram above shows a Search Agent with its peers. To the left, in red, is the browser through which the user submits search queries to the Agent, whose components are shown in green. Based on these queries, the web client backend contacts sources on the Internet: search engines, search boxes on social sites, and important reference sources such as Wikipedia. The pages behind the URLs found this way are then fetched from the Web as well.
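As a rough illustration of how the web client backend might poll several sources at once, here is a minimal Python sketch. The names `fetch_url` and `poll_sources` are made up for this example and are not part of any existing Search Agent; the sketch assumes that a plain thread pool is good enough and that a failing source can simply be skipped.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> str:
    """Download one URL and return its body as text (hypothetical default fetcher)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def poll_sources(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_url,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch all URLs in parallel and map each URL to its body.

    Sources that raise an error are left out of the result (an assumption
    made for this sketch; a real agent might retry or log them).
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # A slow or broken source should not block the others.
                continue
    return results
```

Passing the fetcher in as a parameter keeps the parallel logic independent of the transport, so the same function could sit in front of an HTML, JSON or PDF downloader.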

Once most sources have been polled, the middle layer of the Search Agent takes over. A collection of algorithms, drawing on Machine Learning (ML) and Natural Language Processing (NLP), is run over the data to find the information most relevant to the user’s request. The Agent’s frontend, an App server, then returns the results for display in the browser.
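To make the idea of "finding the most relevant information" a little more concrete, here is a deliberately simple Python sketch. It is only a stand-in for the ML and NLP algorithms mentioned above: it ranks documents by the fraction of query terms they contain. The functions `tokenize`, `score` and `rank` are hypothetical names chosen for this example.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, document: str) -> float:
    """Fraction of distinct query terms that occur in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


def rank(query: str, documents: List[str]) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, best match first.

    Python's sort is stable, so documents with equal scores keep
    their original order.
    """
    scored = [(score(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Exact term matching like this misses inflected forms ("agents" does not match "agent"), which is one reason a real middle layer would need proper NLP components.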

The arrows are bidirectional because queries flow one way through the Agent and responses flow back the other way. The whole scheme is also iterative: the user or the algorithms can request more data at any time.
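The iterative part could be sketched as a loop that keeps asking for more data until enough relevant results have been collected. This is only one possible reading of the scheme; `iterative_search`, `fetch_batch` and `is_relevant` are hypothetical names, and the stopping rule (a wanted number of results or a maximum number of rounds) is an assumption of this example.

```python
from typing import Callable, List


def iterative_search(
    query: str,
    fetch_batch: Callable[[str, int], List[str]],
    is_relevant: Callable[[str, str], bool],
    wanted: int = 5,
    max_rounds: int = 3,
) -> List[str]:
    """Request further batches until enough relevant results are found."""
    relevant: List[str] = []
    for round_number in range(max_rounds):
        batch = fetch_batch(query, round_number)
        if not batch:
            break  # the sources have nothing more to offer
        relevant.extend(doc for doc in batch if is_relevant(query, doc))
        if len(relevant) >= wanted:
            break  # enough material, no need to poll again
    return relevant[:wanted]
```

Here the algorithms trigger the next round; a user asking for "more results" would amount to calling the function again with a larger `wanted`.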

The table below breaks the functional parts down into modules and components. This is only a first draft of a Search Agent, which would have to be fleshed out during actual development.

 

| Module     | Component                 | Function                                                                 |
|------------|---------------------------|--------------------------------------------------------------------------|
| Web client | HTTPS client              | Access web URLs                                                          |
|            | HTML parser               | Parse crawled pages                                                      |
|            | XML/JSON parser           | Parse XML/JSON snippets from APIs                                        |
|            | PDF parser                | Parse PDF documents                                                      |
|            | Multi-threading solution  | Do all of this in parallel                                               |
| App server | Web server                | Host client-facing content                                               |
|            | JSON library              | Encode and decode JSON for the website                                   |
|            | Session management        | Keep track of multiple users                                             |
|            | Email solution            | Optionally return results by email                                       |
| Algorithms | Text Clustering           | Don’t return multiple documents of similar content from multiple sources |
|            | Word-sense Disambiguation | Help the user choose a more precise meaning of search terms              |
|            | Named-entity Extraction   | Identify relevant entities in resulting documents                        |
|            | Question Answering        | Optional component for a question answering interface                    |
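
To give a feel for what the Text Clustering component is meant to achieve, here is a small Python sketch that filters out near-duplicate documents. Note that this is a greedy duplicate filter based on Jaccard similarity of word sets, not a full clustering algorithm; the function names and the threshold of 0.8 are assumptions made for this example.

```python
import re
from typing import List, Set


def word_set(text: str) -> Set[str]:
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty documents count as identical
    return len(a & b) / len(a | b)


def drop_near_duplicates(documents: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a document only if it is not too similar to one already kept."""
    kept: List[str] = []
    kept_sets: List[Set[str]] = []
    for doc in documents:
        words = word_set(doc)
        if all(jaccard(words, seen) < threshold for seen in kept_sets):
            kept.append(doc)
            kept_sets.append(words)
    return kept
```

Each new document is compared against every document already kept, so the cost grows quadratically with the number of results. That is acceptable for a page of search hits, but a larger corpus would call for something smarter.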