Architecture of a Search Agent

search engines 2022-03-30

This is a follow-up post about Search Agents. Here I want to talk a bit more about a possible architecture.

Figure: Architecture of a Search Agent

The diagram above shows a Search Agent with its peers. To the left, in red, is the browser through which the user submits search queries to the Agent, whose components are shown in green. Based on these queries, the Agent’s backend, a web client, contacts sources on the Internet, such as various search engines, the search boxes of social sites, and important reference sources such as Wikipedia. The pages behind the URLs found this way are then fetched from the Web as well.
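The fan-out to several sources could look roughly like the following sketch. The three source functions are hypothetical stand-ins that return made-up URLs instead of doing real network requests; a real Agent would replace them with calls to search engine APIs or scrapers. The point is only to show how sources can be polled in parallel and how one failing source does not block the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources: each takes a query and returns a list of URLs.
def search_engine_a(query):
    return [f"https://engine-a.example/result?q={query}"]

def wiki_source(query):
    return [f"https://wiki.example/{query}"]

def failing_source(query):
    raise TimeoutError("source did not answer")

def poll_sources(query, sources, max_workers=8):
    """Query all sources in parallel; return (unique URLs, names of failed sources)."""
    urls, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the name of the source it belongs to.
        futures = {pool.submit(src, query): src.__name__ for src in sources}
        for future in as_completed(futures):
            try:
                urls.extend(future.result())
            except Exception:
                # A slow or broken source should not stop the whole search.
                failed.append(futures[future])
    return sorted(set(urls)), failed

urls, failed = poll_sources("agent", [search_engine_a, wiki_source, failing_source])
```

Threads are a reasonable fit here because the work is dominated by waiting for network responses; an asynchronous HTTP client would be another option.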

Once most sources have been polled, the middle layer of the Search Agent takes over. A collection of algorithms, drawing on Machine Learning (ML) and Natural Language Processing (NLP), is applied to the collected data to find the information most relevant to the user’s request. The Agent’s frontend, an App server, then returns the results for display in the browser.
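To make the idea of the middle layer a little more concrete, here is a deliberately simple relevance ranking. It is not ML or NLP in any serious sense: it just scores each document by the fraction of query terms it contains, as a placeholder for whatever algorithms the real Agent would use. The function names `tokenize`, `score` and `rank` are my own choice for this sketch.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, document):
    """Fraction of distinct query terms that occur in the document."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    words = set(tokenize(document))
    return len(terms & words) / len(terms)

def rank(query, documents, top_k=10):
    """Return the matching documents, best score first."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so documents with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

docs = [
    "Search agents query many engines",
    "Cooking pasta at home",
    "An agent for search",
]
best = rank("search agent", docs)
```

Note that "agents" does not match "agent" here, which is exactly the kind of gap that stemming or other NLP steps would close.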

The arrows are bidirectional because queries flow through the Agent in one direction and responses in the other. The whole scheme is also iterative: the user or the algorithms can request more data at any time.
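The iterative part can be sketched as a loop that keeps asking for more data until enough relevant results have been collected. Both `fetch_batch` and `is_relevant` are hypothetical callbacks introduced for this example: the first stands for the web client fetching another round of documents, the second for the algorithms judging them.

```python
def iterate_search(query, fetch_batch, is_relevant, wanted=3, max_rounds=5):
    """Request more data round by round until `wanted` relevant results exist."""
    results = []
    for round_no in range(max_rounds):
        batch = fetch_batch(query, round_no)
        if not batch:
            break  # the sources have nothing more to offer
        for doc in batch:
            if is_relevant(doc) and doc not in results:
                results.append(doc)
        if len(results) >= wanted:
            break  # enough material, stop requesting more
    return results[:wanted]

# Toy data: each inner list is what one round of fetching returns.
pages = [["a1", "x"], ["a2", "a1"], ["a3", "a4"], ["a5"]]

def fetch_batch(query, round_no):
    return pages[round_no] if round_no < len(pages) else []

found = iterate_search("agent", fetch_batch, lambda doc: doc.startswith("a"))
```

In this toy run the loop stops after the third round, so the fourth batch is never requested.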

The table below breaks the functional parts down by modules and components. This is just a first draft of a Search Agent, which would have to be fleshed out during development of the actual thing.

Module	Component	Function
Web client	HTTPS client	access web URLs
	HTML parser	parse crawled pages
	XML/JSON parser	parse XML/JSON snippets from APIs
	PDF parser	parse PDF documents
	multi-threading solution	do all of this in parallel

App server	Web server	host client-facing content
	JSON library	encode and decode JSON for the website
	Session management	keep track of multiple users
	Email solution	optionally return results by email

Algorithms	Text Clustering	Don’t return multiple documents of similar content from multiple sources
	Word-sense Disambiguation	Help the user choose a more precise meaning for search terms
	Named-entity Extraction	Identify relevant entities in resulting documents
	Question Answering	Optional component for question answering interface
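
As an illustration of the Text Clustering row, the following sketch filters out near-duplicate documents. It is a simple stand-in rather than real clustering: documents are compared by the Jaccard similarity of their three-word shingles, and a document is dropped if it is too similar to one already kept. The shingle size and the threshold of 0.5 are arbitrary assumptions that would need tuning.

```python
import re

def shingles(text, size=3):
    """Set of overlapping word n-grams; short texts yield one shingle."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def drop_near_duplicates(documents, threshold=0.5):
    """Keep the first document of every group of similar documents."""
    kept, kept_shingles = [], []
    for doc in documents:
        current = shingles(doc)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(current)
    return kept

docs = [
    "The search agent polls many sources in parallel",
    "The search agent polls many sources in parallel today",
    "Wikipedia is an important source",
]
unique = drop_near_duplicates(docs)
```

Comparing every document against every kept one is quadratic, which is fine for a few hundred results per query; for larger sets something like MinHash would be the usual next step.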