Search Agents


What is the difference between a Search Agent, the topic of this post, and a search engine? A search engine maintains its own index so that it can return results in a matter of milliseconds, and assumes it can fulfill the user’s information need by itself. There are now many such search engines on the Internet, from the major general-purpose ones down to small specialised ones.

A Search Agent takes more time to answer the user, perhaps a few minutes, but yields more varied or less common results. It does so by automatically querying many open search engines and the search boxes of important websites, processing the resulting pages and documents with Natural Language Processing (NLP) and Machine Learning (ML) techniques, and filtering the best results down to present to the user. A further difference is that a Search Agent can enter into a dialog with the user, allowing an iterative approach to finding out which of a thousand top results the user is really after.
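
The fan-out-and-merge step might be sketched like this. Everything here is made up for illustration: the two source functions stand in for real HTTP calls to search engines or site search boxes, and the deduplication is the simplest possible kind.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical source backends -- in a real agent these would issue
# HTTP requests to open search engines and site-specific search boxes.
def search_engine_a(query):
    return [f"https://a.example/{query}/1", "https://shared.example/doc"]

def search_engine_b(query):
    return [f"https://b.example/{query}/1", "https://shared.example/doc"]

def fan_out(query, sources):
    """Query every source concurrently and merge the results,
    dropping duplicate URLs while preserving first-seen order."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda source: source(query), sources))
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

results = fan_out("agents", [search_engine_a, search_engine_b])
```

The concurrency matters in practice: since each backend is dominated by network latency, querying dozens of them in parallel costs little more wall-clock time than querying one.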

This may sound like a meta-search engine to you, but there are important differences. A meta-search engine is still mostly a search engine, except that it saves the user from visiting all the regular search engines themselves. A Search Agent trades time for result quality, while not needing a full-text index of its own. The agent interacts with the Web on behalf of the user, applying intelligence of its own.

Search Agents take advantage of the plethora of queryable online sources on the Internet. There are too many for a single person to work through, which calls for an automated system. By not maintaining an expensive search index, a Search Agent frees up resources to help the user with other tasks, such as disambiguating search terms or providing an interactive natural-language interface.

There is an explosion of new search engine startups on the Internet, all trying to beat the incumbents at their own game. Modern ML techniques are still hard to deploy in the search arena, and applying them across an entire search index would be very expensive. A Search Agent product can take a different route: let the search providers do their thing, and add ML capability to the last mile, filtering down a few thousand search results.
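
To make the "last mile" concrete, here is a deliberately minimal re-ranking sketch. A real agent would use a learned relevance model; this stand-in scores each result snippet against the query with bag-of-words cosine similarity, which is enough to show where such a model would plug in. All names and data are invented for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rerank(query, snippets, top_k=3):
    """Score each result snippet against the query and keep the best.
    In a real agent, cosine() would be replaced by an ML relevance model."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(s.lower().split())), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]
```

The point of the design is that the expensive model only ever sees a few thousand candidates, not the whole Web, so even a heavyweight scorer stays affordable.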

Much of the technology to make this work is well established: web servers and clients, calling APIs and querying URL endpoints, parsing HTML, and so on. But the NLP and ML components are just as important, and they are much closer to the cutting edge.
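
The established part really is routine. As one small example, extracting result links from a fetched HTML page needs nothing beyond the standard library; the sample page below is of course invented.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # anchor text accumulated so far

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# A made-up results page standing in for a fetched document.
page = '<ul><li><a href="https://example.org/doc">A result</a></li></ul>'
parser = LinkExtractor()
parser.feed(page)
```

Real crawling adds concerns the sketch ignores, such as rate limits, robots.txt, character encodings and malformed markup, but none of them are research problems.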

Word sense disambiguation matters when good results are swamped by the same term used in a different meaning. The Search Agent can work with the user to pin down which word sense is relevant and highlight matching results. Text clustering is also important: presenting just one specimen of each cluster of results gives the user a better feel for everything that is out there to be found.
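
The one-specimen-per-cluster idea can be sketched very simply. This toy version greedily groups result snippets by Jaccard word overlap and keeps only the first member of each group; a production agent would use proper embeddings and a real clustering algorithm, and the threshold here is an arbitrary choice for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two snippets, in [0, 1]."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_specimens(snippets, threshold=0.5):
    """Greedy clustering: a snippet similar to an existing specimen is
    treated as a duplicate of it; otherwise it becomes a new specimen.
    Only one representative per cluster is returned."""
    specimens = []
    for snippet in snippets:
        if not any(jaccard(snippet, rep) >= threshold for rep in specimens):
            specimens.append(snippet)
    return specimens

specimens = cluster_specimens([
    "jaguar the big cat",
    "the big cat jaguar",
    "jaguar cars new model",
])
```

Note how the clustering also surfaces the word-sense problem: the two "jaguar" senses land in different clusters, which is exactly the split the agent could present back to the user.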

This idea is not entirely new, but it can be re-imagined in the context of today’s Web. With the help of an NLP or ML expert, this could be the next big thing in search!