If you use the leading search engines a lot, you have probably noticed that they haven’t improved much over the years. They seem stuck in the 2000s, just with more spam than back then. One way they could improve is by using Machine Learning (ML) to process search results. Some of the companies behind them have strong ML and AI teams, so why can’t they apply that know-how to their search engines?
We all know of things online that we consider the best of their category: the best YouTube channel on mathematics, the best blog about marketing, the best Twitter account about painting, the best website for essays, and so on. This idea is for a simple website where users submit and vote on suggestions for the best resource in various categories. While the site focuses on the best resource and places it prominently, the runners-up would be displayed too.
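The core of such a site is a small voting model per category. The sketch below is one hypothetical way to keep it, assuming one vote per user per category (voting again moves the user's vote) and ties broken alphabetically; the class name `Category` and the placeholder resources `channel-a` etc. are illustrative, not part of any existing site.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Category:
    """A category such as "YouTube channel on mathematics" and its ballot box (hypothetical model)."""

    name: str
    votes: Counter = field(default_factory=Counter)  # resource -> current vote count
    ballots: Dict[str, str] = field(default_factory=dict)  # user -> resource they voted for

    def suggest(self, resource: str) -> None:
        # A new suggestion is listed straight away, with zero votes.
        self.votes.setdefault(resource, 0)

    def vote(self, user: str, resource: str) -> None:
        # Assumption: one vote per user per category; voting again moves the vote.
        previous = self.ballots.get(user)
        if previous is not None:
            self.votes[previous] -= 1
        self.ballots[user] = resource
        self.votes[resource] += 1

    def ranking(self) -> List[Tuple[str, int]]:
        # Most votes first; ties broken alphabetically so the order is stable.
        return sorted(self.votes.items(), key=lambda item: (-item[1], item[0]))

    def best(self) -> Optional[str]:
        ranked = self.ranking()
        return ranked[0][0] if ranked else None

    def runners_up(self) -> List[str]:
        # Everything after the winner, still in ranked order.
        return [resource for resource, _ in self.ranking()[1:]]


maths = Category("YouTube channel on mathematics")
for channel in ("channel-a", "channel-b", "channel-c"):
    maths.suggest(channel)
maths.vote("alice", "channel-b")
maths.vote("bob", "channel-b")
maths.vote("carol", "channel-a")
maths.vote("carol", "channel-c")  # carol changes her mind
print(maths.best(), maths.runners_up())  # channel-b ['channel-c', 'channel-a']
```

The page for a category would then show `best()` prominently and list `runners_up()` below it.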
This is a follow-up post about Search Agents, in which I want to talk a bit more about a possible architecture. The diagram above shows a Search Agent with its peers. On the left, in red, is the browser through which the user submits search queries to the Agent, whose components are shown in green. Based on these queries, the web client backend contacts sources on the Internet, such as various search engines, the search boxes of social sites, and major reference sites like Wikipedia.
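The fan-out step of the web client backend could look roughly like the sketch below: the query goes to every source in parallel, failures of individual sources are tolerated, and results are merged by URL so the Agent can see which sources agree. The source names and stub functions are hypothetical placeholders for real HTTP clients, and a real Agent would also need per-source timeouts and ranking, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# A source maps a query to a list of result URLs. Real sources would wrap
# HTTP calls to search engines, social-site search boxes or Wikipedia;
# the stubs below are hypothetical so that the sketch runs offline.
Source = Callable[[str], List[str]]


def fan_out(query: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Query every source in parallel and map each result URL to the sources that returned it."""
    merged: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fetch, query): name for name, fetch in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results = future.result()
            except Exception:
                continue  # one unreachable source should not sink the whole answer
            for url in results:
                merged.setdefault(url, []).append(name)
    # Sort the source names so the output does not depend on completion order.
    return {url: sorted(names) for url, names in merged.items()}


def failing_source(query: str) -> List[str]:
    raise ConnectionError("source unreachable")


stub_sources: Dict[str, Source] = {
    "engine_a": lambda q: ["https://a.example/1", "https://shared.example/x"],
    "engine_b": lambda q: ["https://shared.example/x", "https://b.example/2"],
    "wikipedia": lambda q: [f"https://en.wikipedia.org/wiki/{q}"],
    "social": failing_source,
}

merged = fan_out("Java", stub_sources)
for url, names in sorted(merged.items()):
    print(url, names)
```

Threads fit here because each source call is I/O-bound; the merged map is a natural input for a later ranking step, for example favouring or deliberately surfacing results that only one source returned.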
If you are interested in the island of Java and type its name into a regular Web search engine, you get copious results about the programming language of the same name. This is easily remedied by searching for “Java the island” instead. But if your query involves multiple terms and is more complex, doing this manually can become tricky. Today’s idea is for a component of a search engine or search agent that helps users resolve the ambiguity in their search terms before returning matching results.
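A very simple version of such a component could rank the known senses of each ambiguous term by how well they match the rest of the query, show that ranking to the user, and then add the chosen sense to the query. The sketch below assumes a hand-made sense inventory (`SENSES`, hypothetical) and plain word overlap without stemming; a real component would need a much larger inventory and better matching.

```python
from typing import Dict, List, Set

# Hypothetical sense inventory: term -> sense label -> context words hinting at that sense.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "java": {
        "island": {"indonesia", "jakarta", "volcano", "travel", "beach"},
        "programming language": {"jvm", "class", "compiler", "code", "spring"},
        "coffee": {"espresso", "beans", "roast"},
    },
}


def rank_senses(query: str) -> Dict[str, List[str]]:
    """For each ambiguous term in the query, list its senses, best-supported first."""
    words = query.lower().split()
    ranked: Dict[str, List[str]] = {}
    for word in words:
        senses = SENSES.get(word)
        if not senses:
            continue  # not a known ambiguous term
        context = set(words) - {word}
        # Score each sense by how many of its hint words appear elsewhere in the query;
        # sorted() is stable, so ties keep the inventory's order.
        ranked[word] = sorted(senses, key=lambda s: -len(senses[s] & context))
    return ranked


def expand(query: str, chosen: Dict[str, str]) -> str:
    """Append the sense labels the user confirmed, so the search engines see them."""
    extra = " ".join(f'"{label}"' for label in chosen.values())
    return f"{query} {extra}".strip()


print(rank_senses("Java volcano hiking"))  # {'java': ['island', 'programming language', 'coffee']}
print(expand("Java volcano hiking", {"java": "island"}))  # Java volcano hiking "island"
```

The ranking only proposes an order; the user still confirms the sense, which matters when no context word matches and every sense scores zero.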
If you have experience with different areas of software technology, you have probably noticed it: the same term can have very different meanings in different contexts. An instance in AWS EC2, an instance in MongoDB, an instance in Java and an instance in NGINX are all different things. The same applies to other terms: objects, buckets, rules, users, domains, processes, etc. can all have various definitions. The idea of this post is a website that collects such terms’ definitions and lets the user search for them and list them by domain or by term.
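The core data structure for such a site is a glossary indexed two ways, by term and by domain, plus a simple search. The sketch below keeps both indexes in memory and matches case-insensitively; the class names are hypothetical, and the sample definitions are abbreviated illustrations rather than authoritative entries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Definition:
    term: str
    domain: str  # e.g. "AWS EC2", "MongoDB", "Java", "NGINX"
    text: str


class Glossary:
    """Stores definitions and lists them either by term or by domain (hypothetical design)."""

    def __init__(self) -> None:
        self._by_term: Dict[str, List[Definition]] = defaultdict(list)
        self._by_domain: Dict[str, List[Definition]] = defaultdict(list)

    def add(self, term: str, domain: str, text: str) -> None:
        d = Definition(term, domain, text)
        # Lower-cased keys so "Instance" and "instance" land in the same bucket.
        self._by_term[term.lower()].append(d)
        self._by_domain[domain.lower()].append(d)

    def by_term(self, term: str) -> List[Definition]:
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self._by_term.get(term.lower(), []), key=lambda d: d.domain)

    def by_domain(self, domain: str) -> List[Definition]:
        return sorted(self._by_domain.get(domain.lower(), []), key=lambda d: d.term)

    def search(self, fragment: str) -> List[Definition]:
        # Substring match on the term, e.g. "inst" finds every "instance".
        needle = fragment.lower()
        hits = [d for key, defs in self._by_term.items() if needle in key for d in defs]
        return sorted(hits, key=lambda d: (d.term, d.domain))


glossary = Glossary()
glossary.add("instance", "AWS EC2", "A virtual server in Amazon EC2.")
glossary.add("instance", "Java", "An object created from a class.")
glossary.add("instance", "MongoDB", "A running mongod process.")
glossary.add("bucket", "Amazon S3", "A container for objects stored in Amazon S3.")

for d in glossary.by_term("Instance"):
    print(f"{d.domain}: {d.text}")
```

Keeping two indexes doubles the bookkeeping on insert but makes both listing modes, by term and by domain, a direct lookup.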
What is the difference between a Search Agent, the topic of this post, and a search engine? A search engine maintains its own index so that it can return results within milliseconds, and it assumes it can fulfill the user’s information needs by itself. There are now many first-, second- and third-level search engines, general or specialised, on the Internet. A Search Agent takes longer to answer the user, perhaps a few minutes, but yields more varied or less common results.