The future of search
November 30th, 1900
Having read of the recent death of the Irish writer Oscar Wilde, a student at Harvard University is looking for information on his life and his achievements. How does he access this information? He looks at the library catalogue, skims through newspapers and magazines, and writes letters to his colleagues. After a few weeks of research he publishes a tribute to the life and works of Oscar Wilde.
August 5th, 2000
A century later, having read of the recent death of the English actor and writer Sir Alec Guinness, a student at Harvard University is looking for information on his life and achievements. Wikipedia has not yet launched, so the researcher turns to the recently founded google.com search engine. After a few hours of searching, posting on bulletin boards and groups, and browsing websites dedicated to the actor, he is ready with a tribute to the life and works of Sir Alec Guinness.
The year 2100
What will searching for information look like in the year 2100? If progress continues at the same rate as over the last century, a student in 2100 should be able to go from the thought of writing the essay to the complete essay in a mind-boggling 16 seconds. And the rate of progress is increasing.
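One way to read that figure is as a simple extrapolation: take the factor by which research time shrank between 1900 and 2000 and apply it once more. The post does not say which durations stand behind "a few weeks" and "a few hours", so the five weeks and two hours below are assumptions chosen only to illustrate the arithmetic. With those values the estimate comes out at roughly 17 seconds, in the same range as the figure quoted above.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_WEEK = 7 * 24 * SECONDS_PER_HOUR


def extrapolate(time_1900_s: float, time_2000_s: float) -> float:
    """Apply the 1900-to-2000 speedup factor once more to estimate 2100."""
    speedup = time_1900_s / time_2000_s
    return time_2000_s / speedup


# Assumed durations; the post only says "a few weeks" and "a few hours".
weeks_in_1900 = 5
hours_in_2000 = 2

estimate = extrapolate(
    weeks_in_1900 * SECONDS_PER_WEEK,
    hours_in_2000 * SECONDS_PER_HOUR,
)
print(f"Estimated research time in 2100: {estimate:.1f} seconds")
```

Different readings of "a few" move the result considerably, which is a reminder of how rough this kind of projection is.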
Even with the best of information, making predictions about the future is a messy business. We can, however, look at the trends and learn from them. What does search look like in 2100 if it allows one to go from thought to essay in 16 seconds? I assume that you will be able to specify your information requirement in some form and a machine will instantly create an answering essay tailored to your needs. Instead of presenting a set of links, this answer will be an automated, Wikipedia-like page containing not just objective encyclopedic information but also subjective views, statistics, and several other kinds of information. Further, you will be able to specify the extent of information you need, the aspects of the topic you want covered, the tone of the content, the target audience, and several other features that a student would use to make an essay better.
So you can, for example, ask “Give me the list of symptoms of diabetes”, “What is the phone number of my local Wal-Mart?”, “Write a two-paragraph summary of the Harry Potter series”, “Write a two-page essay on the scientific basis of speech in apes as mentioned in the book Congo by Michael Crichton”, or “I recently heard about White Holes and want to learn more about the subject and related interesting things”. Current technology can come close to answering the first few questions, but it gets harder as the questions get more complex. An ideal information extraction system would not only answer all these questions but also tailor the answers to your needs.
This may sound like a far-off dream, but we are clearly moving in a direction where a machine will automatically create the perfect article that precisely and completely covers the searched topic.
A search engine of the future
While search engines like Google, Yahoo, and Microsoft Live answer the first few questions above, human-created content sites like Wikipedia are trying to do a better job with the later, more complex questions by writing the most asked-for answers. It is, however, clear that the system of the future will have to automate what Wikipedia is doing, go beyond it, and do so in several different ways in order to satisfy every user’s need.
Let us try to understand the basic structure of this hypothetical system. On one end we have the user’s query with some extended specification. On the other end we have an extremely large amount of available content.
The first step this system needs to accomplish is to understand the query better. So we take the user’s question and determine which subjects the query is about, what kinds of information the user wants, what tone the answering essay should have, and what extent and depth the returned content should have. For the Congo query above, we know the user wants information on the book Congo and on the scientific basis of speech in apes. We also know he wants a two-page essay and is interested in more authoritative scientific sources.
The second step is to take all the available content and understand what subjects it discusses, what kinds of information it contains (encyclopedic, user discussions, scientific papers, etc.), and what kind of audience and tone it suits. So we may find sites that talk about the book Congo, about the speech capabilities of apes, about Michael Crichton, about apes in general, and so on. We will also be able to say whether the information is scientific and has good references, is a discussion with opinions from several people, is a source of images or other media, or is a source of papers or journals.
The third step is to connect the subjects in the query with the subjects in the content. When doing this we need to work within the specified level of detail. So the system will decide whether the article should be limited to a short note on the speech capabilities of apes and the truth behind the book Congo, or whether it should go into more detail on Michael Crichton and his style of writing, the story behind the book Congo, information about apes in general, and so on. It will also have to decide whether it should stick to scientific articles or delve into opinions, videos, documentaries, and the like.
The final step is to use this information to write an article that is coherent, well organized, and easy to read. The system has to organize the content and the references and create relevant sections, as a human would. The final article needs to have the right tone and style, be rich in content, and be well organized. It also needs to be the right length, anywhere from a one-word answer to a one-sentence answer to a multi-page essay.
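The four steps can be sketched as a toy pipeline. This is only an illustration under heavy assumptions: the names `QuerySpec`, `Document`, `match`, and `assemble` are hypothetical, the query and content annotations (steps one and two) are written by hand rather than inferred, and the passages are placeholders rather than real content. Real systems would need far richer understanding at every stage.

```python
from dataclasses import dataclass


@dataclass
class QuerySpec:
    subjects: set[str]   # step 1: what the query is about
    kinds: set[str]      # kinds of information wanted, e.g. {"scientific"}
    max_passages: int    # crude stand-in for "extent and depth"


@dataclass
class Document:
    text: str
    subjects: set[str]   # step 2: what the content is about
    kind: str            # "scientific", "discussion", "encyclopedic", ...


def match(query: QuerySpec, docs: list[Document]) -> list[Document]:
    """Step 3: keep documents of a wanted kind, ranked by subject overlap."""
    scored = [
        (len(query.subjects & d.subjects), d)
        for d in docs
        if d.kind in query.kinds
    ]
    relevant = [(score, d) for score, d in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in relevant]


def assemble(query: QuerySpec, ranked: list[Document]) -> str:
    """Step 4: stitch the top passages together within the requested length."""
    return " ".join(d.text for d in ranked[: query.max_passages])


# Hand-annotated query and placeholder content for the Congo example.
query = QuerySpec(
    subjects={"congo", "ape speech"},
    kinds={"scientific"},
    max_passages=2,
)
docs = [
    Document("[passage on ape speech research]", {"ape speech", "apes"}, "scientific"),
    Document("[encyclopedia entry on the novel]", {"congo", "michael crichton"}, "encyclopedic"),
    Document("[forum thread about the book]", {"congo", "ape speech"}, "discussion"),
    Document("[paper comparing the novel with research]", {"congo", "ape speech"}, "scientific"),
]

ranked = match(query, docs)
essay = assemble(query, ranked)
print(essay)
```

Here the overlap count and the simple concatenation are deliberately naive; they only mark where ranking and actual writing would happen.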
The search engine of today
Where do we stand today? Google, Yahoo, and other search engines are getting better at the one-word and one-sentence answers. Wikipedia, About.com, and other editorial content sites are trying to pre-create answers to as many questions as they can. The Semantic Web, online taxonomies, and other efforts are working on making content richer so that it is easier for machines to understand. All of these need to come together in completely new ways over the next century to achieve the vision outlined above.
My company, Kosmix.com, recently launched a product that I believe is another small step in this journey. Our algorithms are trying to answer the questions that require you to write an essay or a term paper, or to explore a topic in detail. They first figure out what subjects the query is about. They determine the various intents that the user can have. They look at all the content available on the web to understand the subject of the content, the type of information it represents, and so on. The algorithms then make the connections between the various intents of the query and the available content to figure out which content is best for you. They then organize the chosen content into sections that are meaningful and easy to understand, order the sections with the most relevant content at the top, and summarize the information correctly.
The hope is that Kosmix can present several different perspectives on any topic, can reach those hard-to-find gems, and can find interesting and surprising relationships for you to explore. In the end we hope that Kosmix can help you answer the really complex questions that require you to explore a topic in detail.
It is a very early attempt at the technology of the future. We are trying to help you write that essay, explore that topic, or simply browse the web by following interesting and surprising connections. We are clearly not competing with other search engines for the one-word and one-sentence answers. Instead, we are trying to help you explore and discover.
How well do we do? I am proud of, and surprised at, how far we have come. Of course, we have a long way to go, but each incremental piece of content, improved categorization, and better organization takes us closer to our goal.
Some day we hope to write that essay for you!

December 12th, 2008 at 8:51 am
I hope that in 2100 we won’t be limited by our physical bodies, and will access information from a large network or as a hive mind. Such an idea is met with skepticism (as was 2001: A Space Odyssey in 1968).
I hope that Kosmix will enable us to explore some of the results connected with the search term – as a network. By clicking on the nodes of the network, we should be able to access various sites/results that have relevant information.