Koru is part of an ongoing research project investigating how Wikipedia can be exploited to provide intelligent, intuitive information retrieval. This PowerPoint presentation provides a quick overview of the Koru project. For more information, read on or see the following publication:
Milne, D. (2007). A Knowledge-Based Search Engine Powered by Wikipedia.
Submitted to CIKM'07, Lisbon, Portugal.
This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offer significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it, making query entry more efficient, improving the relevance of the documents returned, and narrowing the gap between expert and novice seekers.
Koru is the Māori word for the newborn, unfurling fern frond: a delicate spiral of expanding fractal shapes. For indigenous New Zealanders it symbolizes growth, rebirth, and evolution. Likewise, the Koru topic browsing system provides an environment in which users can progressively work towards the information they seek.
And how will you enquire, Socrates, into that which you do not know? What will you put forth as the subject of your enquiry? And if you find out what you want, how will you ever know that this is the thing that you did not know?
Plato's Meno, 380 BC
This question, posed to the Greek philosopher Socrates some 400 years before Christ’s birth, is still relevant in today’s internet-savvy age. Whenever we seek out new knowledge—whenever we turn to the ubiquitous search engines—we must grapple with the same fundamental paradox: how can one describe the unknown? That is exactly what we must do to form a query. To make matters worse, search engines are incapable of reasoning with these descriptions as we would. They instead consider a query as merely an excerpt, a few words or phrases, from within a relevant document. To search effectively, one must predict not only the information a relevant document contains, but also the terms by which this is expressed. In short, one must already know a great deal of what is being sought, in order to find it.
What knowledge seekers need—at least those who are not clairvoyant—is a bridge between what they know and what they wish to know; between their vague initial queries and the concrete topics and terminology available. This is what Koru aims to provide.
How is the knowledge base obtained?
To work well, Koru relies on a large and comprehensive knowledge base that describes the topics, terminology and relations of the information available. Traditionally this would be obtained from manually crafted thesauri, such as WordNet or Agrovoc. Unfortunately, these are expensive to produce, unavailable in many domains, and often not comprehensive enough to provide good coverage of queries.
Instead, we use Wikipedia to derive a thesaurus that is specific to each particular document collection. The basic idea is to use Wikipedia's articles as the building blocks of the thesaurus, and its skeleton structure of hyperlinks to determine which blocks we need and how these should fit together. This is the focus of a related project called WikipediaMiner.
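To make the basic idea concrete, here is a minimal sketch in Python. It is not WikipediaMiner's actual method: the toy data, the `build_thesaurus` function, and the simple label-matching step are all assumptions made for illustration. Each article becomes a candidate topic, an article is kept only if one of its labels occurs in the document collection, and two kept topics are related when either article links to the other.

```python
import re

# Hypothetical toy stand-in for Wikipedia: each article has surface labels
# (title and synonyms) and outgoing links. None of this is real dump data.
TOY_WIKIPEDIA = {
    "Kiwi": {"labels": {"kiwi", "apteryx"}, "links": {"New Zealand", "Bird"}},
    "New Zealand": {"labels": {"new zealand", "aotearoa"}, "links": {"Kiwi"}},
    "Bird": {"labels": {"bird", "birds", "aves"}, "links": {"Kiwi"}},
    "Fern": {"labels": {"fern", "ferns"}, "links": {"New Zealand"}},
}


def build_thesaurus(documents, articles):
    """Return {topic: {"terms": [...], "related": [...]}} for one collection."""
    texts = [doc.lower() for doc in documents]

    # Step 1: keep only the articles whose labels occur in the collection,
    # remembering which labels were actually seen (the collection's terminology).
    terms = {}
    for title, info in articles.items():
        seen = {
            label
            for label in info["labels"]
            if any(re.search(r"\b" + re.escape(label) + r"\b", t) for t in texts)
        }
        if seen:
            terms[title] = seen

    # Step 2: relate two kept topics when either article links to the other.
    related = {title: set() for title in terms}
    for title in terms:
        for target in articles[title]["links"]:
            if target in terms and target != title:
                related[title].add(target)
                related[target].add(title)

    return {
        title: {"terms": sorted(terms[title]), "related": sorted(related[title])}
        for title in terms
    }


documents = [
    "The kiwi is a flightless bird.",
    "Aotearoa is home to many birds.",
]
thesaurus = build_thesaurus(documents, TOY_WIKIPEDIA)
for topic, entry in sorted(thesaurus.items()):
    print(topic, entry)
```

In this toy run the Fern article is left out because none of its labels appear in the two documents, which is the sense in which the resulting thesaurus is specific to the collection. A real system would also have to resolve ambiguous terms and weight relations, which this sketch ignores.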