-
Milne, D. (2007). A Knowledge-Based Search Engine Powered by Wikipedia.
Submitted to CIKM'07, Lisbon, Portugal.
This paper describes Koru, a new search interface that offers effective domain-independent knowledge-based information retrieval. Koru exhibits an understanding of the topics of both queries and documents. This allows it to (a) expand queries automatically and (b) help guide the user as they evolve their queries interactively. Its understanding is mined from the vast investment of manual effort and judgment that is Wikipedia. We show how this open, constantly evolving encyclopedia can yield inexpensive knowledge structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We conducted a detailed user study with 12 participants and 10 topics from the 2005 TREC HARD track, and found that Koru and its underlying knowledge base offer significant advantages over traditional keyword search. It was capable of lending assistance to almost every query issued to it, making query entry more efficient, improving the relevance of the documents returned, and narrowing the gap between expert and novice seekers.
-
Witten, I. H., Medelyan, O. and Milne, D. (2006). Finding documents and reading them: Semantic metadata extraction, topic browsing and realistic books. Proc. of the RCDL 2006. Suzdal, Russia.
What would it take to provide a congenial and comfortable environment for finding and reading books in a digital library? To locate information we need algorithms that extract semantic metadata in forms such as keyphrases, with accuracy and consistency comparable to human indexers. To support this we need comprehensive, detailed thesauri, automatically created, that embody contemporary language and usage. To emulate and enjoy the serendipitous adventures found in real libraries and bookstores we need browsing environments that provide readers with multiple clues in parallel: keyphrases, text excerpts, and supplementary knowledge structures, as well as the documents themselves. For readers to cherish and enjoy individual works we need to transcend the bland reading environment provided by the web by recreating the subjective impact and pleasurable experience of interacting with real books. This paper describes research that aims to achieve these goals.
-
Milne, D. (2006). From Phrase Browsing to Interactive Query Expansion, an AJAX-enabled approach. Unpublished Master's thesis.
Interactive query expansion covers a group of techniques that provide a useful compromise between searching and browsing. Their case is compelling; they expose available knowledge and assist with the difficult task of constructing effective queries. One such technique is phrase browsing, in which queries are treated as single phrases and evolved by exploring an automatically generated hierarchy of terms. Extensive development and evaluation at the University of Waikato has shown phrase browsing to be promising but lacking in several important respects, particularly its inability to cope with multi-topic queries.
This thesis takes phrase browsing as the starting point and generalizes it into a new interactive query expansion technique. This transcends the narrow definition of phrase browsing by removing its restrictions to provide flexible searching and browsing. Multiple topics can be expanded with terms obtained from different sources, including both general and domain-specific thesauri. These modifications to the general approach are matched by a complete redesign and reimplementation of the interface. The AJAX framework is used to provide a highly responsive web application.
The new system has been compared directly with keyword searching and indirectly with an earlier phrase browser. This formal evaluation confirmed that many phrase browsing problems have been resolved. The new interface was well received by subjects, who preferred it to keyword searching. However, much of the improvement is due to interface features that could be incorporated into the less popular system. This research calls into question the whole idea of phrase browsing and raises the possibility of topic browsing: a more general approach that is less closely tied to specific terminology.