Kniles

Adding semantic hypertext links to digital library collections

Keywords: automated hypertext generation, keyphrase extraction, information retrieval, digital libraries.

Collections: Computer Science Technical Reports, HCI Bibliography.


Kniles is a web-based system for inserting topic-based hypertext links into existing, large-scale digital library collections. These links let you browse collections of documents that lack embedded links of their own.

For example, the Computer Science Technical Report (CSTR) collection of the New Zealand Digital Library contains approximately 40,000 papers. Because all of them were converted from PostScript, none contain hypertext links. We have enriched this collection with Kniles, allowing you to browse from topic to topic.


The hypertext links inserted by Kniles are based on author keywords and on keyphrases automatically extracted by Kea. Keyphrases make good link anchors because they are succinct topic descriptions and appear frequently in the text. They are useful for selecting link destinations because they are chosen to characterise the document to which they have been assigned.
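To illustrate why assigned keyphrases help with choosing destinations, here is a minimal sketch, assuming each document already has a list of keyphrases (from its author or from Kea). It builds an inverted index from keyphrase to documents, so that an anchor phrase in one document can point to the other documents that were assigned the same phrase. The function names and the document IDs ("tr-001" and so on) are hypothetical and not taken from Kniles itself.

```python
from collections import defaultdict


def build_phrase_index(doc_keyphrases):
    """Map each keyphrase (lower-cased) to the set of document IDs it was assigned to."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return index


def candidate_destinations(index, phrase, source_doc):
    """Return the other documents sharing this anchor phrase, excluding the one being read."""
    # dict.get does not insert missing keys, even on a defaultdict.
    return sorted(index.get(phrase.lower(), set()) - {source_doc})


# Hypothetical keyphrase assignments for three technical reports.
collection = {
    "tr-001": ["keyphrase extraction", "digital libraries"],
    "tr-002": ["digital libraries", "hypertext"],
    "tr-003": ["Keyphrase extraction", "naive Bayes"],
}
index = build_phrase_index(collection)
print(candidate_destinations(index, "keyphrase extraction", "tr-001"))  # ['tr-003']
```

Lower-casing phrases before indexing lets "Keyphrase extraction" and "keyphrase extraction" refer to the same topic; a real system would likely also need stemming or other normalisation.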

Kniles is implemented using CGI scripts and standard HTML. This has the advantage of accessibility: anyone with a web browser and Internet access can browse the collection. However, HTML has several limitations that prevent us from exploiting the keyphrase data to the fullest. We have implemented a more sophisticated interface in Tcl/Tk, which we call Phrasier.
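Because the links are delivered as standard HTML, each keyphrase occurrence in a document's text must be turned into an anchor element. The sketch below shows one way a server-side script might do this, assuming plain-text input and a list of anchor phrases. The `/cgi-bin/kniles?phrase=` URL pattern and the `link_keyphrases` function are hypothetical illustrations, not the actual Kniles scripts.

```python
import html
import re
from urllib.parse import quote_plus


def link_keyphrases(text, phrases, base_url="/cgi-bin/kniles?phrase="):
    """Escape plain text as HTML and wrap each keyphrase occurrence in an <a> element."""
    if not phrases:
        return html.escape(text)
    # Try longer phrases first: Python's regex alternation takes the first
    # alternative that matches, so "digital library collections" should be
    # listed before "digital library".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        pieces.append(html.escape(text[last:match.start()]))
        anchor = match.group(0)
        href = base_url + quote_plus(anchor.lower())
        pieces.append(f'<a href="{html.escape(href)}">{html.escape(anchor)}</a>')
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


text = "Keyphrase extraction helps digital libraries & their patrons."
print(link_keyphrases(text, ["keyphrase extraction", "digital libraries"]))
```

Escaping the surrounding text separately from the anchors keeps characters such as `&` and `<` in the original document from breaking the generated HTML.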


Overview

Here's an overview of Kniles & Phrasier, taken from a recent paper:

Many digital libraries are comprised of documents from disparate sources which are independent of the rest of the collection. The digital patron's ability to explore is severely curtailed when each document stands in isolation; there is no way to navigate to other, related, documents, or even to tell if such exist. We describe a method for automatically introducing topic-based links into documents to support browsing in digital libraries. Automatic keyphrase extraction is exploited to identify link anchors, and keyphrase-based similarity measures are used to select and rank destinations. Two implementations are described: one that applies these techniques to existing WWW-based digital library collections using standard HTML, and one that uses a wider range of interface techniques to provide more sophisticated linking capabilities. An evaluation shows keyphrase-based similarity measures work as well as a popular full-text retrieval system for finding relevant destination documents.
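The abstract mentions keyphrase-based similarity measures for selecting and ranking destinations but does not define them here. As a stand-in, the sketch below assumes Jaccard overlap between two documents' keyphrase sets, a common and simple choice that may differ from the measures evaluated in the paper. The document IDs, the `rank_destinations` function and its `limit` parameter are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two keyphrase sets (0.0 when both are empty)."""
    a = {p.lower() for p in a}
    b = {p.lower() for p in b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_destinations(source_phrases, candidates, limit=5):
    """Rank candidate documents by keyphrase similarity to the source document."""
    scored = [(jaccard(source_phrases, phrases), doc_id)
              for doc_id, phrases in candidates.items()]
    # Highest score first; ties broken by document ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop candidates with no shared keyphrases, then keep the top `limit`.
    return [(doc_id, round(score, 3)) for score, doc_id in scored if score > 0][:limit]


source = ["keyphrase extraction", "digital libraries", "hypertext"]
candidates = {
    "tr-002": ["digital libraries", "hypertext"],
    "tr-003": ["keyphrase extraction", "naive Bayes"],
    "tr-004": ["compilers"],
}
print(rank_destinations(source, candidates))  # [('tr-002', 0.667), ('tr-003', 0.25)]
```

Normalising by the size of the union keeps documents with long keyphrase lists from outranking closer matches simply because they share more phrases by chance.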


Contact Gordon Paynter for more information.