The wikifier was evaluated mainly on Wikipedia articles: the existing links were stripped out, and the system tried to put them back in automatically. It was also tested on news stories from the AQUAINT corpus, to see whether it would work as well "in the wild" as it did on Wikipedia. These stories were wikified automatically and then inspected by human evaluators. We share the news stories here as an additional dataset that others can use to evaluate their own systems.
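The "take the links out" step can be illustrated with a minimal sketch. It assumes the articles are available as standard MediaWiki markup, where links are written `[[Target]]` or `[[Target|anchor text]]`. The function name `strip_links` and the regular expression are illustrative only and are not taken from the wikifier itself; templates, images, and nested links are not handled.

```python
import re

# Matches [[Target]] and [[Target|anchor text]].
# Assumption: simple links only; nested brackets and templates are ignored.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def strip_links(wikitext):
    """Return (plain_text, gold_links).

    gold_links is a list of (anchor, target) pairs in document order,
    which a wikifier would then try to recover from plain_text.
    """
    gold_links = []

    def replace(match):
        target = match.group(1).strip()
        # Without a pipe, the visible anchor text is the target itself.
        anchor = match.group(2) if match.group(2) is not None else match.group(1)
        gold_links.append((anchor, target))
        return anchor

    plain_text = WIKILINK.sub(replace, wikitext)
    return plain_text, gold_links

if __name__ == "__main__":
    text, links = strip_links("The [[Associated Press|AP]] reported from [[Paris]].")
    print(text)
    print(links)
```

The removed links serve as the reference answers, so no manual annotation is needed for this kind of test.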

You can download the stories as a directory of XML files. These files contain only links that were either created manually, or created automatically and then checked manually. In other words, every link in them has been manually verified, so the files can be used as ground truth.
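One way to use the files as ground truth is to score another system's links with precision, recall, and F1. The sketch below is hedged accordingly: the XML schema is not described here, so parsing is left out, and each link is assumed to have already been reduced to a hypothetical `(document_id, anchor, target)` tuple. The function name `score_links` is illustrative, and exact-match scoring is a common convention rather than necessarily the procedure used in the original evaluation.

```python
def score_links(predicted, gold):
    """Compare predicted links with manually verified ones.

    Both arguments are iterables of hashable link records, for example
    (document_id, anchor, target) tuples (an assumed representation).
    Returns (precision, recall, f1).
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)

    # Guard against empty inputs to avoid division by zero.
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

if __name__ == "__main__":
    # Made-up example records, not taken from the dataset.
    gold = {
        ("doc1", "AP", "Associated Press"),
        ("doc1", "Paris", "Paris"),
    }
    predicted = {
        ("doc1", "AP", "Associated Press"),
        ("doc1", "Paris", "Paris, Texas"),
    }
    print(score_links(predicted, gold))
```

A link counts as correct here only if the document, anchor, and target all match exactly; a looser criterion, such as matching on target alone, would need a different record format.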

You can also browse the stories below. Unlike the downloadable files, these pages contain both good and bad links, to show how the system behaved.

APW19980603_0791 APW19980824_0827 APW19981109_1172 APW19980603_1617
APW19980903_1073 APW19981113_0500 APW19980604_0787 APW19980917_0818
APW19981113_0729 APW19980610_0111 APW19980930_0284 APW19981119_0585
APW19980611_0774 APW19980930_0522 APW19981120_1056 APW19980614_0031
APW19981001_0866 APW19981130_0743 APW19980615_0417 APW19981010_0354
APW19981210_0433 APW19980620_0458 APW19981020_1367 APW19981215_1083
APW19980624_0436 APW19981022_0630 APW19990120_0179 APW19980624_0607
APW19981022_0710 APW19990203_0315 APW19980625_1136 APW19981026_0096
APW19990519_0141 APW19980627_0596 APW19981106_0920 APW19990526_0131
APW19980709_0263 APW19981109_0140 APW19990827_0137 APW19980713_0449
APW19981109_0152 APW19990827_0184 APW19980808_0196 APW19981109_0440
APW20000303_0067 APW19980811_0512 APW19981109_0464 APW20000312_0050
APW19980816_0994 APW19981109_1089