
Auer et al.: dbpedia

No matter what you think of Wikipedia's quality, it's sort of cool that you can download the entire contents of Wikipedia. That's a whole lot of human-generated text, mostly structured, mostly vetted, that motivated hackers can grovel over.

Enterprising German hackers Sören Auer, Chris Bizer, Richard Cyganiak, Jens Lehmann, and Georgi Kobilarov have put together dbpedia:

dbpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. dbpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.

Basically they've extracted a couple of decent-sized datasets of well-structured information nodes from Wikipedia (e.g. music albums and city entries) and put Semantic Web search on top. You get some pretty powerful query capabilities out of this.
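
If you want a flavor of what those queries look like, here's a rough Python sketch that hits dbpedia's public SPARQL endpoint and asks for a few albums and their artists. Fair warning: the endpoint URL and the dbo:Album / dbo:artist names are my assumptions about the dbpedia vocabulary, so treat this as illustrative rather than gospel.

    # A rough sketch of querying dbpedia's SPARQL endpoint from Python.
    # Assumed: the public endpoint at http://dbpedia.org/sparql and the
    # dbo:Album / dbo:artist terms from the dbpedia ontology.
    import json
    import urllib.parse
    import urllib.request

    # Ask for a handful of music albums and the artists they're attributed to.
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?album ?artist WHERE {
      ?album a dbo:Album ;
             dbo:artist ?artist .
    } LIMIT 10
    """

    url = "http://dbpedia.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )

    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)

    for row in results["results"]["bindings"]:
        print(row["album"]["value"], "-", row["artist"]["value"])

Nothing fancy, just standard-library HTTP plus the JSON results format, which is about the right level of ceremony for poking at a public endpoint.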

I don't hang with this crowd, but it seems to me that Wikipedia snapshots would make great grist for web mining and text mining folks.
