Bergmark: Collection Building

My encounter with Nalanda hints at a focused crawling effort within the digital libraries community that I haven't examined deeply enough. For example, Cornell's Donna Bergmark ran an extended collection building project that used the classic Mercator web crawler. Collection building aims to automatically build high-quality, topic-specific portals for online libraries. The group generated some interesting results, including a literature review of collection building (PDF).

As an addendum, Heritrix looks like the open source successor to Mercator, possibly the first openly documented, high-performance web crawler.

© Brian M. Dennis.