NMH: A Precis Engine

Posted on: Sun 09 April 2006

Thinking out loud. I'd like an engine that I could give a URL to and that would return a precis of web documents related to that URL. Sort of like how link:url works on Google and other search engines, except with more smarts. Technorati is heading in the right direction for blog posts, at least when it actually works. But it doesn't tell you much more than what links to the post, plus some notion of related links.

This is hard to do for the entire Web, but it would be really useful in some restricted domains. For example, imagine an engine that simply looked for content related to every piece of legislation in the US House and Senate. I could then just plug in the URL for a particular bill (the engine would have an easily understood way to construct such URLs, e.g. http://example.com/congress/109/house/1234) and get back a page summarizing related bills, information about the key politicians involved, analysis from government agencies and non-profit institutions, relevant news stories from across the country, and highly relevant content from elsewhere on the web, including weblogs and other discussion.

Actually, the URLs don't even have to map to documents; they could map to relatively obvious concepts, such as stocks.
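To make the URL idea a bit more concrete, here is a minimal Python sketch of how such an engine might resolve its URLs into the things they name. Only the /congress/109/house/1234 shape comes from the example above; the /stocks/<ticker> path, the Bill and Stock types, and the resolve function are hypothetical illustrations, not part of any existing service.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union
from urllib.parse import urlparse


@dataclass(frozen=True)
class Bill:
    congress: int   # e.g. 109 for the 109th Congress
    chamber: str    # "house" or "senate"
    number: int     # bill number within that chamber


@dataclass(frozen=True)
class Stock:
    ticker: str     # normalized to upper case


# Hypothetical path patterns. The /congress/... form mirrors the example URL;
# the /stocks/... form is an assumed extension for concept URLs.
BILL_PATH = re.compile(r"^/congress/(\d+)/(house|senate)/(\d+)/?$")
STOCK_PATH = re.compile(r"^/stocks/([A-Za-z.]{1,10})/?$")


def resolve(url: str) -> Optional[Union[Bill, Stock]]:
    """Map an engine URL to the bill or concept it names, or None if unknown."""
    path = urlparse(url).path
    m = BILL_PATH.match(path)
    if m:
        return Bill(int(m.group(1)), m.group(2), int(m.group(3)))
    m = STOCK_PATH.match(path)
    if m:
        return Stock(m.group(1).upper())
    return None
```

For instance, resolve("http://example.com/congress/109/house/1234") yields Bill(congress=109, chamber='house', number=1234), and resolve("http://example.com/stocks/ibm") yields Stock(ticker='IBM'). The point of the design is that the URL space is small and predictable, so the rest of the engine only ever has to gather material about objects it can enumerate.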

This precis engine concept relies on narrowing the Web down to a small, finite set of documents that map easily to URLs, and then crawling and searching only for information about those documents. The technology is currently well within our reach, and this is a reminder for me to look at what's been happening in the focused crawler community.
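The crawling side could start from something like a best-first focused crawler. The Python sketch below is a toy under stated assumptions, not a description of any particular system: fetch is a hypothetical callback you supply (in practice it would download and parse a page), and relevance is scored by crude term overlap rather than the trained classifiers focused-crawler research typically relies on.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# A fetched page: its text and its outgoing links. fetch() is a hypothetical
# callback; a real crawler would do HTTP requests and HTML parsing there.
Page = Tuple[str, List[str]]


def relevance(text: str, terms: Set[str]) -> float:
    """Fraction of topic terms present in the page text (a crude stand-in
    for a real relevance classifier)."""
    words = set(text.lower().split())
    return len(words & terms) / len(terms) if terms else 0.0


def focused_crawl(seeds: List[str], fetch: Callable[[str], Page],
                  terms: Set[str], threshold: float = 0.5,
                  max_pages: int = 100) -> Dict[str, float]:
    """Best-first crawl: always expand the most promising URL next, and only
    follow links out of pages scoring at or above `threshold`."""
    terms = {t.lower() for t in terms}
    # heapq is a min-heap, so priorities are negated; seeds get the top priority.
    frontier: List[Tuple[float, str]] = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen: Set[str] = set(seeds)
    relevant: Dict[str, float] = {}
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        score = relevance(text, terms)
        if score < threshold:
            continue  # off-topic page: do not expand its links
        relevant[url] = score
        for link in links:
            if link not in seen:
                seen.add(link)
                # A child's priority is estimated from its parent's score.
                heapq.heappush(frontier, (-score, link))
    return relevant
```

On a small in-memory graph where an on-topic page is reachable only through an off-topic one, that page is never fetched. That is the basic trade-off of focused crawling: pruning keeps the crawl small enough to be practical, but it can miss relevant content sitting behind irrelevant links.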

Of course, I've sort of reinvented Topix.net. But instead of straight news about a topic, the precis engine's output would tend toward the content of a Wikipedia page, even though it would use different methods to build that encyclopedic page.