
Pilgrim: Encoding Detector

Total geek post forthcoming. I've been doing a lot of web feed crawling and parsing development recently. Because of the heinous abuse of character encoding declarations out there, I had to rip off some encoding detection Python code from Mark Pilgrim's Universal Feed Parser. (Don't worry, it's legal.) While grinding my teeth, I kept wishing there were a way to just hand a string to some function and have it guess the encoding.

Oops. Mark Pilgrim already wrote one, the Universal Encoding Detector. Problem solved.
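To make that wish concrete, here is a minimal, stdlib-only sketch of a "hand it bytes, get a guess back" function. It is not Pilgrim's code, and it is far cruder than a statistical detector. It assumes a simple three-step heuristic: check for a byte order mark, then trust an XML `encoding` declaration only if the bytes actually decode with it, then try a short list of codecs in order. The name `guess_encoding` and the fallback order are hypothetical choices for illustration.

```python
import codecs
import re

# Byte order marks, longest first: the UTF-32 LE BOM begins with the UTF-16 LE BOM,
# so checking UTF-16 first would misreport UTF-32 data.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches e.g. <?xml version="1.0" encoding="UTF-8"?> at the very start of the data.
_XML_DECL = re.compile(rb"""^<\?xml[^>]*\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def guess_encoding(data: bytes) -> str:
    """Return a best-guess codec name for `data` (hypothetical helper, heuristic only)."""
    # 1. A byte order mark is the strongest signal available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name

    # 2. An XML declaration, trusted only if Python knows the codec AND the bytes decode.
    match = _XML_DECL.match(data)
    if match:
        declared = match.group(1).decode("ascii")
        try:
            canonical = codecs.lookup(declared).name  # LookupError for unknown names
            data.decode(canonical)  # ValueError if the declaration lies
            return canonical
        except (LookupError, ValueError):
            pass  # bogus or wrong declaration: fall through to trial decoding

    # 3. Trial decoding. UTF-8 is strict enough that success is meaningful;
    #    Python's cp1252 rejects a few undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    for name in ("utf-8", "windows-1252"):
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass

    # ISO-8859-1 maps every byte to a character, so it never fails.
    return "latin-1"


print(guess_encoding(b'<?xml version="1.0" encoding="utf-8"?><t>caf\xe9</t>'))
```

The final call shows the "heinous abuse" case: the feed claims UTF-8, the bytes are not valid UTF-8, so the declaration is ignored and the guess falls through to `windows-1252`. The weakness is that step 3 will "succeed" on almost any Western-European byte string, which is exactly the ambiguity a statistical detector is built to resolve.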

© Brian M. Dennis. Built using Pelican. Theme by Giulio Fidente on github.