Soumen Chakrabarti's book, Mining the Web: Analysis of Hypertext and Semi Structured Data, should be required reading for anyone who claims to be doing any sort of large-scale Web page/site analysis, especially all those folks trying to do bad PageRank knockoffs.