Brin & Page - The Anatomy of a Large-Scale Hypertextual Web Search Engine
This is the rock star of academic papers. It is what every paper hopes to become. This isn't just a paper we can learn from.
This is history.
Whether you like Google or not, very few companies of its size can so clearly trace their beginnings to a paper like this. Page and Brin lay out what became the foundations of Google and of web search. They describe PageRank, a relatively simple algorithm that judges a page's importance by the pages that link to it, with each link weighted by the importance of the page it comes from. Next comes a review of the related work that led them to PageRank. They then walk through Google's architecture for crawling and storing the data, including the lexicon. The paper closes with results and conclusions. The last sentence is funny to look back on: "We hope Google will be a resource for searchers and researchers all around the world and will spark the next generation of search engine technology." It turned out to be quite a bit more than that.
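To make "importance from the pages that link to it" concrete, here is a minimal power-iteration sketch. It assumes the commonly used normalized form of the paper's formula, PR(A) = (1-d)/N + d * sum(PR(T)/C(T)) over pages T linking to A, where C(T) is T's number of out-links, with d = 0.85, the value the paper reports usually using. Spreading the rank of pages with no out-links evenly over all pages is an assumption of this sketch, not something taken from the paper, and the function name `pagerank` and the toy graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank along every out-link.
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy web: a, b, and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

Page c ends up on top because three pages point to it, and a comes second even with a single in-link, because that link comes from the highly ranked c. That second effect is what separates PageRank from simply counting links.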
Kleinberg - Authoritative Sources in a Hyperlinked Environment
This paper discusses web search. It reaches much the same conclusion as Brin & Page, just without the billion-dollar company: we can find good information on the web by looking at who links to whom. Kleinberg's version advocates finding authoritative pages (authorities) together with the hub pages that link to many of them.
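Kleinberg's method (usually called HITS) gives every page two scores. A page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both normalized after each round. The sketch below runs that mutual update on a small hand-made graph. It skips the query-dependent step in which Kleinberg first builds a focused subgraph from search results, and the function name `hits` and the sample pages are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores for a graph given as page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that point at p.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages that p points to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize each score vector so its squares sum to 1.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: three link-list pages pointing at two content pages.
web = {"hub1": ["x", "y"], "hub2": ["x", "y"], "hub3": ["x"], "x": [], "y": []}
hub, auth = hits(web)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

Unlike PageRank, the two scores reinforce each other: a good hub points to good authorities, and a good authority is pointed to by good hubs.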
IIR Chapter 19
This might be the first chapter in this book that actually has some practical stuff in it. There is a brief discussion of the history of web search. Then it goes on to talk about the model of the Web and how that model let it grow so explosively. The web looks suspiciously like a graph. Then the chapter moves on to spam, which it argues was inevitable because "spam stems from the heterogeneity of motives in content creation on the Web." After that it looks at the people who use web search. It finishes by dealing with duplicate pages.
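The chapter's approach to near-duplicates is shingling: turn each document into its set of k-word sequences and compare two documents by the Jaccard coefficient of their sets. Here is a minimal sketch with k = 4. It leaves out the min-hash sketches the chapter uses to make this affordable at web scale, and the helper names `shingles` and `jaccard` are illustrative only.

```python
def shingles(text, k=4):
    """Set of word k-grams (shingles) in a document, ignoring case."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


doc1 = "a rose is a rose is a rose"
doc2 = "a rose is a rose is a rose indeed"
s1, s2 = shingles(doc1), shingles(doc2)
print(len(s1), len(s2), jaccard(s1, s2))
```

Repeated phrases collapse into a single shingle, so doc1 yields only three distinct 4-shingles; adding one word to the end contributes one new shingle, leaving the pair with a similarity of 0.75. A crawler would flag pairs above some chosen threshold as near-duplicates.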
IIR Chapter 20
This chapter goes deeper into the structure of the web that chapter 19 touched on briefly. Much of it appears to be a rehash of the two papers we read this week.