Posts tagged with 'google'

PCPlus 281: Indexing the Internet

I write a monthly column for PCPlus, a computer news-views-n-reviews magazine in the UK (actually there are 13 issues a year, including an Xmas issue, so it’s a bit more than monthly). The column is called Theory Workshop and appears in the Make It section of the magazine. When I signed up, my editor and the magazine were gracious enough to allow me to reprint the articles here after a year or so. My plan is to publish each article here roughly a year after it first appeared, when I purchase the current issue.

I was in England when the May issue came out, so I’m able to post this a little earlier than usual (my Barnes & Noble generally gets an issue 5-6 weeks after it appears in newsagents in England).

This particular piece was a pure layman’s article about how to index text and in particular how big search engines index web pages. I covered the usual suspects: inverted indexes and PageRank, with asides on stemming and SEO (search engine optimization).
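As a rough illustration of the inverted index idea mentioned above (this is not the code from the article), here is a minimal sketch in Python. The function names `tokenize`, `build_inverted_index` and `search`, and the three sample documents, are made up for this example. It assumes a query should match only documents containing every query word, and it does no stemming or ranking.

```python
from collections import defaultdict
import re


def tokenize(text):
    # Lowercase the text and split it into runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    # Map each word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    # Return the ids of documents containing every word in the query.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents, keyed by an arbitrary id.
docs = {
    1: "Search engines index web pages",
    2: "An inverted index maps words to pages",
    3: "PageRank ranks web pages by links",
}
index = build_inverted_index(docs)

print(sorted(search(index, "web pages")))  # [1, 3]
print(sorted(search(index, "index")))      # [1, 2]
```

The point of the structure is that a lookup goes from word to documents rather than scanning every document for the word. A real search engine layers a great deal on top of this (stemming, ranking, and so on).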

As it happens, in doing the research for this article, I read Sergey Brin & Larry Page’s seminal paper The Anatomy of a Large-Scale Hypertextual Web Search Engine for the first time. This was the paper that essentially launched Google and changed the landscape of search engines. The techniques discussed in the paper have obviously improved in the 12 years since then (I dare say that Google no longer relies on PageRank alone but uses a panoply of different indexing mechanisms to improve results), but it is still an excellent exposition of what happens in a large-scale search engine.

And... 12 years ago? How the internet has changed since Brin and Page presented their paper at the Seventh International World-Wide Web Conference in 1998.

This article first appeared in issue 281, May 2009.

You can download the PDF here.

Now playing:
Massive Attack - Babel
(from Heligoland)



About Me

I'm Julian M Bucknall, the M because it's my middle initial and because I and the other Julian Bucknall (the movie guy) would like to differentiate ourselves.

I'm a programmer by trade, an actor by ambition, and an algorithms guy by osmosis. I write articles for PCPlus in my spare time, not that there's much of that.

Apart from that, an ex-pat Brit, atheist, microbrew enthusiast, Pet Shop Boys fanboy, slide rule and HP calculator collector, amateur photographer, Altoids muncher.

DevExpress

I'm Chief Technology Officer at Developer Express, a software company that writes some great controls and tools for .NET and Delphi. I'm responsible for the technology oversight and vision of the company.
