And the truth (or at least, an attempt at redressing the balance)

The arrogance of Google and their quest to store everything

At the morning conference in the Guardian’s offices last Wednesday we were visited by the Head of Google Europe, Philipp Schindler. He gave an interesting talk about Google’s future plans and current achievements, and was asked some tricky questions by the assembled journalists on issues such as Google’s response to government requests for data and their illegal collection of open wifi data while recording StreetView imagery.

Early on in the talk, Schindler aimed to impress by boasting of the scale of the web. He suggested that from the dawn of mankind through to the modern era, we had generated five exabytes of data. He defined an exabyte as “a truck full of millions of books”, but I prefer the more technical definition of “1 billion gigabytes”. He followed this up by telling the room that in the past few days alone, the world had already generated one exabyte.

While I took his point about the volumes of data we produce, it bothered me for several reasons that he seemed to be presenting these figures as a kind of achievement. For one, the basic adage of “quality over quantity” is surely applicable here: how much of that data is made up of self-indulgent tweets or spam content farms? Secondly, the sheer informational arrogance of attempting to put a size on the sum total of mankind’s data was astounding. How on earth can Google (or anyone else) attempt to estimate how much mankind produced before the advent of the internet?

Sure, Google, with their lofty aims to scan every book ever written, probably think they have a good idea about how much of the written word has been published. They also probably think they have a pretty good idea about how many songs were written, films were made, paintings produced and buildings constructed, and can translate these into megabytes and tallies. How much, however, is lost to time? Why do these numbers assume that everything ever published, created, or known is still in existence today, waiting for us to add it to a statistic somewhere?

The idea that we can put a data price on the contribution humans have made to the universe says much about Google’s desire to know everything, and a little about how out of reach that particular goal might just be.