Tuesday, September 16, 2008

Minds obscured by a sea of data

Lordy, what a ride! When I started out in science, the most efficient way we had to look for meaning in data was graph paper. And let me tell you, I was damn good at graphs, something I learned from my father. He loved graphing data. Any data.

Then came calculators. And computers. Toggling in programs on a PDP-8. Then my very own Mac 128, which was obsolete almost as soon as I bought it. Kilo to mega to giga. My new iPod Touch has 16 gigabytes of flash memory, which is more memory than in all the computers on the university campus where I was a graduate student in physics.

We now routinely talk of terabytes, and petabytes are edging into the mainstream. Can exa and zetta and yotta be far behind? Every step a factor of a thousand. Meanwhile, in astronomy, genome analysis, proteomics, and high-energy physics, data is being generated faster than computers can keep up. The Large Hadron Collider will pump out data so fast that entire buildings full of computers will be needed to troll for meaning. Six hundred million collisions per second, year in and year out, spewing out debris. A sharp pencil and a piece of graph paper are as quaint as a monk in a scriptorium with an inkpot and a quill.

The September 4 issue of Nature focuses on dealing with data in the Petabyte Era. Perhaps the most poignant piece is the essay by Sue Nelson on "Pickering's harem," the poorly paid employees of the Harvard College Observatory, who, in the late 19th and early 20th centuries, under the direction of astronomer Edward Pickering, searched for meaningful data among hundreds of thousands of astronomical photographic plates -- tedious and often unrewarding work. Among these women were Williamina Fleming, Annie Jump Cannon and Henrietta Swan Leavitt, who made significant breakthroughs in discovering the scale of the universe. History has rescued them from oblivion. Most of what they did would now be done by machines, but their story reminds us that behind the vast complexity of data generation and analysis in the Petabyte Era are human minds in eager interaction with the universe.