August 18, 2009

Marissa Mayer talks about the physics of data

Marissa Mayer (image from Wikipedia)

A bunch of us from Kosmix went to hear Marissa Mayer speak at PARC on “Innovation at Google: The physics of data”. We were not alone: the auditorium was packed beyond its capacity. For Google, data is everything, and rightly so, given that it holds the coveted and ever-growing “Database of Intentions”. Marissa sees Speed, Scale, and Sensors as the three pillars driving the next quantum leap in this space.

Speed

Marissa explained how the real-time web and services like Twitter, Facebook, and YouTube are creating data at a staggering pace. Every time Google crawls the Web, 20% of the data it finds is new, not seen in the last 90 days. Online activity like searching on Google and buying products on Amazon creates data, which is used by applications such as Google Trends and Amazon Product Recommendations. This real-time data can be applied in creative ways to solve real-world problems. For instance, Google uses search activity to visualize flu trends and can use it to predict epidemics.

Scale

All this data is useless if you can’t analyze it quickly, which is why Marissa puts great emphasis on scale. Google has always had a formidable infrastructural advantage here, both with huge server farms and software innovations such as MapReduce. However, innovations outside Google such as Amazon Web Services and Hadoop also provide great alternatives for companies that need to do some serious number crunching.

Marissa demonstrated various Google products such as Google Squared, Google Public Data, Fusion Tables, Flu Trends, Google Trends, Similar Image Search, Gapminder, and Google Map Maker to illustrate the amazing data visualization systems possible once data is available in a structured format. Google Fusion Tables, for example, combines scientific data to provide a more visual way to analyze health trends like the leading cause of death among children in different regions around the world. Similarly, Google Public Data, which makes government data more accessible, charts statistics like unemployment rates by county. Another tool, Gapminder, lets you visualize correlations between carbon footprint and income per capita. She also stressed the importance of user-generated content. For instance, by making map editing simple, Google Map Maker enabled the community to create detailed maps in countries such as India.

Sensors

As the Web starts to meet the real world, sensors play an increasingly important role. Your cell phone has eyes, ears, touch, location, and orientation, and can connect with others using the web. This is exactly why Android is a strategic bet for a company like Google, which values data immensely.

Love your Data

At every turn, Marissa’s talk emphasized that Data is King. But you don’t have to be Google to use data to make a difference. Every company that generates data, big or small, web-based or offline, should treat it like gold: save it, analyze it, visualize it, standardize it, share it, and think about how you can generate even more.

Abhishek Gattani
