Monday, February 6

Better targeting through clustering

Ever since my post on google/Yahoo!/Microsoft, I have had a number of conversations about how targeted google is getting. In the past few weeks:
  1. google launched Personalized Search - now, when I am surfing the web while logged in to my google account, google tracks my search history overtly. No more hiding behind cookies: I am now identified by my google login. Consider that gmail has "millions of users", as quoted in a New York Times article today, and that the authentication platform is growing across the entire googlescape. That means google has information on many millions of potential users.
  2. According to that article, google just announced Gmail Chat - a combination of Gmail and IM client software - meaning google now has information on your IM conversations, an additional source of rich information on your personal habits.
  3. google's answer to Paypal - GBuy - will now tag products with a google GBuy logo - which you will be able to benefit from.
So consider the sources of information google has available to its data stores (and this is not exhaustive):
  1. Your search history - via the google homepage, the google search bar on your browser or any google-affiliated search box on other websites. And with Personalized Search enabled, your cookies are not the only thing identifying your actions.
  2. Your desktop search and information - what are the key topics you are searching for locally on your hard drive or on any network drive you have connected? google Desktop indexes the information on your desktop - and crushes the index into a fast hash - both for you and for themselves.
  3. AdSense-supported websites - which sites have you visited that are currently syndicating AdWords ads through AdSense? With each site you go to, google gets a request from your browser to fetch the adverts for the page you pull.
  4. google Analytics websites - with Urchin now offered by google as a free web analytics service, some of the better websites are incorporating analytics, even without google AdWords. But with the targeting solution within Analytics, using AdWords becomes that much easier.
Consider the amount of data google must capture and collect on all of the searches in the world, all of the actions in the world. One of the scariest things I ever experienced (granted, at no personal threat to my life) was when I decided to use my Treo 650 to search the web. I accessed google's Palm page (quite nice) and noticed the Personalized Home link. I clicked on it, thinking I would have to log in. To my surprise, it suddenly had my profile loaded in the ig service, and all of my Personalized widgets were there. All without signing in. How can google do this?

Clustery-goodness
In my first year at Stanford, I discovered that programming - while fun - could be given to better programmers than I. I have always been pretty good at programming, but I discovered that (at Stanford) you can always learn from others. In my AI/Lisp programming course, we were given the task of filling in crossword puzzle matrices using the Unix dictionary. The goal was to create an algorithm to quickly fill in the blanks and submit complete crossword puzzles. Our solution turned out to be a primitive form of "clustering" and a fast search algorithm using a bitwise representation of the dictionary.
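To make "a bitwise representation of the dictionary" a little more concrete, here is a minimal Python sketch of one way such an index could work. It is not the original Lisp solution, whose details are not given here: the function names (`build_index`, `candidates`), the `.` blank marker and the tiny word list standing in for the Unix dictionary are all assumptions for illustration. Each (length, position, letter) combination gets one integer used as a bitmask over the words of that length, so matching a partly filled crossword slot comes down to a few bitwise ANDs.

```python
from collections import defaultdict

def build_index(words):
    """Group words by length and build one bitmask per (length, position, letter).

    Bit i of masks[(length, pos, ch)] is set when the i-th word of that
    length has letter ch at position pos.
    """
    by_length = defaultdict(list)
    for w in words:
        by_length[len(w)].append(w.lower())
    masks = defaultdict(int)
    for length, group in by_length.items():
        for i, w in enumerate(group):
            for pos, ch in enumerate(w):
                masks[(length, pos, ch)] |= 1 << i
    return by_length, masks

def candidates(pattern, by_length, masks):
    """Return the words that fit a slot such as 'c.t', where '.' is a blank."""
    group = by_length.get(len(pattern), [])
    bits = (1 << len(group)) - 1  # start with every word of this length
    for pos, ch in enumerate(pattern):
        if ch != ".":
            bits &= masks.get((len(pattern), pos, ch), 0)
    return [w for i, w in enumerate(group) if (bits >> i) & 1]

# Hypothetical stand-in for the Unix dictionary.
words = ["cat", "cot", "cut", "dog", "dot", "cart"]
by_length, masks = build_index(words)
print(candidates("c.t", by_length, masks))  # ['cat', 'cot', 'cut']
print(candidates(".o.", by_length, masks))  # ['cot', 'dog', 'dot']
```

The grouping by length is the "clustering" part in miniature: a slot only ever looks at the words that could possibly fit it.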

Why this walk down memory lane? Consider the problem of tracking the surfing characteristics of the entire World Wide Web. Do you think google really cares about you in particular? As in, what you - Joe Block of 123 Main Street, in Cedar Rapids - really do? Unless you are a google account holder, not particularly. The goal is to make AdWords and search relevant to you - or at least people like you. So, how does this translate?

Consider the ever popular Myers-Briggs Type Indicator. In it, people answer questions (give information about themselves) and out pops a categorization of them along four dimensions (Extraversion/Introversion, Sensing/Intuition, Thinking/Feeling, Judging/Perceiving). 16 categories, four dimensions - which is 2 to the 4th power (binary!). Now, consider the fact that google has the capacity to handle many more dimensions - on the order of hundreds. What if you were able to determine the major categories (e.g. location, time of day, major topics of the world, frequent requests) and map people into 100,000 categories? Do you think that google might be able to provide relatively targeted information - in aggregate?
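The arithmetic here is easy to check. The following Python sketch treats each dimension as a single yes/no bit, which is a simplifying assumption (real dimensions such as location or time of day would have many more than two values), and the function names are made up for illustration. It shows how four binary dimensions give 16 categories, and how few binary dimensions it would take to pass 100,000.

```python
def category_id(traits):
    """Pack a sequence of yes/no traits into one integer, one bit per dimension."""
    cid = 0
    for t in traits:
        cid = (cid << 1) | int(bool(t))
    return cid

def category_count(dimensions):
    """Number of distinct categories that `dimensions` binary traits can express."""
    return 2 ** dimensions

def dimensions_needed(categories):
    """Smallest number of binary dimensions giving at least `categories` buckets."""
    return max(0, (categories - 1).bit_length())

print(category_count(4))                       # 16
print(category_id([True, False, True, True]))  # 11 (binary 1011)
print(dimensions_needed(100_000))              # 17
print(category_count(17))                      # 131072
```

So under this simplification, 17 well-chosen yes/no questions would already be enough to sort people into more than 100,000 buckets.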

Now - include the google account holders. google has definite tracking of all of the google account holders, with their permission, to help their searches. Even better - it can now be certain of the clusters these holders belong in. Now - consider how many people across the world have searching habits like mine. I make google's life easier by being a source of demographic, psychographic and click-graphic information that helps those who are not on the system get better results.
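One simple way to picture this is nearest-centroid assignment: average the behaviour of the known account holders in each cluster, then put an anonymous visitor into whichever cluster average they sit closest to. The Python sketch below is only an illustration of that general technique, not a description of what google actually runs. The cluster names, the two-number "interest" vectors and the function names are all hypothetical.

```python
import math

def centroid(vectors):
    """Average a list of equal-length feature vectors, column by column."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_cluster(visitor, centroids):
    """Return the name of the cluster whose centroid is closest to the visitor."""
    return min(centroids, key=lambda name: math.dist(visitor, centroids[name]))

# Hypothetical account holders: [interest in gadgets, interest in travel].
known_holders = {
    "gadget_fans": [[0.9, 0.1], [0.8, 0.2]],
    "travel_fans": [[0.1, 0.9], [0.2, 0.7]],
}
centroids = {name: centroid(vs) for name, vs in known_holders.items()}

# An anonymous visitor with no account, described by the same two features.
print(nearest_cluster([0.7, 0.3], centroids))  # gadget_fans
```

The account holders do the work of defining the clusters, and everyone else gets results tuned to the nearest one.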

The art of database marketing - with the world's biggest computing system. Any reason why google is worth its market cap?

Tags: clustering google gbuy gmail chat
