Saturday, January 21

google - the next 24/7 challenge

Funny thing with all of these articles coming out discussing the fact that google will not give up its data while the other three (Yahoo!, AOL and MSN) were comfortable enough to give up the requested data. One thing that I wonder about - and wonder whether it will break into the memescape - is how pervasive google is, to a degree we do not fully recognize.

My years as a roboticist (and control theorist) have trained me to look at the world in terms of feedback loops that optimize performance toward a desired outcome. Basic control theory demonstrates that minimizing error as quickly as possible generates the best results. But to do so, one needs a way of measuring the input and the output, comparing the two values, and applying a control signal to the system from which you are attempting to generate the desired output. In control terms, we call these components the sensor, the comparator and the controller/actuator. In the google realm, how are these articulated?
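
For readers who have not seen a feedback loop written down, here is a minimal sketch of the three components in Python. It assumes a trivial, noise-free plant whose output simply accumulates the control signal; the function names and the gain of 0.5 are illustrative choices of mine, not anything from a real system.

```python
def sensor(plant_output):
    """Measure the plant's output (assumed noise-free in this sketch)."""
    return plant_output


def comparator(desired, measured):
    """Return the error between what we want and what we measured."""
    return desired - measured


def controller(error, gain=0.5):
    """Proportional control: the control signal scales with the error."""
    return gain * error


def run_loop(desired, initial_output=0.0, steps=20):
    """Drive a toy plant (output += control signal) toward `desired`."""
    output = initial_output
    errors = []
    for _ in range(steps):
        error = comparator(desired, sensor(output))
        errors.append(abs(error))
        output += controller(error)  # the toy plant just accumulates the signal
    return output, errors


final_output, errors = run_loop(desired=10.0)
```

With this gain the error is cut in half on every pass through the loop, which is the "drive the error to zero as fast as possible" idea in its simplest form.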

Output: Choose the highest value link for *your* needs
This is somewhat interesting in that google has two competing systems - its search engine, with the value of the results found in the search section of the results page - and the paid links that exist on the right or in each AdWords box you see on syndicated sites. For this post, I am going to focus on the search engine results alone - since the "control algorithm" for the outputs is different for each.

Search engine results tend to be based on a couple of characteristics. We all know of PageRank, where the number of sites that link to a particular site increases the rank of the site they are pointing to - based on a particular concept/keywords. But if you have ever searched for a concept on one computer and then borrowed your friend's computer (on a different network/IP address), you will notice a subtle difference in the search results. The output is particularly tailored to your needs - especially if you have installed the google toolbar and/or have a google account. These components - along with your google cookie - are how you are tracked at google and at other sites. More on this as we go further into the system architecture.
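
To make the link-counting idea concrete, here is a rough sketch of PageRank as it was publicly described: a page's rank is fed by the ranks of the pages linking to it, computed by repeated passes over the link graph. This is a textbook simplification and not google's production ranking; the site names are made up, and the 0.85 damping factor is the value commonly quoted from the original paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-site web: three sites link to c.example.
toy_web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy web `c.example` comes out on top because three sites point to it, and `a.example` outranks `b.example` because it receives a link from the highly ranked `c.example`.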

Input: customer's desire to find the best site at the time of intention
Like every person who goes to a search engine, we are essentially looking for the best site to meet our need on a particular issue. The challenge is: what I might consider the best site might not be what my brother considers the best site for a concept. Case in point: we both are looking for "Rod Smith". In my case, Senator Rod Smith is important because he is a politician that I am tracking. My brother is a fantasy football player - and Rod Smith of the Broncos is extremely important for his ranking in his league. So, when we both put "Rod Smith" into the google search box (whether on the site, in the toolbar, in the desktop search, et cetera) and react to the results - google is able to track our reactions and tailor future relevant output accordingly (e.g. if I put "Jim Davis" into the search box, the Congressman's campaign site is higher on my results listing, where my brother will see cartoonist Jim Davis' autobiography higher). Now, in this small example, tracking my and my brother's search actions seems like a waste of time - but think of the masses of people who need to be tracked across the earth. That is where we move to sensors and the sensors' algorithms. An important concept to focus on is data clustering, where large datasets are sliced and diced into consistent groupings.
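
The "Rod Smith" example can be sketched in a few lines. This is only an illustration of the idea of tailoring results to past reactions, under the assumption that each result carries a topic label and that a user's history is simply a list of topics they clicked; the `rerank` function and the data are hypothetical and say nothing about how google actually does it.

```python
from collections import Counter


def rerank(results, click_history):
    """Put results on the user's most-clicked topics first.

    `results` is a list of (title, topic) pairs in their base order.
    sorted() is stable, so results on equally favoured topics keep that order.
    """
    topic_counts = Counter(click_history)
    return sorted(results, key=lambda result: -topic_counts[result[1]])


base_results = [
    ("Rod Smith, Broncos wide receiver", "sports"),
    ("Senator Rod Smith", "politics"),
    ("Rod Smith fantasy stats", "sports"),
]

my_history = ["politics", "politics", "sports"]      # mostly political clicks
brother_history = ["sports", "sports", "sports"]     # all fantasy football

mine = rerank(base_results, my_history)
his = rerank(base_results, brother_history)
```

Same query, same base results, but the Senator lands at the top of my list and at the bottom of my brother's.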

Additionally, when we are on another page (say the NYTimes.com Editorial page), those google AdWords on the lower right-hand side also are designed to have relevance to your intention. Granted - my intention at the Editorial page might be to see how the NYTimes has positioned its support for a particular candidate, but the links should be enticing to me - to take advantage of my "intention" - even if I have not yet decided to act on it.

Sensors: cookies, toolbar, desktop search, Analytics and AdWords results
In the google system, sensors are essentially the access logs that track your movements across the entirety of the google network. While you might think this network consists of simply the google website and how you act within the website, other sensors are at work all the time.

Initially, you have a cookie on your computer that uniquely identifies you throughout your web surfing experience. So, when you come along to a webpage that has a google AdWords advert, your browser has performed an operation: it requested a piece of JavaScript code, passed along your cookie to google, and then pulled the information from google that is relevant to both the page and to you.
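
As a rough picture of why that one request matters, here is a toy simulation of an ad server that receives a cookie with every ad request. Everything here is hypothetical (the `AdServer` class, the `visitor-N` cookie format, the example URLs); it is a sketch of the general mechanism, not of google's actual systems.

```python
import itertools


class AdServer:
    """Hypothetical ad server: logs every ad request and returns an ad."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.access_log = []  # (cookie_id, page_url) pairs: the "sensor" data

    def handle_request(self, page_url, cookie_id=None):
        if cookie_id is None:
            # First visit: no cookie yet, so hand out a new identifier.
            cookie_id = f"visitor-{next(self._ids)}"
        self.access_log.append((cookie_id, page_url))
        ad = f"ad for {page_url}"  # placeholder for real ad selection
        return cookie_id, ad

    def pages_seen_by(self, cookie_id):
        """Reconstruct one visitor's click-trail from the access log."""
        return [url for cid, url in self.access_log if cid == cookie_id]


server = AdServer()
cookie, _ = server.handle_request("news.example/editorial")
server.handle_request("sports.example/scores", cookie_id=cookie)
trail = server.pages_seen_by(cookie)
```

Two unrelated sites, one cookie, and the server ends up holding a click-trail that spans both.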

Note: this action is almost exactly the issue that DoubleClick faced in 2000 - with its efforts to link demographic information to the click-trails that it tracked from its own cookie. But, I assume that the fact that you are using google as a search engine gives google the permission to track you throughout the network without the privacy concerns. (FYI: I support the concept of aggregated information and demographic targeting for effective marketing).

Comparator: google
The goal here is to make the system (YOU) click on the link that is best for you with as high a probability as possible. So, when you are faced with links, google tracks your action and compares it to what it expected you to do. In control theory, this is called "estimation", where the system is trying to get the error (the difference between the input and the output) to zero as fast as possible.

Controller: google's search and modeling algorithms
In the purview of classical control theory, we spend time analyzing the state of the error condition and evaluating the error signal itself, its derivative and its integral to drive the plant (YOU) to the condition the input wants. In modern control theory, there are further refinements that allow for system identification (modeling the plant) through evaluation of the response of the plant to inputs. As the system model becomes more tuned to the plant, the only delay in driving the error to zero is the speed of the plant (e.g. how long does it take for you to click on the desired link). As many a control theorist can tell you, control of non-linear systems (i.e. systems that do not follow elegant, linear equations) can be quite an intriguing problem - one has to focus on other environmental variables that can cause strange control results (e.g. sudden changes in control effort) in order to manage the non-linearities.
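
The classical approach described above (acting on the error, its integral and its derivative) is what control people call a PID controller. Here is a minimal sketch, assuming a simple first-order plant and gains that I picked by hand for illustration; none of the numbers come from a real system.

```python
class PID:
    """Textbook PID controller acting on an error signal."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint=1.0, steps=2000, dt=0.01):
    """Drive an assumed first-order plant, dy/dt = -y + u, toward `setpoint`."""
    pid = PID(kp=2.0, ki=1.0, kd=0.1, dt=dt)  # hand-picked illustrative gains
    y = 0.0
    for _ in range(steps):
        u = pid.update(setpoint - y)  # comparator feeds the controller
        y += (-y + u) * dt            # forward-Euler step of the plant
    return y


final_y = simulate()
```

After 20 simulated seconds the plant output sits very close to the setpoint. The integral term is what removes the leftover steady-state error that a purely proportional controller would leave on this plant.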

Whew - that sounds far too technical. What does this actually mean? The marketing and sales people will read this discussion (if you have survived this far) as understanding the customer: what makes them tick, what they get excited about, how you will persuade them to listen, buy or act. What these people do is learn how best to read the signals of a person and leverage their own control signals (how they sit, the timbre of their voice, the choice of language) to persuade the customer to act. That is what google is doing with us - in the case of understanding our intentions online. As they gain a better understanding of the surfing habits across the entire websphere, they gain further knowledge of what the "non-verbals" of web-travelers are.

Is this inappropriate? Is this something that should be protected? If you compare this effort to what the Federal Government is doing to track terrorists via algorithms at the NSA, I can understand why google might be a bit concerned.

1 comment:

Brooke Ferris said...

Sanford! Long time no...well, anything. :) How are you? I have to admit, your most recent blog was a little technical for the likes of a right-brained, non-science/computer oriented person such as myself, but I'm glad you invited me on. Feel free to connect to my blog space as well. I just got back from Ecuador on Sunday and am still recovering.

Take care!
Brooke