Sanford Dickert, Social Engineer

General thoughts on being involved in the intersection of business and science, technology and marketing, private industry and public service (and occasional opinions on movies and other entertainments)

Saturday, May 30

Hillary Mason at BarCampNYC4

Great talk on data scrubbing: "Have Data? What Now?!"

Open Calais (www.opencalais.com) and Freebase as data analysis tools.

Entity disambiguation - the ability to discern which mention goes with which entity (Hillary shows Cuil's search results for her own name)
Company disambiguation - often handled by humans
At Path101 - humans, data APIs (e.g. mTurk), or auto-classification
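A crude first pass at company disambiguation, before anything goes to humans or mTurk, might simply normalize names and group the ones that collide. This is a hypothetical sketch, not Path101's actual method: the suffix list and the `normalize`/`group_variants` helpers are assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical set of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "llc", "ltd"}


def normalize(name: str) -> str:
    """Reduce a company name to a crude canonical key."""
    # Lowercase and keep only alphanumeric runs, which drops punctuation.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Discard legal suffixes so "Acme Corp" and "ACME Corporation" match.
    tokens = [t for t in tokens if t not in SUFFIXES]
    return " ".join(tokens)


def group_variants(names):
    """Group raw name strings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)
```

For example, "Google Inc.", "Google" and "google, inc" all land under the key "google", but "I.B.M." normalizes to "i b m" and misses "IBM Corp". Leftovers like that are exactly where a human, mTurk, or a trained classifier still has to step in.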

Shows her Google spam folder: how does Google decide what goes in it?
eScienceNews uses vector analysis and a hierarchical clustering model (to figure out what is interesting), then a Bayesian document classification model (www.esciencenews.com/about.html).
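For a sense of how Bayesian document classification works, here is a minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing. It is not eScienceNews's or Google's implementation; the tokenizer, the `NaiveBayes` class and the spam/ham labels are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: fraction of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                # Add-one smoothing keeps words unseen in this label
                # from driving the probability to zero.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

After training on a few labeled snippets, e.g. `nb.train("cheap pills buy now", "spam")` and `nb.train("notes from the science talk", "ham")`, `classify` scores each label by its log prior plus the smoothed log likelihood of every word and returns the highest-scoring label.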

Discussing clustering and hierarchical clustering and how they are applied at Path101.
Depending on your data, you need to choose your algorithm. There are lots of rules of thumb, but the artistry is in knowing how to tune the groups/clusters to the algorithm.
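To make the tuning point concrete, here is a small agglomerative (bottom-up hierarchical) clustering sketch over bag-of-words vectors. Cosine similarity, average linkage and a similarity `threshold` as the stopping rule are assumptions for this sketch, not necessarily what Path101 uses.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_link(c1, c2, vectors):
    """Average pairwise similarity between two clusters of document indices."""
    sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
    return sum(sims) / len(sims)


def agglomerative(texts, threshold):
    """Repeatedly merge the most similar pair of clusters until none reaches threshold."""
    vectors = [vectorize(t) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start with one cluster per document
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = average_link(clusters[a], clusters[b], vectors)
                if s > best:
                    best, pair = s, (a, b)
        if best < threshold:
            break  # the remaining clusters are too dissimilar to merge
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because b > a
    return [[texts[i] for i in c] for c in clusters]
```

With the four snippets "data mining with python", "python data analysis", "career paths for nurses" and "nurses career change", a threshold of 0.3 yields two clusters (data vs. nursing careers), while 0.9 leaves all four as singletons: same data, same algorithm, different groups.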
