Tuesday, August 25, 2009 at 12:07 AM EDT
Searching for things on the Internet has become such a common activity that it is sometimes hard, even for those of us who have been around for a while, to remember that there was a time Before Google [BG]. The search refinements introduced by Google, and others, have turned what was, in times BG, mostly an exercise in frustration into a genuinely useful tool.
Yesterday's New York Times has an article on a new trend in search and data gathering technology, called sentiment analysis, which attempts to extract information about public opinion from, for example, postings on social networking sites:
The Web, with its social networking sites, product reviews (including customer reviews on sites like Amazon), blogs, and other user-driven sites, has become an enormous source of news, opinions, and gossip about products and services. Not surprisingly, the sellers of these products and services are interested in knowing what is being said about them.
Several start-up firms are attempting to develop a business in supplying this kind of information. They claim that their analysis methods and software are able to troll through a vast quantity of Internet postings, and extract summary data about the public's reaction to a new product, for example.
This is certainly an interesting idea. To the extent that it provides a way to identify opinion data of interest, it is potentially valuable. For many years, businesses have used clipping services, which employ people to look through published material, such as newspapers and magazines, to find articles relevant to the business. These new services may offer a better or more efficient way to accomplish the same thing.
But I'm fairly skeptical about some of the broader claims for this technology. In essence, what it is claimed to do is read free-form postings and determine not just what is being discussed, but how the writer feels about it.
This is biting off quite a considerable chunk to chew. The difficulties and ambiguities inherent in processing natural language are fairly well known, and are a key reason why machine translation is a difficult problem. Furthermore, the correct evaluation of feelings or emotions from written material is also notoriously hard: witness the almost universal advice to new users of E-mail to be careful to avoid ambiguities of emotion or "tone". (Think, for example, of how a program might evaluate a review of a portable heater, described by a customer as "wicked cool".) One of the firms marketing this technology says its technology is not perfect, but is "70 to 80 percent accurate". Even apart from the obvious question, "70 to 80 percent of what?", I suspect this is to a certain degree wishful thinking.
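To see why the "wicked cool" example is troublesome, consider a minimal sketch (in Python, with hypothetical word lists invented for illustration) of the simplest kind of lexicon-based sentiment scoring. The article does not describe how the start-ups' software actually works; this just illustrates the general word-counting approach and where it stumbles.

```python
# A toy lexicon-based sentiment scorer. The word lists here are
# hypothetical and tiny; real systems use much larger lexicons and
# additional machinery, but the basic ambiguity problem remains.

POSITIVE = {"cool", "great", "love", "excellent"}
NEGATIVE = {"wicked", "terrible", "hate", "poor"}

def sentiment_score(text: str) -> int:
    """Return (# positive words) - (# negative words).

    A positive total suggests approval, negative suggests disapproval.
    """
    score = 0
    for word in text.lower().split():
        word = word.strip(".,!?\"'")  # drop trailing punctuation
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

# The slang intensifier "wicked" is counted as negative, so a clearly
# enthusiastic review comes out neutral: +1 for "cool", -1 for "wicked".
print(sentiment_score("This heater is wicked cool!"))  # prints 0
```

The scorer has no notion of idiom or context, so "wicked cool" cancels itself out; more elaborate systems try to handle such cases, but perfect accuracy is exactly what the vendors themselves do not claim.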
Now if this technology, whatever its merits or lack thereof, is just used as a new way to sell things, it's probably something we can learn to live with. But I think there's a potential darker outcome. It seems to me that this kind of technology might have a lot of superficial appeal to the kind of security folks that are keen on wholesale trolling of people's communications. After all, many of these same folks are quite keen on polygraph tests, despite the fact that a 2003 report from the National Academy of Sciences said that most of the evidence supporting use of the polygraph was "unreliable, unscientific, and biased". We really don't need any more secret "potential terrorist" lists, compiled on the basis of undisclosed evidence.
This article originally appeared on Rich's Random Walks.