Sentiment Analysis

By Mat Morrison April 18th, 2007
In Buzz & sentiment analysis · Stories

Bloggers are ambivalent about Windows Vista

Bloggers really like Marmite

Bloggers love the iPhone

I’m into sentiment analysis. There’s a very good white paper explaining the theory and practice of it over at Corpora, but you’ll have to register to get hold of it. I’m not sure about the ethics and practicalities of this. I understand that, for B2B companies, white papers are an important tool for getting people into the top of the sales funnel. But I also think that limiting access limits distribution: how many of you reading this are going to bother signing up on a new site just to read a white paper? And yet I think it would be wrong of me to republish the paper. It is under full copyright, not a Creative Commons Attribution-NoDerivs licence, after all.

So what I’m going to do instead is summarize a little.


1) Sentiment analysis uses one or more methods to assess whether a piece of text is predominantly positive, negative or neutral.

2) One method is lexicon-based. It’s quite easy these days to tag text automatically with parts of speech (for the more geeky among you, Helmut Schmid’s TreeTagger works on Windows, Linux and Mac). Combining this technology with a lexicon that supplies a “sentiment value” for each word (“horrid” is more negative than “nasty”) lets us score each relevant word in a text and add those scores up into a cumulative total. For more on an adjective-based approach to sentiment analysis, see Hatzivassiloglou and McKeown, “Predicting the Semantic Orientation of Adjectives” (1997).

The downside, of course, is that the weighting of words changes quite radically between different registers and kinds of discourse. How do we recognize and handle irony or sarcasm? And what do we do with words like “wicked” or “bad”, which can be terms of praise? (I’m afraid my age is showing here.)
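To make the lexicon-based idea a little more concrete, here’s a minimal Python sketch. The lexicon, its weights and the threshold for “neutral” are all invented for illustration, and it skips the part-of-speech tagging step (a tagger would let you score only, say, the adjectives); it simply looks up every word and adds up the values.

```python
import re

# Hypothetical toy lexicon: word -> sentiment value (below zero = negative).
# The words and weights are invented for illustration only.
LEXICON = {
    "love": 3.0,
    "like": 1.5,
    "good": 2.0,
    "bad": -2.0,
    "nasty": -2.0,
    "horrid": -3.0,   # more negative than "nasty"
    "wicked": -2.5,   # the dictionary sense, not the slang one
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, lexicon=LEXICON):
    """Cumulative score: the sum of the values of every word found in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(text))


def classify(text, threshold=1.0, lexicon=LEXICON):
    """Map the cumulative score onto positive / negative / neutral."""
    s = score(text, lexicon)
    if s >= threshold:
        return "positive"
    if s <= -threshold:
        return "negative"
    return "neutral"


print(classify("I love Marmite"))                    # positive
print(classify("The install was horrid and nasty"))  # negative
print(classify("That new phone is wicked"))          # negative, whatever the writer meant
```

The last line is the register problem in miniature. Writing for a different audience, you might swap in a lexicon where “wicked” is praise, e.g. `classify(text, lexicon={**LEXICON, "wicked": 2.5})`, but the sketch itself has no way of knowing which register it is reading, and no way at all of spotting irony or sarcasm.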

3) Another method is based on machine learning: a large selection of sample documents is manually assessed, coded and used to train a machine. The machine then starts making guesses about new documents; a human editor monitors its findings and makes corrections. Gradually, the machine “learns” to recognize sentiment.

This kind of AI technology already underpins some trainable spam filters, data-mining products and so on. It can take a lot of time and resources to train a machine, and a new machine will probably need training for each issue and discourse.

Furthermore, the framework for “learning” that is set up at the beginning of the project places an upper limit on eventual accuracy. AI tools are far from perfect.
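For the machine-learning approach, here’s a minimal sketch of the train, guess and correct loop. It uses multinomial naive Bayes, a classic technique of the kind behind some trainable spam filters; that choice, the tiny hand-coded training set and all of the names are my own assumptions for illustration, not what Corpora or any commercial tool actually does.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of coded documents
        self.word_counts = defaultdict(Counter)  # label -> word -> occurrences
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        """Add one hand-coded document; an editor's correction uses the same call."""
        self.doc_counts[label] += 1
        for tok in tokenize(text):
            self.word_counts[label][tok] += 1
            self.vocab.add(tok)

    def predict(self, text):
        """Return the label with the highest log-probability for `text`."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            logp = math.log(n_docs / total_docs)  # prior P(label)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence
                logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical hand-coded training set (a real one would be far larger).
clf = NaiveBayesSentiment()
for text, label in [
    ("I love this phone, it is great", "positive"),
    ("great screen, love it", "positive"),
    ("awful battery, I hate it", "negative"),
    ("terrible and awful software", "negative"),
    ("it arrived on tuesday", "neutral"),
    ("the box is white", "neutral"),
]:
    clf.train(text, label)

# The machine guesses; a human editor reviews the guess and, if it is wrong,
# feeds the document back with the correct label.
new_doc = "That battery is wicked"
first_guess = clf.predict(new_doc)
if first_guess != "positive":
    clf.train(new_doc, "positive")  # the editor's correction
corrected_guess = clf.predict(new_doc)
```

Notice that the correction teaches the model about “wicked” from a single document, and that everything it knows comes from this one hand-coded set: that is why a new machine is likely to need fresh training for each issue and discourse.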

In the end, many pitfalls remain. A quick glance at the free tool Opinmind (which supplied the “sentimeters” at the top of this story) should give you a clear idea of some of them (though it should be said that the commercially available tools are currently rather better than this). Opinmind uses a lexicon-based approach.
