Tuesday, September 30, 2014

Methods for sentiment classification

    Nowadays, e-shopping is a familiar concept to most people. On the Internet, all we need to do is click the mouse to choose the goods we want, which saves a great deal of time and energy. When we make a decision, other people's comments may influence our choice in addition to our personal preferences.
    Comments attached to an entity play an increasingly important role not only in e-shopping but in many other areas of daily life. It is therefore useful to compile statistics on these comments by dividing them into three classes: positive, neutral and negative.
    There are many ways to do this, and one of them is dictionary-based. In this method, each word is given a score according to its distance from the three polarities, as recorded in a dictionary built in advance. Although this makes it easy to handle a single word, the method has limitations. One is that it cannot handle words that are not in the dictionary, so a great deal of time is needed to build and update the dictionary.
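    As a rough illustration of the dictionary-based idea, here is a minimal sketch in Python. The lexicon, its scores, the 0.25 threshold and the function names are all made up for this example; a real dictionary would be far larger and built with much more care.

```python
# Hypothetical lexicon: word -> polarity score in [-1, 1].
# Positive values lean positive, negative values lean negative.
LEXICON = {
    "good": 1.0, "great": 1.0, "fine": 0.5,
    "bad": -1.0, "poor": -1.0, "slow": -0.5,
}


def score_comment(text, lexicon=LEXICON):
    """Return (average score of known words, list of unknown words)."""
    words = text.lower().split()
    known = [lexicon[w] for w in words if w in lexicon]
    unknown = [w for w in words if w not in lexicon]
    average = sum(known) / len(known) if known else 0.0
    return average, unknown


def classify(score, threshold=0.25):
    """Map a score to one of the three classes (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


score, unknown = score_comment("great phone but slow delivery")
print(classify(score), unknown)
```

    Words such as "phone" or "delivery" end up in the unknown list and contribute nothing to the score, which is the limitation described above: someone has to keep adding them to the dictionary by hand.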
    Another method, supervised learning, can work more efficiently if we have some documents whose sentiment class is already known. Using these documents, we can train a classifier to simplify later work. When a new word arrives, it gets a score from the classifier and is placed into the corresponding class according to the thresholds. In this way, new words can be classified and the classifier is updated at the same time. The weakness of this method, however, is its low tolerance for errors. When the meaning of a word is ambiguous, the classifier is much more likely to give it a score near a threshold, which can easily lead to a wrong classification. This introduces an error into the system, and because the classifier keeps updating itself, errors accumulate and the classifier performs worse and worse over time.
Figure 1: Classifying a new word
Figure 2: An error happens
Figure 3: Error accumulation
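
    The train, classify and update loop, and the way one wrong label can spread, can be sketched as follows. This is only a toy model under my own assumptions: the class name SelfTrainingClassifier, the count-based word score and the 0.2 threshold are invented for illustration and are not taken from any particular library or paper.

```python
from collections import defaultdict


class SelfTrainingClassifier:
    """Toy word-count classifier that retrains on its own predictions."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed cut-off between classes
        # word -> [positive count, negative count]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, text, label):
        """Add the words of a labeled comment to the counts."""
        for word in text.lower().split():
            if label == "positive":
                self.counts[word][0] += 1
            elif label == "negative":
                self.counts[word][1] += 1

    def score(self, text):
        """Average (pos - neg) / (pos + neg) over the words seen so far."""
        scores = []
        for word in text.lower().split():
            if word in self.counts:
                pos, neg = self.counts[word]
                scores.append((pos - neg) / (pos + neg))
        return sum(scores) / len(scores) if scores else 0.0

    def classify(self, text):
        s = self.score(text)
        if s > self.threshold:
            return "positive"
        if s < -self.threshold:
            return "negative"
        return "neutral"

    def classify_and_update(self, text):
        """Classify, then treat the prediction as if it were a true label."""
        label = self.classify(text)
        self.update(text, label)
        return label


clf = SelfTrainingClassifier()
clf.update("good great product", "positive")  # documents with known class
clf.update("bad poor product", "negative")

print(clf.classify_and_update("great sturdy case"))  # "sturdy" is learned
print(clf.classify_and_update("not great flimsy"))   # misread as positive
print(clf.classify("flimsy"))                        # the error has spread
```

    In the second call the classifier only recognizes "great", so it labels the comment positive and then learns "flimsy" as a positive word. Every later comment containing "flimsy" is pushed in the wrong direction, which is the error accumulation shown in Figure 3.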

    Therefore, we should choose the method that suits the situation at hand, in order to balance workload and efficiency.

4 comments:

  1. Hi, you really have a deep understanding of this topic. I was wondering, could you give more information on what kinds of methods can reduce the error to a lower level? Thanks.

  2. Zhang, your explanation of supervised learning inspires me, and I want to know more about this tool. Could you give me some references?

  3. You explain two methods of sentiment classification, dictionary-based and supervised learning, clearly, and you describe the error accumulation problem. I wonder how this problem can be solved?

  4. Thanks for sharing. I can see that you have a deep understanding of the methods you mentioned for sentiment classification. The graphs you used helped me a lot in understanding the methods.
