Perhaps time difference...: Methods for sentiment classification

Nowadays, online shopping is familiar to most people. With a few clicks of the mouse we can choose the product we want, which saves a great deal of time and energy. When we make a decision, other people's comments may influence our choice alongside our own preferences.

Not only in e-shopping but in many other areas, the comments attached to an entity play an increasingly important role in our daily life. It is therefore useful to sort these comments into three classes (positive, neutral and negative) and compile statistics on them.

There are several ways to do this, and one of them is dictionary-based. In this method, each word is given a score according to its distance from the three polarities, as recorded in a dictionary built in advance. Although this makes a single word easy to handle, the method has its limitations. One is that it cannot handle words that are not in the dictionary, so a great deal of time is required to build and update the dictionary.
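As a rough illustration of the dictionary-based idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the tiny `LEXICON`, its scores, the averaging rule and the `threshold` value are hypothetical and do not come from any real sentiment dictionary.

```python
# Hypothetical toy dictionary: word -> polarity score in [-1, 1].
# The entries and values are made up for illustration only.
LEXICON = {
    "good": 1.0, "great": 1.0, "fine": 0.3,
    "bad": -1.0, "terrible": -1.0, "slow": -0.5,
}


def score_comment(text, lexicon=LEXICON):
    """Return (average score of the known words, list of unknown words)."""
    words = text.lower().split()
    known = [lexicon[w] for w in words if w in lexicon]
    unknown = [w for w in words if w not in lexicon]
    # Words missing from the dictionary contribute nothing to the score.
    average = sum(known) / len(known) if known else 0.0
    return average, unknown


def classify(score, threshold=0.25):
    """Map a score to one of the three classes (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


for comment in ["great product", "terrible and slow", "awesome gadget"]:
    score, unknown = score_comment(comment)
    print(comment, "->", classify(score), "| not in dictionary:", unknown)
```

The last comment shows the limitation: neither "awesome" nor "gadget" is in the toy dictionary, so the comment falls back to neutral until someone adds those words by hand.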

Another method, supervised learning, can work more efficiently if we have some documents whose sentiment class is already known. Using these documents, we can train a classifier to simplify our later work. When a new word arrives, the classifier gives it a score and places it in the corresponding class according to the thresholds. In this way, new words can be classified and the classifier is updated at the same time. The weakness of this method is its low tolerance for error. When the meaning of a word is ambiguous, the classifier is more likely to give it a score close to a threshold, which can easily lead to a wrong classification. This introduces error into the system, and the error accumulates during updating, so the classifier works worse and worse as time goes by.
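The update loop and the way one borderline decision can spread might be sketched as follows. This is only a toy under stated assumptions: the `WordScoreClassifier` class, its count-based word scores, the `threshold` of 0.2 and the example sentences are all hypothetical, and a real supervised classifier would be considerably more elaborate.

```python
from collections import defaultdict


class WordScoreClassifier:
    """Toy classifier: each word's score comes from labelled counts."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # assumed value, chosen for the example
        # word -> [times seen in positive docs, times seen in negative docs]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, text, label):
        """Add a document with a (given or predicted) label to the counts."""
        index = {"positive": 0, "negative": 1}.get(label)
        if index is None:  # neutral documents carry no polarity signal here
            return
        for word in text.lower().split():
            self.counts[word][index] += 1

    def score(self, text):
        """Average score of the words the classifier has already seen."""
        scores = []
        for word in text.lower().split():
            if word in self.counts:
                pos, neg = self.counts[word]
                scores.append((pos - neg) / (pos + neg))
        return sum(scores) / len(scores) if scores else 0.0

    def classify(self, text):
        s = self.score(text)
        if s > self.threshold:
            return "positive"
        if s < -self.threshold:
            return "negative"
        return "neutral"

    def classify_and_update(self, text):
        """Classify a new document, then learn from the predicted label."""
        label = self.classify(text)
        self.update(text, label)
        return label


clf = WordScoreClassifier()
clf.update("good fast delivery", "positive")  # documents with known classes
clf.update("bad slow delivery", "negative")

before_bad = clf.classify("bad")
mixed = "fast delivery good price bad awful screen"
mixed_score = clf.score(mixed)           # lands just above the threshold
mixed_label = clf.classify_and_update(mixed)
after_awful = clf.classify("awful screen")
after_bad = clf.classify("bad")

print(before_bad, mixed_score, mixed_label, after_awful, after_bad)
```

In this toy run the mixed comment scores only slightly above the threshold, is taken as positive, and the unseen word "awful" is then learned as positive as well. The known word "bad" also drifts from negative to neutral, which is the kind of accumulation described above.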

Figure 1: Classifying a new word

Figure 2: An error occurs

Figure 3: Error accumulation

Therefore, we should choose the method according to the situation, balancing workload against efficiency.

4 comments:

32432423443, October 15, 2014, 8:18 AM
Hi, you really have a deep understanding of this topic. I was wondering, could you give more information on what kinds of methods can reduce the error to a lower level? Thx
Unknown, November 17, 2014, 1:17 AM
Zhang, your explanation of supervised learning inspires me, and I want to know more about this tool. Could you give me some references?
Unknown, November 21, 2014, 7:21 AM
You clearly explain two methods of sentiment classification, dictionary-based and supervised learning, and you state the error accumulation problem. I wonder how to solve this problem?
Unknown, November 21, 2014, 7:30 AM
Thanks for sharing. I can see that you have a deep understanding of the methods you mentioned for sentiment classification. The graphs you used helped me a lot in understanding the methods.

Tuesday, September 30, 2014