Nowadays, e-shopping is no longer an unfamiliar
concept to most people. On the Internet, all we need to do is click to choose
the product we want, which saves us a great deal of time and energy. When we
make our decisions, apart from our personal preferences, other people's
comments may also influence our choice.
Not only in e-shopping but in many other areas, the comments associated
with an entity play an increasingly important role in our daily life. It would
therefore be useful to compile statistics on these comments by dividing them
into three classes: positive, neutral and negative.
There are many ways to do this, and one of them is dictionary-based. In
this method, each word is given a score, recorded in a dictionary built in
advance, that reflects its distance from the three polarities. Although this
makes individual words easy to handle, the method has its own limitations. One
is that it cannot handle words that are not in the dictionary, so a great deal
of time is required to build and update the dictionary.
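To make the dictionary-based idea concrete, here is a minimal Python sketch. It simplifies "distance from the three polarities" to a single signed score per word (1 positive, 0 neutral, -1 negative), and the dictionary entries, the averaging rule and the two thresholds are all hypothetical values chosen for illustration. The sketch also reports the words it does not know, which is exactly the limitation described above: a word missing from the dictionary contributes nothing until someone adds it.

```python
# Hypothetical toy dictionary: each word maps to a polarity score in [-1, 1],
# where 1 is fully positive, 0 neutral and -1 fully negative.
LEXICON = {
    "great": 1.0,
    "good": 0.8,
    "fine": 0.2,
    "okay": 0.0,
    "poor": -0.8,
    "terrible": -1.0,
}


def score_comment(text, lexicon=LEXICON):
    """Average the dictionary scores of known words and list the unknown ones."""
    words = text.lower().split()
    known = [lexicon[w] for w in words if w in lexicon]
    unknown = [w for w in words if w not in lexicon]
    # With no known words the comment gets a neutral score of 0.0.
    score = sum(known) / len(known) if known else 0.0
    return score, unknown


def label(score, low=-0.3, high=0.3):
    """Map a score to one of the three classes using hypothetical thresholds."""
    if score > high:
        return "positive"
    if score < low:
        return "negative"
    return "neutral"
```

For example, `score_comment("great phone but terrible battery")` averages 1.0 and -1.0 to 0.0, which `label` maps to neutral, and returns `phone`, `but` and `battery` as unknown words. A comment made only of unknown words, such as `"flawless"`, is silently scored as neutral.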
Another method, supervised learning, can work more efficiently if we
already have some documents whose sentiment class is known. Using these
documents, we can train a classifier to simplify our later work. When a new
word arrives, the classifier gives it a score and assigns it to the
corresponding class according to the thresholds. In this way, new words can be
classified and the classifier is updated with them at the same time. However,
the problem with this method is its low tolerance for error. When the meaning
of a word is ambiguous, the classifier is much more likely to give it a score
near a threshold, which can easily lead to a wrong classification. This
introduces error into the system, and the error accumulates during updating,
so the classifier performs worse and worse as time goes by.
Figure 1: Classifying a new word
Figure 2: An error occurs
Figure 3: Error accumulation
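The train, classify and update cycle can be sketched in Python as below. This is a hypothetical toy model, not the specific classifier this post refers to, and it scores whole comments rather than single words. Each word's weight is the average label value (+1 positive, 0 neutral, -1 negative) over its occurrences in the training texts, a text's score is the mean weight of its known words, and two hypothetical thresholds (-0.3 and 0.3) pick the class. `classify_and_update` feeds each newly classified text back into the training data, which is what lets a borderline mistake spread.

```python
from collections import defaultdict

# Hypothetical numeric value for each sentiment class.
LABEL_VALUE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


class SelfTrainingClassifier:
    """Toy score-based sentiment classifier that retrains on its own predictions."""

    def __init__(self, low=-0.3, high=0.3):
        self.low, self.high = low, high    # hypothetical class thresholds
        self.total = defaultdict(float)    # sum of label values seen with each word
        self.count = defaultdict(int)      # number of times each word was seen

    def train(self, text, label):
        """Add one labeled text to the model."""
        for word in text.lower().split():
            self.total[word] += LABEL_VALUE[label]
            self.count[word] += 1

    def score(self, text):
        """Mean weight of the known words; 0.0 if no word is known."""
        weights = [self.total[w] / self.count[w]
                   for w in text.lower().split() if w in self.count]
        return sum(weights) / len(weights) if weights else 0.0

    def classify(self, text):
        """Map the score to a class using the two thresholds."""
        s = self.score(text)
        if s > self.high:
            return "positive"
        if s < self.low:
            return "negative"
        return "neutral"

    def classify_and_update(self, text):
        """Classify a new text, then train on the predicted label (self-training)."""
        label = self.classify(text)
        self.train(text, label)
        return label


if __name__ == "__main__":
    clf = SelfTrainingClassifier()
    clf.train("love it", "positive")
    clf.train("hate it", "negative")
    clf.train("it works", "neutral")

    # Mixed comment: only "love" (1.0) and "it" (0.0) are known, so the score is 0.5.
    print(clf.classify_and_update("love the price but it broke"))  # positive
    # "broke" has now been learned as positive, so a clearly negative
    # comment is misclassified: score (0.25 + 1.0) / 2 = 0.625.
    print(clf.classify("it broke"))  # positive
```

Nothing in this loop ever corrects the first doubtful decision, so its error is carried into every later update. Real systems usually reduce this by only self-training on predictions far from the thresholds, but that refinement is left out here to keep the failure visible.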
Therefore, we should choose the method that suits the situation, balancing workload against efficiency.



Hi, you really have a deep understanding of this topic. I was wondering, could you give more information on what kinds of methods can reduce the error to a lower level? Thx
Zhang, your explanation of supervised learning inspires me, and I want to know more about this tool. Could you give me some references?
You explain two methods of sentiment classification, dictionary-based and supervised learning, clearly, and you describe the error accumulation problem. I wonder how to solve this problem?
Thanks for sharing. I can tell that you have a deep understanding of the methods you mentioned for sentiment classification. The graphs you used help me a lot in understanding them.