Nowadays, e-shopping is no longer an unfamiliar
concept to most people. On the Internet, all we need to do is click the
mouse to choose the product we want, which saves a great deal of time and
energy. When we make our decisions, other people's comments may influence
our choice in addition to our own personal preferences.
Not only in e-shopping but also in many other areas, the comments
associated with an entity play an increasingly important role in our daily
life. It is therefore worthwhile to sort these comments into three classes,
positive, neutral and negative, and to compile statistics on them.
There are many methods for performing this classification, and one of them is
dictionary-based. In this method, each word is given a score according to its
distance from the three polarities, as recorded in a dictionary built in
advance. Although this makes it easy to handle a single word, the method has
its own limitations. One is that it cannot handle words that are not included
in the dictionary, so a great deal of time is required to build and update the
dictionary.
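The dictionary-based idea can be illustrated with a minimal Python sketch. It assumes a hypothetical hand-built dictionary that maps each word to a score in [-1, 1] (negative to positive) and a hypothetical neutral band of plus or minus 0.1; the entries, the band and the function names are invented for illustration. A comment's score is the average score of the words found in the dictionary, and words missing from the dictionary are simply skipped, which is exactly the limitation described above. Tokenization is a plain whitespace split, so punctuation is not handled.

```python
# Hypothetical hand-built sentiment dictionary: word -> score in [-1, 1].
# A real dictionary would be far larger and has to be maintained by hand.
SENTIMENT_DICT = {
    "excellent": 0.9,
    "good": 0.6,
    "fine": 0.1,
    "average": 0.0,
    "poor": -0.6,
    "terrible": -0.9,
}

NEUTRAL_BAND = 0.1  # hypothetical: |score| <= 0.1 counts as neutral


def score_comment(comment):
    """Average the dictionary scores of known words; also return unknown words."""
    words = comment.lower().split()
    known = [SENTIMENT_DICT[w] for w in words if w in SENTIMENT_DICT]
    unknown = [w for w in words if w not in SENTIMENT_DICT]
    score = sum(known) / len(known) if known else 0.0
    return score, unknown


def classify_comment(comment):
    """Map the averaged score to one of the three classes."""
    score, unknown = score_comment(comment)
    if score > NEUTRAL_BAND:
        label = "positive"
    elif score < -NEUTRAL_BAND:
        label = "negative"
    else:
        label = "neutral"
    return label, unknown


if __name__ == "__main__":
    for text in ["excellent product good price", "terrible delivery", "awesome phone"]:
        print(text, "->", classify_comment(text))
```

In this sketch "awesome phone" is labelled neutral even though "awesome" is clearly positive, simply because the word is not in the dictionary; the list of unknown words returned alongside the label shows what would have to be added by hand.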
Another method, supervised learning, can work more efficiently if we have
some documents whose sentiment classes are already known. Using these
documents, we can train a classifier to simplify later work. When a new word
arrives, the classifier gives it a score and assigns it to the corresponding
class according to the thresholds. In this way, new words can be classified
and the classifier is updated at the same time. However, the weakness of this
method is its low tolerance to error. When the meaning of a word is fuzzy, the
classifier is much more likely to give it a score close to a threshold, which
can easily lead to a wrong classification. This introduces errors into the
system, and because the classifier is updated with its own results, these
errors accumulate during updating, making the classifier work worse and worse
as time goes by.
Figure 1. Classifying a new word
Figure 2. An error occurs
Figure 3. Error accumulation
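The supervised, self-updating scheme can be sketched in Python as well. This is a minimal illustration under stated assumptions, not the method of any particular system: the word-weight model (each word's weight is the average label value of the training comments that contain it), the thresholds of plus or minus 0.3, the class and method names, and the training comments are all hypothetical. The sketch classifies whole comments rather than single words, and after each prediction it learns from its own output as if that output were correct, which is how errors can enter the system and accumulate.

```python
# Hypothetical sketch of a self-updating (self-training) sentiment classifier.
# Word weights are learned from labelled comments; new comments are scored,
# assigned a class by two thresholds, and then fed back as training data.

LABEL_VALUE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


class SelfUpdatingClassifier:
    def __init__(self, low=-0.3, high=0.3):
        self.low = low      # hypothetical threshold between negative and neutral
        self.high = high    # hypothetical threshold between neutral and positive
        self.totals = {}    # word -> sum of label values of comments containing it
        self.counts = {}    # word -> number of comments containing it

    def learn(self, comment, label):
        """Update word weights with one comment and its (true or predicted) label."""
        for word in set(comment.lower().split()):
            self.totals[word] = self.totals.get(word, 0.0) + LABEL_VALUE[label]
            self.counts[word] = self.counts.get(word, 0) + 1

    def weight(self, word):
        """Average label value of the comments that contained this word."""
        return self.totals[word] / self.counts[word]

    def score(self, comment):
        """Mean weight of the known words; unknown words are ignored."""
        known = [w for w in comment.lower().split() if w in self.counts]
        if not known:
            return 0.0
        return sum(self.weight(w) for w in known) / len(known)

    def classify(self, comment):
        """Compare the score with the two thresholds."""
        s = self.score(comment)
        if s >= self.high:
            return "positive", s
        if s <= self.low:
            return "negative", s
        return "neutral", s

    def classify_and_update(self, comment):
        """Classify, then learn from the prediction as if it were correct."""
        label, s = self.classify(comment)
        self.learn(comment, label)  # a wrong label is stored just like a right one
        return label, s


def build_demo_classifier():
    clf = SelfUpdatingClassifier()
    # Hypothetical labelled comments used as the initial training set.
    clf.learn("great quality", "positive")
    clf.learn("great price", "positive")
    clf.learn("broken screen", "negative")
    return clf


if __name__ == "__main__":
    clf = build_demo_classifier()
    for comment in ["not great", "not working", "screen not working"]:
        label, s = clf.classify_and_update(comment)
        print(f"{comment!r}: {label} (score {s:.2f})")
```

All three demo comments are really negative, yet each one is labelled positive. The first mistake gives "not" a positive weight, that weight causes the second mistake, and the third comment lands at about 0.33, just above the 0.3 threshold. A freshly trained classifier that never learned from those two predictions would label "screen not working" as negative, since "screen" is the only word it knows.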
Therefore, we should choose the method that suits the situation at hand, so that both the workload and the efficiency stay under control.


