Sentiment Analysis (SA) research has gained tremendous momentum in recent times. However, there has been little work in this area for Indian languages. In this paper, we propose a fall-back strategy for sentiment analysis of Hindi documents, a problem on which, to the best of our knowledge, no work has been done until now. First, we study three approaches to performing SA in Hindi, for which we have developed a sentiment-annotated corpus in the Hindi movie review domain. (A) The first approach trains a classifier on this annotated Hindi corpus and uses it to classify a new Hindi document. (B) The second approach translates the given document into English and classifies it with a classifier trained on standard English movie reviews. (C) The third approach uses a lexical resource we developed, Hindi-SentiWordNet (H-SWN), and applies a majority-score-based strategy to classify the given document. A comparison of the performance of these approaches suggests the following fall-back strategy for sentiment analysis in a new language: (1) Train a sentiment classifier on an in-language labeled corpus and use it to classify a new document. (2) If in-language training data is not available, apply rough machine translation to translate the new document into a resource-rich language such as English, and detect the polarity of the translated document using an English classifier, assuming polarity is not lost in translation. (3) If translation is not possible, build a SentiWordNet-like resource for the new language and apply a majority strategy to the document to be classified. Two additional contributions of our work are (i) the development of a sentiment-labeled corpus for Hindi movie reviews and (ii) the construction of a lexical resource, Hindi SentiWordNet, based on its English counterpart.
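The three-step fall-back strategy can be illustrated as a small dispatch function. What follows is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the classifier and translator callables, the whitespace tokenizer, the tie rule, and the toy lexicon scores are all hypothetical. Step (3) is read here as one plausible form of a majority strategy. Each document word found in the lexicon votes positive or negative according to whichever of its two scores is larger, and the document takes the majority label.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types: a classifier maps text to a polarity label;
# a lexicon maps a word to (positive score, negative score), SentiWordNet-style.
Classifier = Callable[[str], str]
Lexicon = Dict[str, Tuple[float, float]]


def majority_score_polarity(tokens: List[str], lexicon: Lexicon) -> str:
    """Assumed majority vote: each lexicon word votes by its larger score."""
    pos_votes = neg_votes = 0
    for tok in tokens:
        if tok not in lexicon:
            continue  # words missing from the lexicon cast no vote
        pos, neg = lexicon[tok]
        if pos > neg:
            pos_votes += 1
        elif neg > pos:
            neg_votes += 1
    if pos_votes > neg_votes:
        return "positive"
    if neg_votes > pos_votes:
        return "negative"
    return "neutral"  # assumption: ties (including no hits) are neutral


def fall_back_polarity(
    doc: str,
    in_language_clf: Optional[Classifier] = None,
    translate: Optional[Callable[[str], str]] = None,
    english_clf: Optional[Classifier] = None,
    lexicon: Optional[Lexicon] = None,
) -> str:
    # (1) Prefer a classifier trained on in-language labeled data.
    if in_language_clf is not None:
        return in_language_clf(doc)
    # (2) Otherwise translate to English and use an English classifier.
    if translate is not None and english_clf is not None:
        return english_clf(translate(doc))
    # (3) Otherwise fall back to a SentiWordNet-like lexicon
    #     (whitespace tokenization is a simplifying assumption).
    if lexicon is not None:
        return majority_score_polarity(doc.split(), lexicon)
    raise ValueError("no sentiment resource available for this language")


# Hypothetical toy lexicon; the scores are illustrative, not taken from H-SWN.
toy_lexicon: Lexicon = {
    "अच्छी": (0.75, 0.0),    # "good"
    "शानदार": (0.875, 0.0),  # "excellent"
    "बुरा": (0.0, 0.625),    # "bad"
}
# "The story is good and the music is excellent but the ending is bad."
review = "कहानी अच्छी है और संगीत शानदार है पर अंत बुरा है"
print(fall_back_polarity(review, lexicon=toy_lexicon))  # -> positive
```

With no classifier or translator supplied, the call drops to step (3): two positive lexicon hits outvote one negative hit. Passing an `in_language_clf` short-circuits the later steps, mirroring the order of preference in the strategy.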