Analyzing Messy Data Sentiment with Python and nltk

Company

Twilio

Date Published

Sept. 6, 2017

Author

Lesley Cordero

Word count

1307

Language

English

Hacker News points

None

URL

www.twilio.com/en-us/blog/sentiment-analysis-python-messy-data-nltk-html

Summary

The article walks through sentiment analysis in Python using nltk (the Natural Language Toolkit). The author takes a pre-labeled dataset of tweets, trains a model to classify text as positive or negative, and then evaluates it on new, unseen data, reporting an accuracy of around 83%. The author notes that the model is limited because it ignores the relationships between words and relies solely on word frequencies. As a result, it is less accurate on ambiguous or messy text, such as tweets with typos, abbreviations, or grammatical errors. Despite this limitation, the model shows what basic sentiment analysis with nltk and Python can achieve.
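The summary does not say which classifier the article trains. The sketch below assumes a multinomial Naive Bayes model over bag-of-words counts. It is written in plain Python so it runs without nltk, and it is not the author's code. For comparison, nltk's `NaiveBayesClassifier.train` takes a similar input: a list of (feature dictionary, label) pairs. The toy "tweets" are invented for illustration and are not the article's dataset. The sketch shows the limitation described above. Word order is discarded, and tokens never seen in training (typos, slang, or a negation like "not") contribute nothing, so a negated sentence can still be scored as positive.

```python
import math
from collections import Counter, defaultdict


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens. Word order is lost."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            bow = bag_of_words(text)
            self.word_counts[label].update(bow)
            self.doc_counts[label] += 1
            self.vocab.update(bow)
        return self

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        bow = bag_of_words(text)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior P(label).
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word, n in bow.items():
                if word not in self.vocab:
                    continue  # unseen tokens (typos, slang) carry no signal
                score += n * math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, labeled_texts):
    """Fraction of held-out texts whose predicted label matches the true label."""
    correct = sum(model.classify(text) == label for text, label in labeled_texts)
    return correct / len(labeled_texts)


# Hypothetical toy data standing in for a pre-labeled tweet corpus.
train_set = [
    ("i love this phone", "pos"),
    ("what a great day", "pos"),
    ("so happy with it", "pos"),
    ("i hate this phone", "neg"),
    ("what a terrible day", "neg"),
    ("so sad and angry", "neg"),
]
test_set = [
    ("love this great day", "pos"),
    ("hate this terrible phone", "neg"),
]

model = NaiveBayes().train(train_set)
print(accuracy(model, test_set))                   # 1.0 on this tiny toy set
print(model.classify("i do not love this phone"))  # "pos": "not" is never seen, so negation is ignored
```

The second `print` shows the order-blind, frequency-only behavior that the summary identifies as the model's main weakness.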