Amrita-CEN-Senti-DB: Twitter Dataset for Sentimental Analysis and
Application of Classical Machine Learning and Deep Learning
Abstract
Social media platforms generate enormous volumes of text every day. The
data is too large to be easily understood, which has paved the way for a
field of information technology: natural language processing. In this
paper, the text data used for classification consists of tweets, which
indicate the state of a person according to their sentiment: positive,
negative, or neutral. Emotions are a way of expressing a person's
feelings and have a strong influence on decision-making tasks. Here
we propose two text representations, Term Frequency-Inverse Document
Frequency (TF-IDF) and Keras embedding, combined with machine learning
and deep learning algorithms for sentiment classification. Among these,
Logistic Regression performs best when the number of features is
limited. As the number of features increases, the Support Vector Machine
(SVM), another machine learning algorithm, performs best and sets a
benchmark accuracy of 75.8% for this dataset. The dataset has been made
publicly available for research purposes.
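
As an illustration of the TF-IDF representation mentioned above, the following is a minimal sketch using only the Python standard library. It assumes the textbook weighting (term frequency times the logarithm of the inverse document frequency), which is not necessarily the exact variant used in this work; libraries often apply smoothing or normalization on top of it. The function name `tfidf` and the toy tweets are hypothetical and are not drawn from the dataset.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / document length) * log(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tweets invented for illustration only (not from the dataset).
tweets = [
    "i love this phone".split(),
    "i hate this phone".split(),
    "the phone is okay".split(),
]
vectors = tfidf(tweets)
```

Under this weighting, a term that appears in every tweet (here "phone") receives zero weight, while a term confined to a single tweet (such as "love") receives the largest weight in that tweet. The resulting vectors are the kind of features a classifier such as Logistic Regression or an SVM would take as input.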