Amrita-CEN-Senti-DB: Twitter Dataset for Sentimental Analysis and Application of Classical Machine Learning and Deep Learning
Social media platforms generate enormous volumes of text every day. Data at this scale cannot easily be understood manually, which has driven the growth of a field within information technology: natural language processing (NLP). In this paper, the text to be classified consists of tweets, each labelled with the sentiment it expresses about the author's state: positive, negative, or neutral. Emotions express a person's feelings and strongly influence decision-making. We apply two text representations, Term Frequency-Inverse Document Frequency (TF-IDF) and Keras embeddings, together with classical machine learning and deep learning algorithms to classify sentiment. Logistic Regression performs best when the number of features is limited. As the number of features increases, Support Vector Machine (SVM), another classical machine learning algorithm, outperforms the other methods and sets a benchmark accuracy of 75.8% on this dataset. The dataset has been made publicly available for research purposes.
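To illustrate the TF-IDF representation named above, here is a minimal plain-Python sketch. It assumes scikit-learn's default weighting (raw term counts, smoothed idf $\ln\frac{1+N}{1+\mathrm{df}} + 1$, and L2 normalisation of each row); the abstract does not state which TF-IDF variant the authors used. The `tokenize` and `tfidf` helpers and the example tweets are hypothetical and are not taken from Amrita-CEN-Senti-DB.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (an assumption; the paper's
    # preprocessing is not described in the abstract).
    return text.lower().split()


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, L2-normalised TF-IDF rows), one row per document.

    Weighting mirrors scikit-learn's TfidfVectorizer defaults (assumed):
    tf = raw count, idf = ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit Euclidean length.
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm > 0 else row)
    return vocab, rows


# Hypothetical toy tweets, for illustration only.
tweets = [
    "great phone love it",
    "worst service ever",
    "the phone arrived today",
]
vocab, X = tfidf(tweets)

# Non-zero weights of the third tweet: "phone" also appears in the first
# tweet, so its idf (and hence its weight) is lower than the other terms.
weights = dict(zip(vocab, X[2]))
print({t: round(w, 3) for t, w in weights.items() if w > 0})
```

Terms that occur in many tweets receive a lower idf than terms confined to a single tweet, so they contribute less to the vector. Rows like these form the feature matrix on which a classifier such as Logistic Regression or an SVM would then be trained; capping the vocabulary size is one way to vary the number of features.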