Creation of a Corpus of Opinions with Emotions Using Machine Learning
DOI:
https://doi.org/10.33414/rtyc.37.11-23.2020
Keywords:
text classification, semi-supervised learning, Twitter
Abstract
Identifying the feelings expressed in written opinions can be framed as a text categorization task, and it is of great interest today. Supervised learning is one of the most popular approaches to text classification, but it requires a large amount of labeled training data. Semi-supervised learning eases this limitation by working with a small labeled set and a larger unlabeled one. This work presents a text classification method that combines both types of learning. Short texts, or opinions, were collected from the social network Twitter, subjected to a series of cleaning and preparation steps, and then classified into four feelings: anger, disgust, sadness and happiness. The precision and recall obtained with the method were satisfactory, and the result is a corpus of messages categorized according to the feeling expressed.
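The pipeline described in the abstract (clean the tweets, train on a small labeled set, then extend the labels to unlabeled messages) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither a classifier nor a labeling strategy, so the Naive Bayes model, the self-training loop, the confidence threshold, and every function name below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

EMOTIONS = ("anger", "disgust", "sadness", "happiness")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions, '#' signs and punctuation, then tokenize."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop the sign
    text = re.sub(r"[^\w\s]", " ", text)       # drop punctuation
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, tokens):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Normalize in a numerically stable way so the values sum to 1.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Self-training: repeatedly add confidently predicted tweets to the training set.

    labeled   -- list of (text, emotion) pairs
    unlabeled -- list of raw texts
    Returns the final model and the grown corpus of (tokens, emotion) pairs.
    """
    docs = [clean_tweet(text) for text, _ in labeled]
    labels = [emotion for _, emotion in labeled]
    pool = [clean_tweet(text) for text in unlabeled]
    model = NaiveBayes().fit(docs, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for tokens in pool:
            probs = model.predict_proba(tokens)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:  # confident enough to pseudo-label
                docs.append(tokens)
                labels.append(best)
                added += 1
            else:
                remaining.append(tokens)
        pool = remaining
        if added == 0:  # nothing new was labeled, so stop early
            break
        model = NaiveBayes().fit(docs, labels)
    return model, list(zip(docs, labels))
```

Calling `self_train(labeled, unlabeled)` with a handful of hand-labeled tweets and a larger unlabeled list returns the grown corpus. A lower `threshold` labels more messages at the cost of noisier labels, so in practice it would be tuned against held-out precision and recall.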