Abstract:
Sentiment analysis, a subfield of Natural Language Processing, has garnered a great deal of attention within the research community. To date, researchers have developed and adapted numerous sentiment analysis approaches to suit a variety of application scenarios. This continual adaptation has improved the extraction of an author's emotional intent from text. A contributing factor to the growth in application scenarios is the mass adoption of social media platforms and the boundless topics of discussion they host. For governments, organizations, and other interested parties, these opinions hold vital insight into public mindset, welfare, and intent. Successful use of these insights could lead to better methods of addressing the public and, in turn, could improve the overall state of public well-being. In this study, a framework using a hybrid sentiment analysis approach was developed. Several combinations were created, each pairing a simplified version of the Valence Aware Dictionary and sEntiment Reasoner (VADER) lexicon with a classical machine learning algorithm. A total of 67,585 public opinion-oriented Tweets created in 2020 and applicable to the South African (ZA) domain were analyzed. The hybrid approaches were compared against one another using well-known performance metrics. The results showed that the hybrid of the simplified VADER lexicon and the Medium Gaussian Support Vector Machine (MGSVM) algorithm outperformed the other seven hybrid approaches. The Twitter dataset serves to demonstrate model capability, specifically within the ZA context.
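
To make the idea of a hybrid approach concrete, the following is a minimal sketch of one common way to combine a lexicon with a trainable classifier: the lexicon assigns sentiment labels to raw text, and a classifier is then trained on those labels. The abstract does not state how the two components are combined, so this pipeline is an assumption. The lexicon entries and their scores, the summed-valence labelling rule, and the `NearestCentroid` class are all hypothetical stand-ins; in particular, the classifier is a plain nearest-centroid model over word counts, not an SVM.

```python
from collections import Counter

# Hypothetical toy lexicon with made-up scores; it only mimics the
# word -> valence shape of a sentiment lexicon such as VADER.
TOY_LEXICON = {"good": 2.0, "great": 3.0, "love": 3.0,
               "bad": -2.0, "terrible": -2.0, "hate": -3.0}


def lexicon_label(text):
    """Label text by the sign of its summed word valences (illustrative rule)."""
    score = sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def bag_of_words(text):
    """Represent text as word counts."""
    return Counter(text.lower().split())


class NearestCentroid:
    """Stand-in classifier (NOT an SVM): predicts the label whose mean
    word-count vector is closest to the input."""

    def fit(self, texts, labels):
        per_label = Counter(labels)
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(bag_of_words(text))
        # Turn summed counts into per-label averages.
        for label, centroid in self.centroids.items():
            for word in centroid:
                centroid[word] /= per_label[label]
        return self

    def predict(self, text):
        vec = bag_of_words(text)

        def squared_distance(centroid):
            words = set(vec) | set(centroid)
            return sum((vec[w] - centroid[w]) ** 2 for w in words)

        return min(self.centroids,
                   key=lambda label: squared_distance(self.centroids[label]))


# Step 1: the lexicon labels unlabelled text.
tweets = ["great service love it",
          "terrible service hate it",
          "the service opens monday"]
labels = [lexicon_label(t) for t in tweets]

# Step 2: a classifier is trained on the lexicon-assigned labels.
model = NearestCentroid().fit(tweets, labels)
prediction = model.predict("love the great service")
```

In a setting like the one described here, the toy lexicon would be replaced by the simplified VADER lexicon and the stand-in classifier by each of the classical machine learning algorithms under comparison.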