From text annotation to an auto-regressive language model for sentiment analysis in South African financial reviews

dc.contributor.advisor: Marivate, Vukosi
dc.contributor.coadvisor: Modupe, Abiodun
dc.contributor.email: miehleketo.mathebula@tuks.co.za [en_US]
dc.contributor.postgraduate: Mathebula, Miehleketo
dc.date.accessioned: 2025-02-27T10:48:01Z
dc.date.available: 2025-02-27T10:48:01Z
dc.date.created: 2025-05
dc.date.issued: 2024-11
dc.description: Dissertation (MSc (Computer Science))--University of Pretoria, 2024. [en_US]
dc.description.abstract: In contemporary society, social media enables rapid expression of public sentiment toward governmental policies and financial products. This immediacy and depth of sharing can serve as a virtual focus group for major financial decisions, offering a gold mine for understanding customer satisfaction and identifying new product features and services. Customer reviews are crucial for the profits and reputations of financial institutions. Sentiment analysis (SA) assesses customer feedback and media headlines to gauge sentiment, but it struggles with the brevity, abbreviations, and financial terminology of social media content. Earlier studies used human-annotated text to create LBMs for training MLAs in SA. However, these models lacked robustness and failed to capture the full range of natural language semantics. Our research used advanced natural language processing to address this gap, gathering customer reviews from Hellopeter and financial data from the top five JSE-listed financial institutions in South Africa. We employed OpenAI's ChatGPT as a zero-shot learning model to produce human-like annotations for sentiment tasks. The feature vector from ChatGPT was input into BERT, BiLSTM, and a softmax function to measure and categorize sentiment. Oversampling methods addressed data imbalance, and visualization techniques were applied to review text and polarity. Our method performed as well as or better than recent cutting-edge methods, achieving an average score of 98.9%, an F1-measure of 97.7%, and an AUC of 91.90% with oversampling. Traditional LBMs, SVMs, and logistic regression achieved 86.68% accuracy and an AUC of 91.90%. The study demonstrates ChatGPT’s competence in annotating customer reviews with emotional tone or polarity, and highlights the benefits of integrating customer SA with financial analysis to prioritize customer preferences.
To overcome the limitations of LBMs and pre-defined sentiment lexicons, we developed LFEAR, which combines the retrieval-augmented generation (RAG) model with a conversational format for an ARFT. Fine-tuned on Hellopeter reviews, LFEAR demonstrated resilience and flexibility in analyzing sentiments across various domains. It achieved an average answer precision score of 98.45%, correctness of 93.85%, and context precision of 97.69% according to retrieval augmented generation assessment (RAGAS) metrics. The LFEAR model effectively conducted SA across multiple domains, demonstrating adaptability, proper sentiment annotation, and bias-free analysis. This approach is particularly beneficial for social media posts by financial sector stakeholders, including investors and institutions whose posts impact JSE-listed entities. [en_US]
dc.description.availability: Unrestricted [en_US]
dc.description.degree: MSc (Computer Science) [en_US]
dc.description.department: Computer Science [en_US]
dc.description.faculty: Faculty of Engineering, Built Environment and Information Technology [en_US]
dc.description.sdg: None [en_US]
dc.identifier.citation: * [en_US]
dc.identifier.doi: https://doi.org/10.25403/UPresearchdata.28504796 [en_US]
dc.identifier.other: A2025 [en_US]
dc.identifier.uri: http://hdl.handle.net/2263/101248
dc.language.iso: en [en_US]
dc.publisher: University of Pretoria
dc.rights: © 2023 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD [en_US]
dc.subject: Sustainable Development Goals (SDGs) [en_US]
dc.subject: Large language models [en_US]
dc.subject: Sentiment analysis [en_US]
dc.subject: Retrieval-augmented generation [en_US]
dc.subject: Prompt engineering [en_US]
dc.subject: Conversational fine-tuning [en_US]
dc.subject: Retrieval augmented generation assessment [en_US]
dc.subject: Auto-regressive LLM [en_US]
dc.title: From text annotation to an auto-regressive language model for sentiment analysis in South African financial reviews [en_US]
dc.type: Dissertation [en_US]

Files

Original bundle

Name: Mathebula_From_2024.pdf
Size: 11.97 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission