dc.contributor.author |
Mathebula, Miehleketo
|
|
dc.contributor.author |
Modupe, Abiodun
|
|
dc.contributor.author |
Marivate, Vukosi
|
|
dc.date.accessioned |
2025-01-27T06:27:49Z |
|
dc.date.available |
2025-01-27T06:27:49Z |
|
dc.date.issued |
2024-12 |
|
dc.description |
DATA AVAILABILITY STATEMENT: The original contributions presented in the study are included in the
article; further inquiries can be directed to the corresponding author. |
en_US |
dc.description |
This article forms part of a special collection titled 'Applications of Data Science and Artificial Intelligence'. |
en_US |
dc.description.abstract |
Sentiment analysis is a well-known task that has been used to analyse customer feedback
reviews and media headlines to detect the sentimental personality or polarisation of a given text.
With the growth of social media and other online platforms, like Twitter (now branded as X),
Facebook, blogs, and others, it has been used in the investment community to monitor customer
feedback, reviews, and news headlines about financial institutions’ products and services to ensure
business success and prioritise aspects of customer relationship management. Supervised learning
algorithms have been popularly employed for this task, but the performance of these models has
been compromised due to the brevity of the content and the presence of idiomatic expressions, sound
imitations, and abbreviations. Additionally, the pre-training of a larger language model (PTLM)
struggles to capture bidirectional contextual knowledge learnt through word dependency because the
sentence-level representation fails to take broad features into account. We develop a novel structure
called language feature extraction and adaptation for reviews (LFEAR), an advanced natural language
model that amalgamates retrieval-augmented generation (RAG) with a conversation format for an
auto-regressive fine-tuning model (ARFT). This helps to overcome the limitations of lexicon-based
tools and the reliance on pre-defined sentiment lexicons, which may not fully capture the range
of sentiments in natural language and address questions on various topics and tasks. LFEAR is
fine-tuned on Hellopeter reviews that incorporate industry-specific contextual information retrieval
to show resilience and flexibility for various tasks, including analysing sentiments in reviews of
restaurants, movies, politics, and financial products. The proposed model achieved an average
precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69% based on
Retrieval-Augmented Generation Assessment (RAGAS) metrics. The LFEAR model is effective in
conducting sentiment analysis across various domains due to its adaptability and scalable inference
mechanism. It considers unique language characteristics and patterns in specific domains to ensure
accurate sentiment annotation. This is particularly beneficial for individuals in the financial sector,
such as investors and institutions, including those listed on the Johannesburg Stock Exchange (JSE),
which is the primary stock exchange in South Africa and plays a significant role in the country’s
financial market. Future initiatives will focus on incorporating a wider range of data sources and
improving the system’s ability to express nuanced sentiments effectively, enhancing its usefulness in
diverse real-world scenarios. |
en_US |
dc.description.department |
Computer Science |
en_US |
dc.description.sdg |
SDG-09: Industry, innovation and infrastructure |
en_US |
dc.description.sdg |
SDG-12: Responsible consumption and production |
en_US |
dc.description.sponsorship |
The Data Science for Social Impact (DSFI) Group at
the University of Pretoria, funded by Google. |
en_US |
dc.description.uri |
https://www.mdpi.com/journal/applsci |
en_US |
dc.identifier.citation |
Mathebula, M.; Modupe, A.; Marivate, V. Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews. Applied Sciences (Switzerland) 2024, 14, 10782. https://doi.org/10.3390/app142310782. |
en_US |
dc.identifier.issn |
2076-3417 (online) |
|
dc.identifier.other |
10.3390/app142310782 |
|
dc.identifier.uri |
http://hdl.handle.net/2263/100302 |
|
dc.language.iso |
en |
en_US |
dc.publisher |
MDPI |
en_US |
dc.rights |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
en_US |
dc.subject |
Sentiment analysis |
en_US |
dc.subject |
Prompt engineering |
en_US |
dc.subject |
Conversational fine-tuning |
en_US |
dc.subject |
Retrieval augmented generation assessment |
en_US |
dc.subject |
SDG-09: Industry, innovation and infrastructure |
en_US |
dc.subject |
SDG-12: Responsible consumption and production |
en_US |
dc.subject |
Language feature extraction and adaptation for reviews (LFEAR) |
en_US |
dc.subject |
Pre-training of a larger language model (PTLM) |
en_US |
dc.subject |
Retrieval-augmented generation (RAG) |
en_US |
dc.subject |
Auto-regressive fine-tuning model (ARFT) |
en_US |
dc.subject |
Large language model (LLM) |
en_US |
dc.title |
Fine-tuning retrieval-augmented generation with an auto-regressive language model for sentiment analysis in financial reviews |
en_US |
dc.type |
Article |
en_US |