Fine-tuning retrieval-augmented generation with an auto-regressive language model for sentiment analysis in financial reviews

dc.contributor.author Mathebula, Miehleketo
dc.contributor.author Modupe, Abiodun
dc.contributor.author Marivate, Vukosi
dc.date.accessioned 2025-01-27T06:27:49Z
dc.date.available 2025-01-27T06:27:49Z
dc.date.issued 2024-12
dc.description DATA AVAILABILITY STATEMENT: The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author. en_US
dc.description This article forms part of a special collection titled 'Applications of Data Science and Artificial Intelligence'. en_US
dc.description.abstract Sentiment analysis is a well-known task that has been used to analyse customer feedback, reviews, and media headlines to detect the sentiment or polarity of a given text. With the growth of social media and other online platforms, such as Twitter (now branded as X), Facebook, and blogs, it has been used in the investment community to monitor customer feedback, reviews, and news headlines about financial institutions’ products and services to ensure business success and prioritise aspects of customer relationship management. Supervised learning algorithms have been widely employed for this task, but their performance has been compromised by the brevity of the content and the presence of idiomatic expressions, sound imitations, and abbreviations. Additionally, the pre-training of a larger language model (PTLM) struggles to capture bidirectional contextual knowledge learnt through word dependency, because the sentence-level representation fails to take broad features into account. We develop a novel structure called language feature extraction and adaptation for reviews (LFEAR), an advanced natural language model that amalgamates retrieval-augmented generation (RAG) with a conversation format for an auto-regressive fine-tuning model (ARFT). This helps to overcome the limitations of lexicon-based tools and their reliance on pre-defined sentiment lexicons, which may not fully capture the range of sentiments in natural language, and enables the model to address questions on various topics and tasks. LFEAR is fine-tuned on Hellopeter reviews and incorporates industry-specific contextual information retrieval, giving it the resilience and flexibility to handle various tasks, including analysing sentiments in reviews of restaurants, movies, politics, and financial products.
The proposed model achieved an average precision score of 98.45%, answer correctness of 93.85%, and context precision of 97.69% based on Retrieval-Augmented Generation Assessment (RAGAS) metrics. The LFEAR model is effective in conducting sentiment analysis across various domains due to its adaptability and scalable inference mechanism. It considers unique language characteristics and patterns in specific domains to ensure accurate sentiment annotation. This is particularly beneficial for participants in the financial sector, such as investors and institutions, including those listed on the Johannesburg Stock Exchange (JSE), which is the primary stock exchange in South Africa and plays a significant role in the country’s financial market. Future initiatives will focus on incorporating a wider range of data sources and improving the system’s ability to express nuanced sentiments effectively, enhancing its usefulness in diverse real-world scenarios. en_US
dc.description.department Computer Science en_US
dc.description.sdg SDG-09: Industry, innovation and infrastructure en_US
dc.description.sdg SDG-12: Responsible consumption and production en_US
dc.description.sponsorship The Data Science for Social Impact (DSFI) Group at the University of Pretoria, funded by Google. en_US
dc.description.uri https://www.mdpi.com/journal/applsci en_US
dc.identifier.citation Mathebula, M.; Modupe, A.; Marivate, V. Fine-Tuning Retrieval-Augmented Generation with an Auto-Regressive Language Model for Sentiment Analysis in Financial Reviews. Applied Sciences (Switzerland) 2024, 14, 10782. https://doi.org/10.3390/app142310782. en_US
dc.identifier.issn 2076-3417 (online)
dc.identifier.other 10.3390/app142310782
dc.identifier.uri http://hdl.handle.net/2263/100302
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.rights © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an Open Access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). en_US
dc.subject Sentiment analysis en_US
dc.subject Prompt engineering en_US
dc.subject Conversational fine-tuning en_US
dc.subject Retrieval augmented generation assessment en_US
dc.subject SDG-09: Industry, innovation and infrastructure en_US
dc.subject SDG-12: Responsible consumption and production en_US
dc.subject Language feature extraction and adaptation for reviews (LFEAR) en_US
dc.subject Pre-training of a larger language model (PTLM) en_US
dc.subject Retrieval-augmented generation (RAG) en_US
dc.subject Auto-regressive fine-tuning model (ARFT) en_US
dc.subject Large language model (LLM) en_US
dc.title Fine-tuning retrieval-augmented generation with an auto-regressive language model for sentiment analysis in financial reviews en_US
dc.type Article en_US

