A transformer-based approach to Nigerian Pidgin text generation

dc.contributor.author Garba, Kabir
dc.contributor.author Kolajo, Taiwo
dc.contributor.author Agbogun, Joshua B.
dc.date.accessioned 2025-03-14T10:34:54Z
dc.date.available 2025-03-14T10:34:54Z
dc.date.issued 2024-12
dc.description.abstract This paper describes the development of a transformer-based text generation model for Nigerian Pidgin, also known as Naijá, a widely spoken language in West Africa. Despite its wide use, Nigerian Pidgin remains under-resourced, particularly in text generation and natural language processing. These difficulties stem primarily from technological constraints rather than the language’s fundamental attributes. Because Nigerian Pidgin is used in everyday communication and has a unique linguistic blend, there is a clear demand for language-specific solutions. This paper aims to close this gap by applying state-of-the-art transformer technology to develop a text generation model for Nigerian Pidgin. This work uses the public Afriberta-corpus dataset to fine-tune the Generative Pre-trained Transformer 2 (GPT-2) model on a sizeable dataset. The evaluation metrics, BLEU and perplexity, provide a detailed breakdown of the model’s text quality and predictive accuracy. Despite the difficulties caused by a limited amount of training data, preliminary evaluations show that the model can generate coherent Nigerian Pidgin text. The performance evaluation yielded perplexity scores of 43.56 for variable target reference length and 43.26 for fixed text length, and BLEU scores of 0.15 for fixed maximum length and 0.56 for variable reference target length. This highlights the quality of the generated text and the significant improvement obtained when the generated text length is aligned with the reference target. Our work was benchmarked against African American Vernacular English (AAVE), for which BLEU scores are significantly lower than those for Standard American English, with a reported BLEU of 0.26. Our Nigerian Pidgin model, with a BLEU score of 0.56, performs better; however, both results suggest that both dialects are challenging for language models.
Leveraging the pre-trained transformer-based language model and evaluation metrics, we demonstrate the model’s capacity for coherent Nigerian Pidgin text generation. This work can serve as a foundation for future research and progress in Nigerian Pidgin text generation and other low-resource languages. en_US
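The abstract evaluates generation quality with perplexity and BLEU. As a minimal illustration of what these two metrics measure (this is a simplified sketch, not the authors' exact pipeline, which fine-tunes GPT-2 on the Afriberta corpus): perplexity is the exponential of the negative mean token log-probability, and BLEU combines modified n-gram precision with a brevity penalty.

```python
import math
from collections import Counter

def perplexity(token_logprobs):
    # Perplexity = exp(-mean log-probability of the tokens).
    # Lower is better: the model is less "surprised" by the text.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def bleu(candidate, reference, max_n=2):
    # Simplified sentence-level BLEU: geometric mean of modified
    # n-gram precisions (here up to bigrams) times a brevity penalty.
    c, r = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(c[i:i + n]) for i in range(len(c) - n + 1))
        ref_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        overlap = sum(min(v, ref_ngrams[g]) for g, v in cand_ngrams.items())
        precisions.append(overlap / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(c) > len(r) else math.exp(1 - len(r) / max(len(c), 1))
    return bp * geo_mean

# A uniform distribution over 4 tokens gives perplexity 4.0.
print(perplexity([math.log(0.25)] * 4))
# An exact match scores 1.0; length mismatch lowers the score.
print(bleu("how di tin dey go", "how di tin dey go"))
```

Matching generated and reference lengths removes the brevity penalty and aligns the n-gram counts, which is consistent with the abstract's observation that the variable-reference-length setting scores much higher (0.56) than the fixed-length one (0.15).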
dc.description.department Informatics en_US
dc.description.librarian am2024 en_US
dc.description.sdg SDG-09: Industry, innovation and infrastructure en_US
dc.description.sponsorship Open access funding provided by University of Pretoria. en_US
dc.description.uri http://link.springer.com/journal/10772 en_US
dc.identifier.citation Garba, K., Kolajo, T., Agbogun, J.B. et al. 2024, 'A transformer-based approach to Nigerian Pidgin text generation', International Journal of Speech Technology, vol. 27, pp. 1027-1037. https://doi.org/10.1007/s10772-024-10136-2. en_US
dc.identifier.issn 1381-2416 (print)
dc.identifier.issn 1572-8110 (online)
dc.identifier.other 10.1007/s10772-024-10136-2
dc.identifier.uri http://hdl.handle.net/2263/101508
dc.language.iso en en_US
dc.publisher Springer en_US
dc.rights © The Author(s) 2024. Open access. This article is licensed under a Creative Commons Attribution 4.0 International License. en_US
dc.subject Transformers en_US
dc.subject Nigerian pidgin en_US
dc.subject Controllable text generation en_US
dc.subject Natural language generation en_US
dc.subject Pre-trained language models en_US
dc.subject Natural language processing (NLP) en_US
dc.subject SDG-09: Industry, innovation and infrastructure en_US
dc.title A transformer-based approach to Nigerian Pidgin text generation en_US
dc.type Article en_US
