Learning industrial descriptions : NLP tasks for acronym expansion

Johnson, Shaun

dc.contributor.advisor	Marivate, Vukosi
dc.contributor.postgraduate	Johnson, Shaun
dc.date.accessioned	2024-09-13T11:59:28Z
dc.date.available	2024-09-13T11:59:28Z
dc.date.created	2024-04
dc.date.issued	2024-02
dc.description	Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2024.	en_US
dc.description.abstract	Human language is ambiguous, since words can be interpreted differently depending on the context in which they occur. The exact meaning of a particular word in its context may seem trivial to humans, who are generally unaware of language ambiguities. Machines, on the other hand, are required to process, transform and analyse unstructured textual information to determine the underlying meaning. “Acronyms” are shorter versions of phrases and save time and space compared with both handwritten and typed-out “expansions or meanings”. The main disadvantage of acronyms is confusion; if misunderstood, they can unknowingly cause damage, have a negative effect, or abuse the receiver. An acronym that is appropriate in one context might not be appropriate for an audience in another context. Solving acronym disambiguation could help reduce the negative effects of using acronyms. In this project we apply NLP technologies to a case study at a particular organisation in the Mining, Metals & Minerals (MMM) sector. The MMM organisation's plant sensor tags (the acronyms) are derived by domain experts from technical programmable logic controller (PLC) names into pseudo-English (metallurgical) descriptions, these being the ground-truth expansions, to describe the sensors adequately for multiple stakeholders (including non-domain experts). Human input varies, which leads to inconsistency in initiating “tag names (acronyms)” and, in turn, to varying degrees of uncertainty when trying to derive an “accurate description from the tags (acronym expansions)”. The aim of this research is to gauge to what extent transfer learning can be applied between similar domains using large language models. For example, scientific document understanding could possibly explain some Mining, Metals & Minerals acronyms.
This leads us to the research question: can pre-trained NLP transformers be applied to the MMM industry, a low-resource setting with few (or no) acronym dictionaries? We present SciAD/SDU fine-tuned transformers that disambiguate acronyms very well within the scientific document understanding (SDU) context and are a stepping stone towards use in the Mining, Metals & Minerals (MMM) domain in future. We foresee that there is still opportunity to unlock the benefits of other pre-trained language models (PLMs). We note the value of a small model that could be used for the MMM domain.	en_US
dc.description.availability	Unrestricted	en_US
dc.description.degree	MIT (Big Data Science)	en_US
dc.description.department	Computer Science	en_US
dc.description.faculty	Faculty of Engineering, Built Environment and Information Technology	en_US
dc.identifier.citation	*	en_US
dc.identifier.other	A2024	en_US
dc.identifier.uri	http://hdl.handle.net/2263/98197
dc.language.iso	en	en_US
dc.publisher	University of Pretoria
dc.rights	© 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject	UCTD	en_US
dc.subject	Acronym expansion	en_US
dc.subject	Industrial descriptions	en_US
dc.subject	Natural Language Processing (NLP)	en_US
dc.title	Learning industrial descriptions : NLP tasks for acronym expansion	en_US
dc.type	Mini Dissertation	en_US