Learning industrial descriptions : NLP tasks for acronym expansion


dc.contributor.advisor Marivate, Vukosi
dc.contributor.postgraduate Johnson, Shaun
dc.date.accessioned 2024-09-13T11:59:28Z
dc.date.available 2024-09-13T11:59:28Z
dc.date.created 2024-04
dc.date.issued 2024-02
dc.description Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2024. en_US
dc.description.abstract Human language is ambiguous, since words can be interpreted differently depending on the context in which they occur. Working out the exact meaning of a word in its context is usually trivial for humans, who are generally unaware of such ambiguities. Machines, on the other hand, must process, transform and analyse unstructured text to determine its underlying meaning. Acronyms are shortened forms of phrases that save time and space compared with writing or typing out their full expansions (meanings). Their main disadvantage is confusion: if misunderstood, they can unknowingly cause damage, have a negative effect, or even be abusive towards the receiver. An acronym that is appropriate in one context may be inappropriate for an audience in another. Solving acronym disambiguation could help reduce these negative effects. In this project we apply NLP technologies to a case study at a particular organisation in the Mining, Metals & Minerals (MMM) sector. At this organisation, domain experts derive pseudo-English (metallurgical) descriptions, which serve as the ground-truth expansions, from plant sensor tags (the acronyms) based on technical programmable logic controller (PLC) names, so that the sensors are described adequately for multiple stakeholders, including non-domain experts. Varied human input leads to inconsistency in how tag names (acronyms) are created, which in turn introduces varying degrees of uncertainty when deriving accurate descriptions (acronym expansions) from the tags. The aim of this research is to gauge to what extent transfer learning with large language models can be applied between similar domains. For example, models for scientific document understanding could possibly explain some MMM acronyms.
This leads to the research question: can pre-trained NLP transformers be applied to the MMM industry, a low-resource setting with few (or no) acronym dictionaries? We present SciAD/SDU fine-tuned transformers that disambiguate acronyms in the scientific document understanding (SDU) context very well and that are a stepping stone towards use in the MMM domain in future. We foresee further opportunities to unlock the benefits of other pre-trained language models (PLMs). We also note the potential value of a small model for the MMM domain. en_US
dc.description.availability Unrestricted en_US
dc.description.degree MIT (Big Data Science) en_US
dc.description.department Computer Science en_US
dc.description.faculty Faculty of Engineering, Built Environment and Information Technology en_US
dc.identifier.citation * en_US
dc.identifier.other A2024 en_US
dc.identifier.uri http://hdl.handle.net/2263/98197
dc.language.iso en en_US
dc.publisher University of Pretoria
dc.rights © 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject UCTD en_US
dc.subject Acronym expansion en_US
dc.subject Industrial descriptions en_US
dc.subject Natural Language Processing (NLP) en_US
dc.title Learning industrial descriptions : NLP tasks for acronym expansion en_US
dc.type Mini Dissertation en_US