dc.contributor.advisor |
Marivate, Vukosi |
|
dc.contributor.postgraduate |
Johnson, Shaun |
|
dc.date.accessioned |
2024-09-13T11:59:28Z |
|
dc.date.available |
2024-09-13T11:59:28Z |
|
dc.date.created |
2024-04 |
|
dc.date.issued |
2024-02 |
|
dc.description |
Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2024. |
en_US |
dc.description.abstract |
Human language is ambiguous, since words can be interpreted differently depending on the context in which they occur. Determining the exact meaning of a word in its context may be
trivial for humans, who are generally unaware of language ambiguities. Machines, on the other
hand, must process, transform and analyse unstructured textual information to determine its underlying meaning.
“Acronyms” are shortened forms of phrases that save time and space compared with writing or
typing out their full “expansions or meanings”. The main disadvantage of acronyms is
confusion: if misunderstood, they can inadvertently cause damage, have a negative
effect, or abuse the receiver. An acronym that is appropriate in one context may not be
appropriate for an audience in another context. Solving acronym disambiguation could help reduce
the negative effects of using acronyms.
In this project we apply NLP technologies in a case study at a particular organisation in the
Mining, Metals & Minerals (MMM) sector. At this organisation, the plant sensors’ tags (the
acronyms) are derived from technical programmable logic controller (PLC) names, and domain
experts expand them into pseudo-English (metallurgical) descriptions. These descriptions are the
ground truth expansions and describe the sensors adequately for multiple stakeholders (including
non-domain experts). Varied human input leads to inconsistency in how “tag names (acronyms)” are
created, which in turn introduces varying degrees of uncertainty when trying to derive an
“accurate description from the tags (acronym expansions)”.
The aim of this research is to gauge to what extent transfer learning can be applied between
similar domains using large language models. For example, models built for Scientific document
understanding could possibly explain some Mining, Metals & Minerals acronyms.
This leads to the research question: can NLP pre-trained transformers be applied to the MMM
industry, which has low-resource settings and few (or no) acronym dictionaries?
We present SciAD/SDU fine-tuned transformers that disambiguate acronyms within the
Scientific document understanding (SDU) context very well and are a stepping stone towards
use in the Mining, Metals & Minerals (MMM) domain in future. We foresee that there is still an
opportunity to unlock the benefits of other pre-trained language models (PLM). We also note the
potential value of using a small model for the MMM domain. |
en_US |
dc.description.availability |
Unrestricted |
en_US |
dc.description.degree |
MIT (Big Data Science) |
en_US |
dc.description.department |
Computer Science |
en_US |
dc.description.faculty |
Faculty of Engineering, Built Environment and Information Technology |
en_US |
dc.identifier.citation |
* |
en_US |
dc.identifier.other |
A2024 |
en_US |
dc.identifier.uri |
http://hdl.handle.net/2263/98197 |
|
dc.language.iso |
en |
en_US |
dc.publisher |
University of Pretoria |
|
dc.rights |
© 2021 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria. |
|
dc.subject |
UCTD |
en_US |
dc.subject |
Acronym expansion |
en_US |
dc.subject |
Industrial descriptions |
en_US |
dc.subject |
Natural Language Processing (NLP) |
en_US |
dc.title |
Learning industrial descriptions : NLP tasks for acronym expansion |
en_US |
dc.type |
Mini Dissertation |
en_US |