Latent semantic models : a study of probabilistic models for text in information retrieval

dc.contributor.advisor: De Waal, Alta
dc.contributor.email: siyabongamjali@gmail.com [en_ZA]
dc.contributor.postgraduate: Mjali, Siyabonga Zimozoxolo
dc.date.accessioned: 2020-03-31T07:21:02Z
dc.date.available: 2020-03-31T07:21:02Z
dc.date.created: 2020-09
dc.date.issued: 2020
dc.description: Mini Dissertation (MSc)--University of Pretoria, 2020. [en_ZA]
dc.description.abstract: Large volumes of text are being generated every minute, which necessitates effective and robust tools to retrieve relevant information. Supervised learning approaches have been explored extensively for this task, but it is difficult to secure large collections of labelled data to train such models. Since a supervised approach is too expensive in terms of annotating data, we consider unsupervised methods such as topic models and word embeddings in order to represent corpora in lower-dimensional semantic spaces. Furthermore, we investigate different distance measures to capture similarity between indexed documents based on their semantic distributions. These include cosine, soft cosine and Jensen-Shannon similarities. The collection of methods discussed in this work allows for the unsupervised association of semantically similar texts, which has a wide range of applications such as fake news detection, sociolinguistics and sentiment analysis. [en_ZA]
dc.description.availability: Unrestricted [en_ZA]
dc.description.degree: MSc (Mathematical Statistics) [en_ZA]
dc.description.department: Statistics [en_ZA]
dc.description.sponsorship: The Hub Internship [en_ZA]
dc.description.sponsorship: Centre for Artificial Intelligence Research [en_ZA]
dc.identifier.citation: Mjali, SZ 2020, Latent semantic models: A study of probabilistic models for text in information retrieval, Masters mini dissertation, University of Pretoria, Pretoria [en_ZA]
dc.identifier.other: S2020 [en_ZA]
dc.identifier.uri: http://hdl.handle.net/2263/73881
dc.language.iso: en [en_ZA]
dc.publisher: University of Pretoria
dc.rights: © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject: UCTD
dc.title: Latent semantic models : a study of probabilistic models for text in information retrieval [en_ZA]
dc.type: Mini Dissertation [en_ZA]

Files

Original bundle

Name: Mjali_Latent_2020.pdf
Size: 4.33 MB
Format: Adobe Portable Document Format
Description: Mini Dissertation

License bundle

Name: license.txt
Size: 1.75 KB
Format: Item-specific license agreed upon to submission
Description: