Latent semantic models: a study of probabilistic models for text in information retrieval

Publisher

University of Pretoria

Abstract

Large volumes of text are being generated every minute, which necessitates effective and robust tools for retrieving relevant information. Supervised learning approaches have been explored extensively for this task, but it is difficult to secure the large collections of labelled data needed to train such models. Since a supervised approach is too expensive in terms of data annotation, we consider unsupervised methods such as topic models and word embeddings in order to represent corpora in lower-dimensional semantic spaces. Furthermore, we investigate different distance measures that capture similarity between indexed documents based on their semantic distributions; these include cosine, soft cosine and Jensen-Shannon similarities. The collection of methods discussed in this work allows for the unsupervised association of semantically similar texts, which has a wide range of applications such as fake news detection, sociolinguistics and sentiment analysis.
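The cosine and Jensen-Shannon similarities mentioned above can be sketched in plain Python. This is a minimal illustration, not code from the dissertation: the topic-proportion vectors `doc_a` and `doc_b` are hypothetical values, and Jensen-Shannon similarity is taken here as one minus the base-2 Jensen-Shannon divergence, which is bounded in [0, 1].

```python
# Minimal sketch (assumed, not from the thesis): similarity between two
# documents represented as probability distributions over topics.
import math

def cosine_similarity(p, q):
    # Cosine of the angle between the two vectors.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm

def jensen_shannon_similarity(p, q):
    # 1 - JS divergence (base 2); JS divergence lies in [0, 1].
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical topic proportions for two indexed documents.
doc_a = [0.70, 0.20, 0.10]
doc_b = [0.60, 0.30, 0.10]

print(round(cosine_similarity(doc_a, doc_b), 3))
print(round(jensen_shannon_similarity(doc_a, doc_b), 3))
```

Both measures return values near 1 for semantically close documents; soft cosine additionally weights term pairs by their embedding similarity rather than treating dimensions as orthogonal.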

Description

Mini Dissertation (MSc)--University of Pretoria, 2020.

Keywords

UCTD

Citation

Mjali, SZ 2020, Latent semantic models: A study of probabilistic models for text in information retrieval, MSc mini dissertation, University of Pretoria, Pretoria.