Enhancing digital text collections with detailed metadata to improve retrieval

dc.contributor.advisor Bothma, T.J.D. (Theodorus Jan Daniel)
dc.contributor.postgraduate Ball, L.H. (Liezl Hilde)
dc.date.accessioned 2021-03-17T07:04:07Z
dc.date.available 2021-03-17T07:04:07Z
dc.date.created 2021-04-20
dc.date.issued 2020
dc.description Thesis (DPhil (Information Science))--University of Pretoria, 2020. en_ZA
dc.description.abstract Digital text collections are increasingly important, as they enable researchers to explore new ways of interacting with texts through the use of technology. Various tools have been developed to facilitate exploring and searching in text collections, but mostly at a fairly coarse level of granularity. Ideally, it should be possible to filter the results at a finer level of granularity to retrieve only the specific instances in which the researcher is interested. The aim of this study was to investigate to what extent detailed metadata could be used to enhance texts in order to improve retrieval. To do this, the researcher had to identify metadata that would be useful as filters and find ways in which these metadata could be applied to or encoded in texts. The researcher also had to evaluate existing tools to determine to what extent they support retrieval on a fine-grained level. After identifying useful metadata and reviewing existing tools, the researcher could suggest a metadata framework for encoding texts on a detailed level. Metadata in five different categories were used, namely morphological, syntactic, semantic, functional and bibliographic. A further contribution of this metadata framework was the addition of in-text bibliographic metadata, for use where sections in a text have properties that differ from those of the main text. The suggested framework had to be tested to determine whether retrieval was indeed improved. To do so, a selection of texts was encoded with the suggested framework and a prototype was developed to test the retrieval. The prototype receives the encoded texts and stores the information in a database. A graphical user interface was developed to enable searching in the database in an easy and intuitive manner. The prototype demonstrates that it is possible to search for words or phrases with specific properties when detailed metadata are applied to texts. The fine-grained metadata from five different categories enable retrieval at a finer level of granularity and specificity. It is therefore recommended that detailed metadata be used to encode texts in order to improve retrieval in digital text collections. Keywords: metadata, digital humanities, digital text collections, retrieval, encoding en_ZA
dc.description.availability Unrestricted en_ZA
dc.description.degree DPhil (Information Science) en_ZA
dc.description.department Information Science en_ZA
dc.identifier.citation Ball, LH 2020, Enhancing digital text collections with detailed metadata to improve retrieval, DPhil (Information Science) Thesis, University of Pretoria, Pretoria, viewed yymmdd <http://hdl.handle.net/2263/79015> en_ZA
dc.identifier.other A2021 en_ZA
dc.identifier.uri http://hdl.handle.net/2263/79015
dc.language.iso en en_ZA
dc.publisher University of Pretoria
dc.rights © 2019 University of Pretoria. All rights reserved. The copyright in this work vests in the University of Pretoria. No part of this work may be reproduced or transmitted in any form or by any means, without the prior written permission of the University of Pretoria.
dc.subject UCTD en_ZA
dc.subject Information science en_ZA
dc.subject Metadata
dc.subject Digital humanities
dc.subject Retrieval
dc.subject Encoding
dc.subject Digital text collections
dc.subject.other Engineering, built environment and information technology theses SDG-04
dc.subject.other SDG-04: Quality education
dc.subject.other Engineering, built environment and information technology theses SDG-09
dc.subject.other SDG-09: Industry, innovation and infrastructure
dc.subject.other Engineering, built environment and information technology theses SDG-16
dc.subject.other SDG-16: Peace, justice and strong institutions
dc.title Enhancing digital text collections with detailed metadata to improve retrieval en_ZA
dc.type Thesis en_ZA

