The capability of search tools to retrieve words with specific properties from large text collections

Authors

Ball, L.H. (Liezl Hilde)
Bothma, T.J.D. (Theodorus Jan Daniel)

Publisher

University of Borås

Abstract

INTRODUCTION: With the increase in the availability of digital text collections for humanities researchers, tools to enable enhanced retrieval are required. If words with very specific properties could be retrieved from a text collection, more accurate linguistic and other analyses could be made. There is a range of properties and metadata that could be specified for retrieval, from morphological data up to bibliographic data. Furthermore, the bibliographic data should not be limited to the item level but should extend to the text level. For example, in an anthology, each section could be encoded with the author of that section. Such extended metadata will enable fine-grained retrieval. METHOD: In this study, current tools were evaluated to determine to what extent they allow users to retrieve words with specific properties from a text collection. ANALYSIS: The analysis is limited to the following criteria: interface design, metadata, search options, filtering and search results. RESULTS: Currently, it is not possible for a user to retrieve words with specific properties from a text collection. CONCLUSION: An extended set of metadata should be used to encode text to enable retrieval of words on a fine-grained level.
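
The anthology example in the abstract can be illustrated with a minimal sketch. It is not the authors' method or any evaluated tool: the `Token` record, its field names, the part-of-speech tags, the `retrieve` helper and the sample data are all hypothetical, chosen only to show how text-level metadata (the author of a section) might sit alongside word-level and item-level properties so that words can be filtered on any combination of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word occurrence with hypothetical metadata at three levels."""
    word: str
    pos: str             # word-level property (illustrative tag set)
    section_author: str  # text-level metadata: author of the section
    item_title: str      # item-level bibliographic metadata


def retrieve(tokens, **properties):
    """Return the tokens whose attributes equal every requested property."""
    return [
        t for t in tokens
        if all(getattr(t, name) == value for name, value in properties.items())
    ]


# Invented sample data: one anthology with sections by two authors.
ANTHOLOGY = [
    Token("run", "VERB", "Author A", "Hypothetical Anthology"),
    Token("run", "NOUN", "Author B", "Hypothetical Anthology"),
    Token("river", "NOUN", "Author B", "Hypothetical Anthology"),
]

# Occurrences of "run" restricted to the sections written by Author B.
hits = retrieve(ANTHOLOGY, word="run", section_author="Author B")
```

Under these assumptions, combining a word-level property with a text-level one narrows the result to a single occurrence, which item-level metadata alone could not do because every token shares the same item title.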

Keywords

Digital text, Research, Search tools, Information retrieval

Citation

Ball, L., & Bothma, T. (2020). The capability of search tools to retrieve words with specific properties from large text collections. In Proceedings of ISIC, the Information Behaviour Conference, Pretoria, South Africa, 28 September - 1 October, 2020. Information Research, 25(4), paper isic2030. Retrieved from http://InformationR.net/ir/25-4/isic2020/isic2030.html (Archived by the Internet Archive at https://bit.ly/3meU2cA) https://doi.org/10.47989/irisic2030