Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

Files

00front.pdf (71.54 KB)

01chapter1.pdf (200.45 KB)

02chapter2.pdf (733.64 KB)

03chapter3.pdf (889.23 KB)

04chapter4.pdf (260.95 KB)

Date

2008-09-08

Authors

Kroeze, J.H. (Jan Hendrik)

Publisher

University of Pretoria

Abstract

The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage facility. It also enables online analytical processing, such as slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language for permanently storing such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment that allows editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Two experiments are conducted to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists in performing rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module.
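The core ideas in the abstract (a clauses × phrases × linguistic-levels data cube, an OLAP-style slice, round-tripping between XML and the array, and aggregating the mapping of semantic onto syntactic functions) can be illustrated with a minimal Python sketch. This is not the thesis's own implementation or XML schema. The level names, element and attribute names, and helper functions (`cube_to_xml`, `xml_to_cube`, `slice_level`, `function_mapping`) are hypothetical, and the two-clause sample, loosely based on Gen. 1:1 and 1:3b in a simplified ASCII transcription, is toy data rather than the thesis's annotation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical layer names for the third (depth) dimension of the cube.
LEVELS = ["transcription", "translation", "syntactic_function", "semantic_function"]


def empty_cube(n_clauses, n_phrases, n_levels):
    """Build a clauses x phrases x levels nested list filled with empty strings."""
    return [[["" for _ in range(n_levels)] for _ in range(n_phrases)]
            for _ in range(n_clauses)]


def cube_to_xml(cube, levels=LEVELS):
    """Serialise the cube to XML; the dimensions are stored on the root element
    so that empty phrase slots can be omitted and still restored later."""
    n_phrases = max((len(clause) for clause in cube), default=0)
    root = ET.Element("text", clauses=str(len(cube)), phrases=str(n_phrases))
    for c, clause in enumerate(cube, start=1):
        c_el = ET.SubElement(root, "clause", id=str(c))
        for p, phrase in enumerate(clause, start=1):
            if not any(phrase):
                continue  # skip phrase slots with no data at any level
            p_el = ET.SubElement(c_el, "phrase", id=str(p))
            for level, value in zip(levels, phrase):
                ET.SubElement(p_el, level).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_cube(xml_string, levels=LEVELS):
    """Rebuild the three-dimensional array from XML (the return trip)."""
    root = ET.fromstring(xml_string)
    cube = empty_cube(int(root.get("clauses")), int(root.get("phrases")), len(levels))
    for c_el in root.findall("clause"):
        c = int(c_el.get("id")) - 1
        for p_el in c_el.findall("phrase"):
            p = int(p_el.get("id")) - 1
            for k, level in enumerate(levels):
                child = p_el.find(level)
                cube[c][p][k] = (child.text or "") if child is not None else ""
    return cube


def slice_level(cube, level, levels=LEVELS):
    """OLAP-style slice: fix one linguistic level and keep a clauses x phrases grid."""
    k = levels.index(level)
    return [[phrase[k] for phrase in clause] for clause in cube]


def function_mapping(cube, levels=LEVELS):
    """Count co-occurrences of (syntactic function, semantic function) pairs."""
    syn = levels.index("syntactic_function")
    sem = levels.index("semantic_function")
    return Counter((phrase[syn], phrase[sem])
                   for clause in cube for phrase in clause if any(phrase))


# Toy data (illustrative only): 2 clauses x 4 phrase slots x 4 levels.
cube = empty_cube(2, 4, len(LEVELS))
cube[0][0] = ["bereshit", "in the beginning", "adjunct", "time"]
cube[0][1] = ["bara", "created", "predicate", "action"]
cube[0][2] = ["elohim", "God", "subject", "agent"]
cube[0][3] = ["et hashamayim ve'et ha'arets", "the heavens and the earth", "object", "patient"]
cube[1][0] = ["vayhi", "and there was", "predicate", "process"]
cube[1][1] = ["or", "light", "subject", "patient"]

xml_text = cube_to_xml(cube)
restored = xml_to_cube(xml_text)
print(restored == cube)
print(slice_level(cube, "syntactic_function"))
print(function_mapping(cube))
```

Storing the array dimensions as attributes on the root element is one simple way to make the round trip lossless even when some phrase slots are empty; a real databank would more likely fix these in a schema. The slice prints `[['adjunct', 'predicate', 'subject', 'object'], ['predicate', 'subject', '', '']]`, and the counter is the kind of aggregate that could feed the semantic-to-syntactic mapping question the abstract raises.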

Description

Thesis (PhD (Information Technology))--University of Pretoria, 2008.

Keywords

Online analytical processing (OLAP), XML, Hebrew Bible, Three-dimensional array, Visualisation, Computational linguistics, Text data mining, Data warehousing, Database management, Round-tripping, UCTD

Citation

2008

URI

http://hdl.handle.net/2263/26750

Collections

Theses and Dissertations (University of Pretoria)
Theses and Dissertations (Information Science)
