Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

dc.contributor.advisor Bothma, T.J.D. (Theodorus Jan Daniel) en
dc.contributor.coadvisor Matthee, Machdel C. en
dc.contributor.postgraduate Kroeze, J.H. (Jan Hendrik) en
dc.date.accessioned 2013-09-07T07:36:38Z
dc.date.available 2008-09-08 en
dc.date.available 2013-09-07T07:36:38Z
dc.date.created 2008-09-02 en
dc.date.issued 2008-09-08 en
dc.date.submitted 2008-07-28 en
dc.description Thesis (PhD (Information Technology))--University of Pretoria, 2008. en
dc.description.abstract The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage. It also enables online analytical processing operations, such as slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language for permanently storing such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment that allows editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may help linguists to perform rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module. en
dc.description.availability unrestricted en
dc.description.department Information Science en
dc.identifier.citation 2008 en
dc.identifier.other B23/eo en
dc.identifier.upetdurl http://upetd.up.ac.za/thesis/available/etd-07282008-121520/ en
dc.identifier.uri http://hdl.handle.net/2263/26750
dc.language.iso en
dc.publisher University of Pretoria en_ZA
dc.rights ©University of Pretoria 2008 B23/ en
dc.subject Online analytical processing (olap) en
dc.subject Xml en
dc.subject Hebrew bible en
dc.subject Threedimensional array en
dc.subject Visualisation en
dc.subject Computational linguistics en
dc.subject Text data mining en
dc.subject Data warehousing en
dc.subject Database management en
dc.subject Round-tripping en
dc.subject UCTD en_US
dc.title Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3 en
dc.type Thesis en
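
The round trip between an XML file and a three-dimensional array described in the abstract can be sketched as follows. This is a minimal illustration and not the thesis's own implementation: the axis order (clause, phrase, linguistic level), the level names, the element names, and the sample values for the first three phrases of Gen. 1:1 are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical linguistic levels (third axis of the cube); names are assumptions.
LEVELS = ["phonetic", "phrase_type", "syntactic_function"]

# cube[clause][phrase][level]: a nested list standing in for a 3-D array.
# Values are illustrative only and are not taken from the thesis data set.
cube = [
    [
        ["bereshit", "PP", "Adjunct"],
        ["bara", "VP", "Predicate"],
        ["elohim", "NP", "Subject"],
    ],
]


def slice_level(cube, level):
    """Slice the cube along the level axis: one value per phrase, per clause."""
    k = LEVELS.index(level)
    return [[phrase[k] for phrase in clause] for clause in cube]


def cube_to_xml(cube):
    """Serialise the cube to an XML string (element names are hypothetical)."""
    root = ET.Element("text")
    for clause in cube:
        clause_el = ET.SubElement(root, "clause")
        for phrase in clause:
            phrase_el = ET.SubElement(clause_el, "phrase")
            for name, value in zip(LEVELS, phrase):
                ET.SubElement(phrase_el, name).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_cube(xml_text):
    """Rebuild the nested-list cube from the XML produced by cube_to_xml."""
    root = ET.fromstring(xml_text)
    return [
        [[phrase.findtext(name) for name in LEVELS]
         for phrase in clause.findall("phrase")]
        for clause in root.findall("clause")
    ]


xml_text = cube_to_xml(cube)
restored = xml_to_cube(xml_text)
functions = slice_level(restored, "syntactic_function")
```

Here `restored` should equal the original `cube`, which is the round-tripping property listed among the subject keywords, and `functions` holds a single slice of the cube that could be inspected or counted on its own.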

