Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

dc.contributor.advisor	Bothma, T.J.D. (Theodorus Jan Daniel)	en
dc.contributor.coadvisor	Matthee, Machdel C.	en
dc.contributor.email	jan.kroeze@gmail.com	en
dc.contributor.postgraduate	Kroeze, J.H. (Jan Hendrik)	en
dc.date.accessioned	2013-09-07T07:36:38Z
dc.date.available	2008-09-08	en
dc.date.available	2013-09-07T07:36:38Z
dc.date.created	2008-09-02	en
dc.date.issued	2008-09-08	en
dc.date.submitted	2008-07-28	en
dc.description	Thesis (PhD (Information Technology))--University of Pretoria, 2008.	en
dc.description.abstract	The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage (memory). The array also enables online analytical processing operations, such as slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language for permanently storing such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment that allows editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists in performing rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module.	en
dc.description.availability	unrestricted	en
dc.description.department	Information Science	en
dc.identifier.citation	2008	en
dc.identifier.other	B23/eo	en
dc.identifier.upetdurl	http://upetd.up.ac.za/thesis/available/etd-07282008-121520/	en
dc.identifier.uri	http://hdl.handle.net/2263/26750
dc.language.iso		en
dc.publisher	University of Pretoria	en_ZA
dc.rights	©University of Pretoria 2008 B23/	en
dc.subject	Online analytical processing (OLAP)	en
dc.subject	XML	en
dc.subject	Hebrew Bible	en
dc.subject	Three-dimensional array	en
dc.subject	Visualisation	en
dc.subject	Computational linguistics	en
dc.subject	Text data mining	en
dc.subject	Data warehousing	en
dc.subject	Database management	en
dc.subject	Round-tripping	en
dc.subject	UCTD	en_US
dc.title	Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3	en
dc.type	Thesis	en