From tags to topic maps : using marked-up Hebrew text to discover linguistic patterns

Kroeze, J.H. (Jan Hendrik); Bothma, T.J.D. (Theodorus Jan Daniel); Matthee, Machdel C.

dc.contributor.author	Kroeze, J.H. (Jan Hendrik)
dc.contributor.upauthor	Bothma, T.J.D. (Theodorus Jan Daniel)
dc.contributor.upauthor	Matthee, Machdel C.
dc.date.accessioned	2008-06-04T07:45:26Z
dc.date.available	2008-06-04T07:45:26Z
dc.date.issued	2008-05-18
dc.description.abstract	The paper discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. It identifies XML as a suitable mark-up language to build an exploitable data bank of multi-dimensional data in the Hebrew text of the Old Testament. This concept is illustrated by tagging a transcription of Gen. 1:1-2:3 and manipulating this data bank. Transferring the data into a three-dimensional array allows advanced processing of the data in order to either confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting this process of knowledge creation. The empirical study is a small experiment that illustrates the viability and usefulness of the proposed expert devices as well as the benefits of applying information system techniques to linguistic databases.	en
dc.format.extent	351636 bytes
dc.format.mimetype	application/pdf
dc.identifier.citation	Kroeze, JH, Bothma, TJD, & Matthee, MC 2008, 'From tags to topic maps: using marked-up Hebrew text to discover linguistic patterns', Proceedings of the 2008 International Conference on Information Resources Management (Conf-IRM 2008), [http://www.sprott.carleton.ca/conf-irm/CFP2008.pdf]	en
dc.identifier.isbn	978-0-473-134455-7
dc.identifier.uri	http://hdl.handle.net/2263/5778
dc.language.iso	en	en
dc.publisher	Proceedings of the 2008 International Conference on Information Resources Management	en
dc.rights	Proceedings of the 2008 International Conference on Information Resources Management (Conf-IRM 2008) Niagara Falls, Ontario, Canada, 18-20 May 2008	en
dc.subject	Text data mining	en
dc.subject	Data warehousing	en
dc.subject	MOLAP	en
dc.subject	XML	en
dc.subject	Genesis	en
dc.subject.lcsh	Hebrew language -- Data processing
dc.subject.lcsh	Data mining
dc.subject.lcsh	Data warehousing
dc.subject.lcsh	XML (Document markup language)	en
dc.title	From tags to topic maps : using marked-up Hebrew text to discover linguistic patterns	en
dc.type	Article	en