Differentiating between data-mining and text-mining terminology

dc.contributor.author: Kroeze, J.H. (Jan Hendrik)
dc.contributor.author: Matthee, Machdel C.
dc.contributor.author: Bothma, T.J.D. (Theodorus Jan Daniel)
dc.date.accessioned: 2007-07-25T05:41:59Z
dc.date.available: 2007-07-25T05:41:59Z
dc.date.issued: 2004-12
dc.description.abstract: When a new discipline emerges, it usually takes some time and a great deal of academic discussion before concepts and terms become standardized. Text mining is one such new discipline. In a groundbreaking article, Untangling text data mining, Hearst tackled the problem of clarifying text-mining concepts and terminology. This article, a conceptual study, is aimed at building on Hearst's ideas by pointing out some inconsistencies and suggesting an improved and extended categorization of data-mining and text-mining techniques. A brief overview is given of the problems regarding text-mining concepts. This is followed by a summary and critical discussion of Hearst's attempt to clarify the terminology. The essence of text mining is found to be the discovery or creation of new knowledge from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to differentiate between full-text information retrieval, standard text mining and intelligent text mining. The same parameters are also used to differentiate between related processes for numerical data and text metadata. These distinctions may be used as a road map in the evolving fields of data/information retrieval, knowledge discovery and the creation of new knowledge. [en]
dc.format.extent: 169557 bytes
dc.format.mimetype: application/pdf
dc.identifier.citation: Kroeze, H, Matthee, MC & Bothma, TJD 2004, 'Differentiating between data- and text-mining terminology', South African Journal of Information Management, vol. 6, no. 4, pp. 1-14. [http://www.sajim.co.za/] [en]
dc.identifier.issn: 1560-683X
dc.identifier.uri: http://hdl.handle.net/2263/3127
dc.language.iso: en [en]
dc.publisher: Department of Knowledge and Information Management, University of Johannesburg [en]
dc.rights: Department of Knowledge and Information Management, University of Johannesburg [en]
dc.subject: Text mining [en]
dc.subject: Knowledge creation [en]
dc.subject: Knowledge discovery in databases (KDD) [en]
dc.subject.lcsh: Information retrieval
dc.subject.lcsh: Data mining
dc.subject.lcsh: Keyword searching
dc.title: Differentiating between data-mining and text-mining terminology [en]
dc.type: Postprint Article [en]

Files

Original bundle

Name: Kroeze_Differentiating(2004).pdf
Size: 165.58 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.4 KB
Format: Item-specific license agreed upon to submission
Description: