Designing a noun guesser for part of speech tagging in Northern Sotho

dc.contributor.author: Heid, Ulrich
dc.contributor.author: Prinsloo, Danie J. (Daniel Jacobus), 1953-
dc.contributor.author: Faaß, Gertrud
dc.contributor.author: Taljard, Elsabe (Elizabeth)
dc.contributor.email: danie.prinsloo@up.ac.za [en_US]
dc.date.accessioned: 2009-10-29T06:20:05Z
dc.date.available: 2009-10-29T06:20:05Z
dc.date.issued: 2009
dc.description.abstract: In this article, we describe an element of a suite of computational tools for assigning word-class tags (as a preparation for part of speech (POS) tagging) to word forms in unrestricted Northern Sotho texts. POS-tagging is a step towards a linguistic analysis of the texts, which in turn allows for advanced data extraction. The tool component described here identifies (and classifies) noun forms. Several types of linguistic knowledge are used to recognize nouns that are not contained in the noun lexicon of the system. These include the relationship between singular and plural noun prefixes, knowledge about noun derivation, and data about the co-occurrence of the candidate with concords, pronouns and adjectives in a local context. Our implementation is a symbolic, voting-based process: together, all tests determine whether a candidate is a noun; accuracy on unseen test data is around 92%. [en_US]
dc.identifier.citation: Heid, U, Prinsloo, DJ, Faaß, G & Taljard, E 2009, 'Designing a noun guesser for part of speech tagging in Northern Sotho', South African Journal of African Languages, vol. 29, no. 1, pp. 1-19. [http://www.alasa.org.za/] [en_US]
dc.identifier.issn: 0257-2117
dc.identifier.uri: http://hdl.handle.net/2263/11670
dc.language.iso: en [en_US]
dc.publisher: African Language Association of Southern Africa [en_US]
dc.rights: African Language Association of Southern Africa [en_US]
dc.subject: Noun guesser [en]
dc.subject: Speech tagging [en]
dc.subject.lcsh: Grammar, Comparative and general -- Noun [en]
dc.subject.lcsh: Northern Sotho language -- Noun [en]
dc.subject.lcsh: Linguistic analysis (Linguistics) [en]
dc.title: Designing a noun guesser for part of speech tagging in Northern Sotho [en_US]
dc.type: Article [en_US]
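
The voting idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two tests, the vote threshold, the toy sentence and the cue set are all assumptions made for illustration, and the prefix table is a simplified subset of Northern Sotho singular/plural prefix pairs. The derivation test mentioned in the abstract is left out.

```python
# Hypothetical sketch of a voting-based noun guesser; not the published system.

# Simplified singular -> plural prefix pairs (illustrative subset only).
PREFIX_PAIRS = {"mo": ("ba", "me"), "le": ("ma",), "se": ("di",), "bo": ("ma",)}

# Toy data (assumed for illustration): a tiny token stream and a set of
# tokens treated as concord-like cues that may follow a noun.
TOKENS = ["monna", "o", "a", "sepela", "banna", "ba", "a", "sepela"]
CUES = {"o", "ba"}


def plural_candidates(word):
    """Build possible plural forms by swapping a known singular prefix."""
    forms = []
    for singular, plurals in PREFIX_PAIRS.items():
        if word.startswith(singular) and len(word) > len(singular):
            stem = word[len(singular):]
            forms.extend(plural + stem for plural in plurals)
    return forms


def prefix_test(word, vocabulary):
    """Vote yes if some constructed plural form is attested in the corpus."""
    return any(form in vocabulary for form in plural_candidates(word))


def context_test(word, tokens, cues):
    """Vote yes if the word is directly followed by a cue token at least once."""
    return any(a == word and b in cues for a, b in zip(tokens, tokens[1:]))


def guess_noun(word, tokens, cues, min_votes=2):
    """Accept the candidate as a noun when enough tests vote yes."""
    votes = [
        prefix_test(word, set(tokens)),
        context_test(word, tokens, cues),
    ]
    return sum(votes) >= min_votes


if __name__ == "__main__":
    for candidate in ("monna", "sepela"):
        print(candidate, guess_noun(candidate, TOKENS, CUES))
```

In this toy setup a form that merely begins with a noun-like prefix is not accepted on that basis alone; it also needs support from the second test, which is the point of combining votes rather than relying on a single cue.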

Files

Original bundle

Name: Heid_Designing(2009).pdf
Size: 2.78 MB
Format: Adobe Portable Document Format
Description: Article

License bundle

Name: license.txt
Size: 2.43 KB
Format: Item-specific license agreed upon to submission