Corpus-based lexicography for lesser-resourced languages - maximizing the limited corpus

dc.contributor.author: Prinsloo, Danie J. (Daniel Jacobus), 1953-
dc.contributor.email: danie.prinsloo@up.ac.za [en_ZA]
dc.date.accessioned: 2015-12-03T08:48:38Z
dc.date.available: 2015-12-03T08:48:38Z
dc.date.issued: 2015
dc.description.abstract: This article focuses on lesser-resourced languages for which only very limited corpora are available and how such relatively small and often unbalanced, raw corpora could be maximally utilized for lexicographic purposes to obtain similar results as for bigger corpora. Sepedi and Afrikaans will be studied in this regard. The aim is to determine to what extent enlarging a corpus from e.g. one to 10 million, and from 10 million to 100 million words enhances its potential for (a) macro-structure compilation, (b) sourcing information on the most important microstructural aspects and (c) the creation of lexicographic tools. It will be argued that valuable and even sufficient data for the compilation of a specific dictionary can be extracted from a relatively small corpus of approximately one million words but that "bigger" in some instances indeed means "better". [en_ZA]
dc.description.abstract: Die fokus in hierdie artikel is op hulpbronbeperkte tale waarvoor slegs baie beperkte korpusse beskikbaar is en hoe sodanige relatief klein en dikwels ongebalanseerde, rou korpusse maksimaal benut kan word vir leksikografiese doeleindes om soortgelyke resultate as van groter korpusse te verkry. Sepedi en Afrikaans word in hierdie verband bestudeer. Die doel is om te bepaal tot watter mate die vergroting van 'n korpus van byvoorbeeld een na 10 miljoen, en van 10 miljoen na 100 miljoen woorde die potensiaal sal verhoog vir (a) makrostruktuur samestelling, (b) die inwin van inligting omtrent die belangrikste mikrostrukturele aspekte en (c) die ontwerp van leksikografiese hulpmiddels. Daar sal aangevoer word dat waardevolle en selfs voldoende data vir die samestelling van 'n spesifieke woordeboek onttrek kan word uit 'n relatief klein korpus van ongeveer een miljoen woorde maar dat "groter" wel in sekere omstandighede "beter" is. [en_ZA]
dc.description.librarian: am2015 [en_ZA]
dc.description.librarian: mz2025 [en]
dc.description.sdg: SDG-09: Industry, innovation and infrastructure [en]
dc.description.sponsorship: A grant from the German Ministry for Education and Research and supported in part by the National Research Foundation of South Africa (Grant specific unique reference number (UID) 85763). [en_ZA]
dc.description.uri: http://lexikos.journals.ac.za [en_ZA]
dc.description.uri: http://www.wat.co.za/index.php/en/publications/lexikos [en_ZA]
dc.identifier.citation: Prinsloo, DJ 2015, 'Corpus-based lexicography for lesser-resourced languages - maximizing the limited corpus', Lexikos, vol. 25, pp. 285-300. [en_ZA]
dc.identifier.issn: 1684-4904 (print)
dc.identifier.issn: 2224-0039 (online)
dc.identifier.uri: http://hdl.handle.net/2263/51037
dc.language.iso: en [en_ZA]
dc.publisher: Buro van die WAT [en_ZA]
dc.rights: Buro van die WAT [en_ZA]
dc.subject: Corpus-based lexicography [en_ZA]
dc.subject: Lesser-resourced languages [en_ZA]
dc.subject: Limited corpora [en_ZA]
dc.subject: Corpus tools [en_ZA]
dc.subject: Lexicographic tools [en_ZA]
dc.subject: Korpusgebaseerde leksikografie [en_ZA]
dc.subject: Hulpbronbeperkte tale [en_ZA]
dc.subject: Beperkte korpusse [en_ZA]
dc.subject: Korpusgereedskap [en_ZA]
dc.subject: Leksikografiese hulpmiddels [en_ZA]
dc.subject.other: Humanities articles SDG-09
dc.subject.other: SDG-09: Industry, innovation and infrastructure
dc.title: Corpus-based lexicography for lesser-resourced languages - maximizing the limited corpus [en_ZA]
dc.title.alternative: Korpusgebaseerde leksikografie vir hulpbronbeperkte tale - die maksimalisering van die beperkte korpus [en_ZA]
dc.type: Article [en_ZA]

Files

Original bundle

Name: Prinsloo_Corpusbased_2015.pdf
Size: 509.11 KB
Format: Adobe Portable Document Format
Description: Article
