Abstract:
For centuries, dictionaries were compiled from the lexicographer's own knowledge and from information retrieved from manually consulted sources, mainly through a process of reading and marking. As a result, much of the information in a dictionary depended on what the lexicographer knew. The lexicographer's knowledge of the language is vital, but relying on it alone has shortcomings: no single individual knows all the words or terms of a language, their meanings and usage, the words they combine with, and so on. The method left room for errors and omissions, because a lexicographer could easily overlook words owing to factors such as time pressure, fatigue, and limited knowledge. Important words, for example those most likely to be looked up by the dictionary's target users, could be omitted by accident. The corpus era began in the 1980s and changed the field of lexicography permanently. Collins COBUILD in Birmingham spearheaded this era with the publication of the first corpus-based dictionary, the Collins COBUILD Dictionary, in 1987. Since then, lexicographers have no longer had to rely solely on their knowledge of the language, their intuition, or the information gathered from available written sources, which are very limited for African languages. A corpus gives the lexicographer access to large volumes of authentic data from written texts and transcribed oral data. This research therefore critically discusses dictionary compilation for Sesotho and promotes the use of corpora in compiling Sesotho dictionaries, so that lexicographers do not work as if they were compiling the first dictionary for the language. It also argues that lexicographers should take into account tasks such as lexicographic planning, among the other factors required to compile a good, user-friendly dictionary.
Keywords:
Corpora, collocations, concordances, lexicography, lexicographic planning, microstructure, macrostructure, lemmatisation.