This article is the first in a trilogy on corpus-driven Bantu lexicography, illustrated for Lusoga. The focus here is on building a so-called 'organic corpus' from scratch; the next two instalments deal with the use of that corpus at the macrostructural and microstructural levels, respectively. Few detailed descriptions of corpus-building efforts exist for Bantu languages, so every step is discussed in detail, with particular attention to the parameters that have to be taken into account and to the need to log the metadata.
Olupapula luno n'olusooka ku isatu edhinaayogela ku musomo gw'omutengeso gw'eitu ogukozesebwa mu namawanika w'ennimi dha Bantu nga gulaga omulimu ogw'akolebwa ku Lusoga. Mu lupapula luno, eisila lili ku nzimba ya itu namukyukilo okuva ku ntandiiko. Ebitundu ebinaaba mu lupapula olw'okubili n'olw'okusatu biidha kugema ku nkozesa ya itu lino ku isa ly'omutindiigo ogw'ebizimbibwa mu mutegeko n'eisa elilaga eitu lino mu mwoleko ogw'azimbibwa mu mutindiigo n'engeli omusingi ogulimu bwe gulagibwa mu iwanika. Mu nnimi dha Bantu, emilimu egilaga omusingi guno tigitela kuwandiikibwaku mu butongole okusobola okumanhisa abo abayinza okuba nga bagasibwa. Kale buli kitundu ekiteesebwaku mu nnambika eli mu mpapula eisatu dhino kitoolayo buli kanhomelo ka bukodyo n'emitendela egy'agobelebwa ela gy'akozesebwa mu kusenvula omulimu gw'okuzimba omutimbo gw'ekyebungo ky'olulimi Olusoga gwonagwona.