Parvess, Jesse
(University of Pretoria, 2023)
This study investigated whether a multilingual Bantu pretraining corpus could be created from
freely available data. To create the dataset, Bantu text extracted from datasets that
are freely available online (mainly from ...