Vukosi Marivate’s talk (link) covered a range of work (spanning research, development, and community-building) on improving support for low-resource languages, with a particular focus on those spoken on the African continent.
He made it clear that creating chatbots in African languages wasn’t necessarily the right path — pointing out that for many of these low-resource languages, the availability of reliable and accurate digital dictionaries and thesauruses would already be a useful step.
Highlighted work included generating sentence-pair datasets between low-resource languages (bypassing the need to go via English for translation), as well as text augmentation tools and dataset curation work.
Links to mentioned resources:
- Deep Learning Indaba — ML meetings across the African continent (the next one is 1-7 September 2024, in Dakar)
- Masakhane — a grassroots NLP research community for African languages
- Lelapa AI — Vukosi Marivate’s Africa-centric AI research lab
- Data Workers’ Inquiry — a community-based research project on data workers’ workplaces