Tutorial: Aligning LLMs to Low-Resource Languages

Abstract

This tutorial is a detailed guide to collecting data for aligning large language models (LLMs) to low-resource languages (LRLs).
It addresses the data scarcity these languages face and introduces a pipeline for generating high-quality data, using Swahili as the primary example. The tutorial covers strategies for both dataset collection and alignment of LLMs to LRLs, with guidance on producing and using high-quality data for language technology development in under-resourced languages.

Notebooks

AYA_Notebook
LRL_Notebook

Organizers

Ahmet Üstün
Cohere for AI
