How to Fine-tune LLMs for Arabic: A Complete 2025 Guide

ALAMIA AI Research Team·March 15, 2025·12 min read

Arabic is one of the most challenging languages for large language models, and one of the most strategically important. With over 400 million speakers across 22 countries and rapidly growing digital economies in Saudi Arabia, Qatar, the UAE, and North Africa, enterprises that crack Arabic NLP gain a significant competitive edge.

At ALAMIA, we've fine-tuned LLMs for Arabic across dozens of enterprise projects — from Saudi Vision 2030 government digitalization to Moroccan Darija customer service automation. This guide shares our proven methodology.

Why Arabic is Hard for LLMs

Standard LLMs are trained predominantly on English text. Arabic presents several unique challenges:

Diglossia: Modern Standard Arabic (MSA) and the spoken dialects (Darija, Gulf, Egyptian) differ in vocabulary, grammar, and the contexts in which they are used
Script and tokenization: right-to-left display is mostly a rendering concern, since text is stored in logical order, but tokenizers built mainly on English text split Arabic words into many more pieces
Rich morphology: Arabic words encode gender, number, case, and definiteness through affixes and attached clitics, which multiplies the number of surface forms
Code-switching: North African users constantly mix French with Darija; Gulf users mix English with Arabic
Limited training data: Arabic makes up less than 3% of Common Crawl, a core pretraining corpus for most LLMs
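Two of these points can be made concrete with nothing but the standard library. The sketch below is illustrative only, and the sample words are assumptions chosen for the example. It shows that each Arabic letter costs two bytes in UTF-8, so a byte-level tokenizer starts from twice as many units as it does for ASCII English, and that one Arabic word can carry what English spreads over several.

```python
# Illustrative sketch; the sample words are assumptions, not benchmark data.

def utf8_cost(word: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) for a word."""
    return len(word), len(word.encode("utf-8"))

english = "book"
arabic = "\u0643\u062a\u0627\u0628"  # كتاب, "book"

print(utf8_cost(english))  # (4, 4)
print(utf8_cost(arabic))   # (4, 8)

# One orthographic word built from several morphemes (wa + sa + yaktubuna + ha),
# roughly "and they will write it".
clitic_word = "وسيكتبونها"
english_gloss = "and they will write it"
print(len(clitic_word.split()), len(english_gloss.split()))  # 1 5
```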

Step 1 — Choose the Right Base Model

Not all LLMs handle Arabic equally. Our recommendations based on production testing:

Llama 3.1 8B (★★★★☆): Best balance for Arabic fine-tuning. Strong multilingual base.
Mistral 7B v0.3 (★★★★☆): Excellent for Gulf Arabic. Fast inference, great for enterprise.
Falcon 7B (★★★☆☆): Originally from the UAE, good Arabic base. Less instruction-tuned.
AraGPT2 (★★★★★): Arabic-native. Best for MSA generation tasks.
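One quick, model-agnostic check when comparing candidates is tokenizer fertility: how many tokens a tokenizer produces per word of your own Arabic text. The sketch below is a minimal version under stated assumptions. `fertility` and `char_tokenize` are hypothetical helper names, and the character-level tokenizer is only a stand-in so the example runs without downloading a model.

```python
from typing import Callable, Iterable, List

def fertility(tokenize: Callable[[str], List[str]],
              sentences: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    words = 0
    tokens = 0
    for sentence in sentences:
        words += len(sentence.split())
        tokens += len(tokenize(sentence))
    if words == 0:
        raise ValueError("sample contains no words")
    return tokens / words

def char_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer: one token per non-space character (a worst case)."""
    return [ch for ch in text if not ch.isspace()]

# Sample sentences are placeholders; use text from your own domain and dialect.
sample = ["مرحبا بكم", "كيف يمكنني مساعدتك"]
print(fertility(char_tokenize, sample))  # characters per word, the upper bound
print(fertility(str.split, sample))      # whitespace splitting, the lower bound
```

In practice you would pass each candidate's real tokenizer in place of the stand-in, for example the `tokenize` method of a Hugging Face `AutoTokenizer`. Lower fertility on your corpus generally means shorter sequences and cheaper training for the same text.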

Step 2 — Dataset Preparation

Data quality is everything. For Arabic fine-tuning, you need:

Domain-specific corpus: at least 50,000 examples in your target dialect and domain
Dialect labeling: tag every example as MSA, Gulf, Darija, or code-switched
Instruction format: convert raw text into instruction-response pairs for supervised fine-tuning
Quality filtering: normalize inconsistent diacritization, and remove examples with encoding errors or transliterated Arabic
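The quality-filtering step can be sketched with the standard library alone. This is a minimal illustration, not our production pipeline. The helper names, the set of diacritics that gets stripped, and the 0.5 script-ratio threshold are all assumptions you would tune for your data.

```python
import re
import unicodedata

# Tashkeel marks (fathatan through sukun) plus superscript alef.
# The exact set to strip is an assumption.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, carries no meaning

def normalize_arabic(text: str) -> str:
    """Apply NFC, then drop diacritics and tatweel so spellings are consistent."""
    text = unicodedata.normalize("NFC", text)
    text = DIACRITICS.sub("", text)
    return text.replace(TATWEEL, "")

def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Arabic Unicode block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)

def keep_example(text: str, min_ratio: float = 0.5) -> bool:
    """Reject text with decoding damage or mostly non-Arabic script."""
    if "\ufffd" in text:  # U+FFFD marks a byte sequence that failed to decode
        return False
    return arabic_ratio(normalize_arabic(text)) >= min_ratio

print(keep_example("kifach n9der n3awnek"))  # False: Latin-script Darija
```

A fixed script ratio also drops code-switched text that is mostly French or English, so a corpus that is meant to include code-switching would need a lower threshold or a separate rule.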

⚠️ Common Mistake

Never mix MSA and Darija in the same fine-tuning batch without dialect labels. Models trained on mixed unlabeled data produce a hybrid that is accurate in neither dialect and confuses enterprise users.
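One way to enforce this is to make the dialect label a required field when building instruction-response pairs, so an unlabeled example fails loudly instead of slipping into training. The sketch below is a minimal illustration. The label set, the record keys, and the `[dialect: ...]` prompt prefix are assumptions, not a fixed format.

```python
from collections import Counter

# Assumed label set; extend it to match your own annotation scheme.
DIALECTS = {"msa", "gulf", "darija", "code_switched"}

def to_sft_record(instruction: str, response: str, dialect: str) -> dict:
    """Build one supervised fine-tuning record with an explicit dialect tag."""
    if dialect not in DIALECTS:
        raise ValueError(f"unknown or missing dialect label: {dialect!r}")
    return {
        "prompt": f"[dialect: {dialect}]\n{instruction}",
        "completion": response,
        "dialect": dialect,
    }

def dialect_counts(records: list) -> Counter:
    """Count records per dialect to spot imbalance before training."""
    return Counter(record["dialect"] for record in records)

records = [
    to_sft_record("كيف يمكنني تتبع طلبي؟", "يمكنك تتبع طلبك من صفحة الطلبات.", "msa"),
    to_sft_record("فين وصلات الكوموند ديالي؟", "الكوموند ديالك فالطريق.", "darija"),
]
print(dialect_counts(records))
```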

Step 3 — Fine-tuning Configuration

Our production-tested hyperparameters for 7B to 8B models on Arabic instruction tuning:

# ALAMIA Arabic Fine-tuning Config
training_args = {
    "model_name":        "meta-llama/Meta-Llama-3.1-8B",
    "learning_rate":     2e-4,
    "num_train_epochs":  3,
    "per_device_batch":  4,
    "gradient_accum":    8,          # effective batch = 4 x 8 = 32 per GPU
    "warmup_ratio":      0.03,
    "lr_scheduler":      "cosine",
    "lora_r":            64,          # higher r for Arabic morphology
    "lora_alpha":        128,
    "lora_dropout":      0.05,
    "target_modules":    ["q_proj", "v_proj", "k_proj", "o_proj"],
    "max_seq_length":    2048,
    "bf16":              True,
}
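As a rough sanity check on this config, the sketch below works out how many optimizer steps and warmup steps it implies. It assumes a single GPU and the 50,000-example minimum from Step 2, and it rounds the warmup up; all three are assumptions. The `cosine_lr` helper is a hypothetical, textbook linear-warmup-plus-cosine-decay formula and may differ in detail from what a given training library does.

```python
import math

def schedule_summary(num_examples: int, per_device_batch: int = 4,
                     gradient_accum: int = 8, num_devices: int = 1,
                     num_train_epochs: int = 3,
                     warmup_ratio: float = 0.03) -> dict:
    """Derive step counts from the batch and epoch settings."""
    effective_batch = per_device_batch * gradient_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_train_epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }

def cosine_lr(step: int, total_steps: int, warmup_steps: int,
              peak_lr: float = 2e-4) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

summary = schedule_summary(50_000)
print(summary)
# {'effective_batch': 32, 'steps_per_epoch': 1563, 'total_steps': 4689, 'warmup_steps': 141}
```

With these numbers the learning rate peaks at 2e-4 after 141 steps, so roughly 97% of training runs on the decaying part of the curve.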

Note: We use LoRA rank 64 instead of the 16-32 typical for English fine-tuning, with lora_alpha set to twice the rank. In our experience, Arabic's rich morphology benefits from the extra adapter capacity.
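To make the rank choice more tangible, this sketch counts the trainable LoRA parameters for the four target modules in the config. The projection shapes are those of Llama 3.1 8B as we understand them (hidden size 4096, 32 layers, grouped-query attention with 8 KV heads of 128 dimensions); treat them as assumptions and check them against the model's own config. `lora_params` is a hypothetical helper name.

```python
# (in_features, out_features) per attention projection; assumed Llama 3.1 8B shapes.
# k_proj and v_proj are narrower because of grouped-query attention.
PROJ_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32
TARGET_MODULES = ["q_proj", "v_proj", "k_proj", "o_proj"]

def lora_params(rank: int, target_modules: list,
                shapes: dict = PROJ_SHAPES,
                num_layers: int = NUM_LAYERS) -> int:
    """Each adapted weight adds two matrices: rank x in and out x rank."""
    per_layer = sum(rank * (shapes[m][0] + shapes[m][1]) for m in target_modules)
    return per_layer * num_layers

print(lora_params(16, TARGET_MODULES))  # 13631488
print(lora_params(64, TARGET_MODULES))  # 54525952
print(128 / 64)                         # 2.0, the lora_alpha / lora_r scaling factor
```

Under these assumptions, rank 64 trains about 54.5M parameters, four times the rank-16 count, which is still well under 1% of the 8B base model.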

Step 4 — Evaluation for Arabic

Standard English benchmarks (MMLU, HellaSwag) are insufficient for Arabic. Use:

ALUE (Arabic Language Understanding Evaluation): 8 tasks covering sentiment intensity, emotion, irony and offensive-language detection, dialect identification, question similarity, and NLI
AraBench: dialectal Arabic-to-English machine translation benchmark, useful for checking how well a model handles individual dialects
ARCD: Arabic Reading Comprehension Dataset for QA evaluation
Human evaluation: always test with native speakers from your target region (Gulf ≠ Moroccan)
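Whatever benchmark you use, report scores per dialect and not as a single average, since a strong MSA score can hide weak dialect performance. The sketch below is a minimal illustration. It uses the SQuAD-style token-overlap F1 that reading-comprehension sets are commonly scored with, and the helper names and row format are assumptions.

```python
from collections import Counter, defaultdict

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def f1_by_dialect(rows) -> dict:
    """rows: iterable of (dialect, prediction, gold). Returns mean F1 per dialect."""
    scores = defaultdict(list)
    for dialect, prediction, gold in rows:
        scores[dialect].append(token_f1(prediction, gold))
    return {dialect: sum(vals) / len(vals) for dialect, vals in scores.items()}

# Placeholder rows for illustration only, not real evaluation data.
rows = [
    ("msa", "في مدينة الرياض", "مدينة الرياض"),
    ("gulf", "الرياض", "الرياض"),
]
print(f1_by_dialect(rows))
```

It is usually worth running predictions and gold answers through the same normalization used for the training data before scoring, so that diacritics alone do not count as errors.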

Real Results from ALAMIA Projects

F1 score on Gulf Arabic NER: 61% before, 89% after (+28 pts)
Darija sentiment accuracy: 54% before, 84% after (+30 pts)
MSA legal text extraction: 71% before, 94% after (+23 pts)
Code-switch handling (FR-AR): 38% before, 76% after (+38 pts)

Need Arabic AI for Your Enterprise?

ALAMIA specializes in Arabic NLP for Gulf, North Africa and Levant markets. Get a free consultation.
