AI Training

How to Fine-tune LLMs for Arabic: A Complete 2025 Guide

ALAMIA AI Research Team·March 15, 2025·12 min read

Arabic is one of the most challenging languages for large language models, and one of the most strategically important. With over 400 million speakers across 22 countries, and rapidly growing digital economies in Saudi Arabia, Qatar, the UAE, and North Africa, enterprises that get Arabic NLP right gain a significant competitive edge.

At ALAMIA, we've fine-tuned LLMs for Arabic across dozens of enterprise projects — from Saudi Vision 2030 government digitalization to Moroccan Darija customer service automation. This guide shares our proven methodology.

Why Arabic is Hard for LLMs

Standard LLMs are trained predominantly on English text. Arabic presents several unique challenges:

  • Diglossia — Modern Standard Arabic (MSA) and spoken dialects (Darija, Gulf, Egyptian) are structurally different languages used in different contexts
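  • Right-to-left script and tokenization — text direction is mostly a rendering concern, but tokenizers trained largely on English split Arabic words into many short tokens, and optional diacritics and spelling variants add further noise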
  • Rich morphology — Arabic words encode gender, number, case, and definiteness through affixes, creating vocabulary explosion
  • Code-switching — North African users mix French-Arabic (Darija) constantly; Gulf users mix English-Arabic
  • Limited training data — Arabic represents less than 3% of Common Crawl, the base training corpus for most LLMs
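
To make the tokenization cost concrete, here is a minimal sketch (an illustration, not part of the ALAMIA pipeline) that compares how many UTF-8 bytes a character takes in English and in Arabic text. Byte-level tokenizers start from these bytes, so a vocabulary learned mostly from English tends to spend more tokens per Arabic word. The helper name `utf8_bytes_per_char` and the sample sentences are assumptions made for this example.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per non-space character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


english = "the book is on the table"
arabic = "الكتاب على الطاولة"  # roughly the same sentence in MSA

print(utf8_bytes_per_char(english))  # ASCII letters take 1 byte each
print(utf8_bytes_per_char(arabic))   # Arabic letters take 2 bytes each
```

The byte count is only a proxy. The actual token count depends on the merges each tokenizer has learned, which is why it is worth measuring per model.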

Step 1 — Choose the Right Base Model

Not all LLMs handle Arabic equally. Our recommendations based on production testing:

  • Llama 3.1 8B (★★★★☆): Best balance for Arabic fine-tuning. Strong multilingual base.
  • Mistral 7B v0.3 (★★★★☆): Excellent for Gulf Arabic. Fast inference, great for enterprise.
  • Falcon 7B (★★★☆☆): Originally from UAE, good Arabic base. Less instruction-tuned.
  • AraGPT2 (★★★★★): Arabic-native. Best for MSA generation tasks.
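
One inexpensive check when comparing candidate base models is tokenizer fertility: the average number of tokens a tokenizer produces per word on a sample of your own Arabic text. The sketch below is an assumption about how such a check could look, not the procedure behind the ratings above. The `fertility` function accepts any callable that maps a string to a list of tokens, and `char_tokenizer` is a deliberately crude stand-in so the example runs without downloading a model.

```python
from typing import Callable, Iterable, List


def fertility(tokenize: Callable[[str], List[str]], sentences: Iterable[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    n_tokens = 0
    n_words = 0
    for sentence in sentences:
        n_tokens += len(tokenize(sentence))
        n_words += len(sentence.split())
    return n_tokens / n_words if n_words else 0.0


def char_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer for demonstration only: one token per character."""
    return [c for c in text if not c.isspace()]


sample = ["الكتاب على الطاولة"]  # 3 words, 16 letters
print(fertility(char_tokenizer, sample))
```

With a real model you would pass its tokenizer instead, for example the `tokenize` method of a Hugging Face tokenizer loaded through `AutoTokenizer.from_pretrained`. Lower fertility on your target dialect generally means more text fits in the same context window.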

Step 2 — Dataset Preparation

Data quality is everything. For Arabic fine-tuning, you need:

  • Domain-specific corpus — at minimum 50,000 examples in your target dialect and domain
  • Dialect labeling — clearly separate MSA, Gulf, Darija, and code-switched examples
  • Instruction format — convert raw text into instruction-response pairs for supervised fine-tuning
  • Quality filtering — remove diacritization inconsistencies, encoding errors, and transliterated Arabic
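
The quality-filtering step can be sketched as below. This is a minimal example under stated assumptions, not the exact ALAMIA filter: it strips diacritics and tatweel so spelling is consistent, rejects lines that contain the Unicode replacement character left by a failed decode, and drops lines whose letters are mostly non-Arabic script. The function names and the 0.5 threshold are hypothetical.

```python
import re
import unicodedata

# Harakat (U+064B to U+0652) and superscript alef (U+0670)
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character used for visual stretching
ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")


def normalize(text: str) -> str:
    """NFC-normalize, then strip diacritics and tatweel."""
    text = unicodedata.normalize("NFC", text)
    text = DIACRITICS.sub("", text)
    return text.replace(TATWEEL, "")


def arabic_ratio(text: str) -> float:
    """Share of alphabetic characters that are Arabic letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for c in letters if ARABIC_LETTER.match(c)) / len(letters)


def keep(text: str, min_ratio: float = 0.5) -> bool:
    """Reject encoding debris and mostly Latin-script (transliterated) lines."""
    if "\ufffd" in text:  # replacement character from a bad decode
        return False
    return arabic_ratio(normalize(text)) >= min_ratio


print(normalize("كِتَابٌ"))         # same word with diacritics removed
print(keep("الكتاب على الطاولة"))  # True
print(keep("kitab 3ala tawila"))   # False
```

A fixed threshold like this would also discard French-heavy code-switched lines, so a corpus that is meant to contain code-switching needs a lower threshold or a separate path.
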
⚠️ Common Mistake

Never mix MSA and Darija in the same fine-tuning batch without dialect labels. Models trained on mixed unlabeled data produce a hybrid that is accurate in neither dialect and confuses enterprise users.
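
One way to enforce this is to refuse unlabeled examples at conversion time and to keep the dialect visible in every prompt. The sketch below is an illustration under assumptions: the record fields (`dialect`, `instruction`, `response`), the `[dialect: ...]` tag format, and the two sample records are all made up for this example.

```python
from collections import defaultdict
from typing import Dict, List


def to_instruction(example: Dict[str, str]) -> Dict[str, str]:
    """Build an instruction-response pair that carries an explicit dialect tag."""
    dialect = example.get("dialect")
    if not dialect:
        raise ValueError("every example needs a dialect label")
    prompt = f"[dialect: {dialect}]\n{example['instruction']}"
    return {"prompt": prompt, "response": example["response"], "dialect": dialect}


def group_by_dialect(examples: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Convert examples and bucket them by dialect so batches can be built per bucket."""
    groups = defaultdict(list)
    for example in examples:
        item = to_instruction(example)
        groups[item["dialect"]].append(item)
    return dict(groups)


raw = [
    {"dialect": "msa", "instruction": "ما هي ساعات العمل؟",
     "response": "من التاسعة صباحا إلى الخامسة مساء."},
    {"dialect": "darija", "instruction": "بشحال هادا؟",
     "response": "هادا ب 50 درهم."},
]

groups = group_by_dialect(raw)
print({dialect: len(items) for dialect, items in groups.items()})
```

Whether you then train on one bucket at a time or interleave tagged buckets is a design choice. The point of the sketch is only that no example reaches training without a label.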

Step 3 — Fine-tuning Configuration

Our production-tested hyperparameters for 7B to 8B models on Arabic instruction tuning (shown here for Llama 3.1 8B):

# ALAMIA Arabic Fine-tuning Config
training_args = {
    "model_name":        "meta-llama/Meta-Llama-3.1-8B",
    "learning_rate":     2e-4,
    "num_train_epochs":  3,
    "per_device_batch":  4,
    "gradient_accum":    8,          # effective batch = 32
    "warmup_ratio":      0.03,
    "lr_scheduler":      "cosine",
    "lora_r":            64,          # higher r for Arabic morphology
    "lora_alpha":        128,
    "lora_dropout":      0.05,
    "target_modules":    ["q_proj", "v_proj", "k_proj", "o_proj"],
    "max_seq_length":    2048,
    "bf16":              True,
}
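
As a worked example of what these numbers imply, the sketch below derives the effective batch size, optimizer steps, and warmup steps from the config values. It assumes a single GPU, the 50,000-example minimum from Step 2, and that partial batches and fractional warmup steps are rounded up. Real trainers may round differently, so treat the output as an estimate.

```python
import math


def schedule(num_examples: int, per_device_batch: int, gradient_accum: int,
             num_epochs: int, warmup_ratio: float, num_devices: int = 1) -> dict:
    """Derive step counts from batch settings (rounding up is an assumption)."""
    effective_batch = per_device_batch * gradient_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


plan = schedule(num_examples=50_000, per_device_batch=4, gradient_accum=8,
                num_epochs=3, warmup_ratio=0.03)
print(plan)
# {'effective_batch': 32, 'steps_per_epoch': 1563, 'total_steps': 4689, 'warmup_steps': 141}
```

With more GPUs, pass `num_devices` and the effective batch grows accordingly, which usually calls for revisiting the learning rate.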

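Note: We use LoRA rank 64 rather than the 16-32 range commonly used for English. In our experience, the extra adapter capacity helps the model fit Arabic's morphological variation. Treat the rank as a value to tune, since higher ranks also increase memory use.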
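
To see what the rank choice costs, the sketch below counts the trainable parameters LoRA adds for the four attention projections listed in the config. Each adapted weight matrix of shape d_in × d_out gets two low-rank factors with r × (d_in + d_out) parameters in total. The layer shapes are an assumption based on the published Llama 3.1 8B architecture (hidden size 4096, 32 layers, 8 key-value heads of dimension 128); check them against the model's own config before relying on the totals.

```python
from typing import Dict, Tuple


def lora_params(r: int, shapes: Dict[str, Tuple[int, int]], num_layers: int) -> int:
    """Trainable parameters added by LoRA across all layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed attention projection shapes (d_in, d_out) for Llama 3.1 8B.
LLAMA_8B_ATTN = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 KV heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}

for r in (16, 32, 64):
    print(r, lora_params(r, LLAMA_8B_ATTN, num_layers=32))
# 16 13631488
# 32 27262976
# 64 54525952

# LoRA scales the adapter update by alpha / r; the config above gives 128 / 64.
print(128 / 64)
```

The count grows linearly with rank, so rank 64 trains about four times as many parameters as rank 16 while still staying well under 1% of the base model's 8 billion.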

Step 4 — Evaluation for Arabic

Standard English benchmarks (MMLU, HellaSwag) are insufficient for Arabic. Use:

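  • ALUE (Arabic Language Understanding Evaluation) — 8 classification and regression tasks, including sentiment intensity, emotion, irony, offensive language and hate speech detection, question similarity, and NLI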
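  • AraBench — dialectal Arabic to English machine translation across coarse, fine-grained, and city-level dialect categories; pair it with a dialect identification set such as MADAR or NADI if you need identification accuracy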
  • ARCD — Arabic Reading Comprehension Dataset for QA evaluation
  • Human evaluation — always test with native speakers from your target region (Gulf ≠ Moroccan)
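
For span-extraction QA sets such as ARCD, answers are usually scored with exact match and token-level F1 in the SQuAD style. The sketch below shows those two metrics with a light Arabic normalization step (diacritics, tatweel, and punctuation removed). It is an illustration under assumptions: official evaluation scripts may normalize differently, so use theirs for any reported numbers.

```python
import re
from collections import Counter
from typing import List

# Harakat, superscript alef, and tatweel
MARKS = re.compile(r"[\u064B-\u0652\u0670\u0640]")
PUNCT = re.compile(r"[^\w\s]")  # also covers Arabic punctuation such as ؟ and ،


def normalize_answer(text: str) -> List[str]:
    """Strip marks and punctuation, then split on whitespace."""
    text = MARKS.sub("", text)
    text = PUNCT.sub(" ", text)
    return text.split()


def exact_match(pred: str, gold: str) -> bool:
    return normalize_answer(pred) == normalize_answer(gold)


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold."""
    p, g = normalize_answer(pred), normalize_answer(gold)
    if not p or not g:
        return float(p == g)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


print(exact_match("الرباط.", "الرباط"))  # True
print(round(token_f1("في الرباط", "الرباط"), 3))  # 0.667
```

Token-level F1 is sensitive to attached clitics, so a prediction that differs from the gold answer only by a prefixed preposition scores zero on that token. That is one reason the human evaluation in the last bullet still matters.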

Real Results from ALAMIA Projects

  • F1 score on Gulf Arabic NER: 61% before, 89% after (+28 pts)
  • Darija sentiment accuracy: 54% before, 84% after (+30 pts)
  • MSA legal text extraction: 71% before, 94% after (+23 pts)
  • Code-switch handling (FR-AR): 38% before, 76% after (+38 pts)
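
Point gains are hard to compare when the starting scores differ. A common complement is relative error reduction: the share of the remaining gap to 100% that fine-tuning closed. The sketch below applies it to the four before/after pairs reported above. Treating 100 minus the score as an error rate is a simplification, especially for F1.

```python
def relative_error_reduction(before: float, after: float) -> float:
    """Fraction of the gap to a perfect score (100) that was closed."""
    if before >= 100:
        raise ValueError("baseline is already at the maximum score")
    return (after - before) / (100.0 - before)


results = {
    "Gulf Arabic NER (F1)": (61, 89),
    "Darija sentiment (accuracy)": (54, 84),
    "MSA legal text extraction": (71, 94),
    "Code-switch handling (FR-AR)": (38, 76),
}

for task, (before, after) in results.items():
    gain = after - before
    reduction = relative_error_reduction(before, after)
    print(f"{task}: +{gain} pts, {reduction:.0%} of the remaining gap closed")
```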
