AI & Compliance · May 2026 · 9 min read

Vietnamese legal RAG done right: citations, effective dates, and no hallucinations

Why generic RAG fails on Vietnamese law, the chunk-by-Article + hybrid-search + domain-tuned embeddings pipeline that's working in 2026, and the eval discipline that keeps the system trustworthy.

Querying Vietnamese law is a much harder RAG problem than it looks: layered legislation, constant amendments, and a specialised legal register. This article distills how to build a legal RAG that actually works — not a "law Q&A" demo that breaks after 10 questions.

1. Why Vietnamese legal RAG is hard

  • One rule may live across a Law + Decree + Circular + guiding Official Letter — the answer must surface all four layers.
  • Amendments don't delete the original text, so the RAG must know which version of each clause is in force on a given date.
  • Citations must be exact down to Article/Clause/Point; one wrong character makes it legally worthless.
  • Generic multilingual embeddings perform poorly on Vietnamese legal terminology.

2. The pipeline that's working at larger organisations

The architecture in use across bank legal teams and insurers in 2026:

  • Chunking by Article/Clause, not by tokens. Each chunk carries metadata: document number, type, issue date, expiry, amending docs.
  • Hybrid search: BM25 (for exact terms) + dense embeddings (for natural phrasing), reranked by a cross-encoder.
  • Domain-tuned embeddings: fine-tuned on "business question → correct clause" pairs labelled by in-house lawyers (about 3,000 pairs are enough to produce a noticeable gain).
  • Effective-date filter: every query carries an as-of date; clauses that have expired or are not yet in force on that date are filtered out.
  • Citation enforcement: the LLM is not allowed to answer unless it can cite an Article/Clause from a retrieved chunk.
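
The first, second and fourth steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline any particular organisation runs: the `Chunk` fields, the `split_by_article` and `in_force` helpers, and the heading pattern are all hypothetical names chosen for this example. The article does not say how the BM25 and dense result lists are merged; reciprocal rank fusion (`rrf_fuse`) is swapped in here as one common option ahead of a cross-encoder reranker.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Chunk:
    doc_number: str                      # document number as printed on the text
    doc_type: str                        # "Law", "Decree", "Circular", ...
    article: str                         # e.g. "Article 5"
    text: str
    effective_from: date
    effective_to: Optional[date] = None  # None means still in force

# Assumed heading convention: a line starting with "Điều N" or "Article N".
ARTICLE_RE = re.compile(r"^(?:Điều|Article)\s+(\d+)\b", re.MULTILINE)

def split_by_article(body: str, doc_number: str, doc_type: str,
                     effective_from: date,
                     effective_to: Optional[date] = None) -> List[Chunk]:
    """Cut a document at each Article heading instead of every N tokens."""
    matches = list(ARTICLE_RE.finditer(body))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        chunks.append(Chunk(doc_number, doc_type, f"Article {m.group(1)}",
                            body[m.start():end].strip(),
                            effective_from, effective_to))
    return chunks

def in_force(chunk: Chunk, as_of: date) -> bool:
    """True if the chunk is effective on the query's as-of date."""
    started = chunk.effective_from <= as_of
    not_expired = chunk.effective_to is None or as_of <= chunk.effective_to
    return started and not_expired

def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists (e.g. BM25 and dense) by reciprocal rank fusion."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; break ties by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

In this sketch the date filter would run before fusion, so that neither retriever can surface a clause that is not in force on the as-of date. A real corpus would also need Clause and Point boundaries and the links to amending documents, which are omitted here.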

3. Evaluation: no evals, no legal RAG

Build an eval set of 200–500 real business questions with lawyer-approved answers. Track three metrics: (i) recall@k: does the right document appear in the top k results; (ii) citation accuracy: does the LLM cite the right Article/Clause; (iii) factual answer accuracy: is the final answer correct. A serious system must exceed 90% on all three before internal rollout.
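
A minimal sketch of how the three metrics and the 90% gate could be computed. The `EvalCase` record and its field names are assumptions made for this example; in particular, `answer_correct` stands in for a lawyer's manual verdict, which the code does not try to automate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class EvalCase:
    question: str
    gold_chunk_id: str             # clause a lawyer marked as the right source
    retrieved_ids: List[str]       # ranked ids returned by the retriever
    cited_chunk_id: Optional[str]  # what the LLM cited; None for "not found"
    answer_correct: bool           # lawyer's verdict on the final answer

def evaluate(cases: List[EvalCase], k: int = 5) -> Dict[str, float]:
    """Average the three per-question checks over the whole eval set."""
    if not cases:
        raise ValueError("eval set is empty")
    n = len(cases)
    return {
        "recall@k": sum(c.gold_chunk_id in c.retrieved_ids[:k] for c in cases) / n,
        "citation_accuracy": sum(c.cited_chunk_id == c.gold_chunk_id for c in cases) / n,
        "answer_accuracy": sum(c.answer_correct for c in cases) / n,
    }

def ready_for_rollout(metrics: Dict[str, float], threshold: float = 0.90) -> bool:
    """All three metrics must be strictly above the threshold."""
    return all(value > threshold for value in metrics.values())
```

Reporting the three numbers separately helps locate failures: low recall@k points at retrieval, while good recall with poor citation accuracy points at the generation step.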

4. Use cases with the clearest ROI in 2026

  • Bank legal teams: SBV circulars, AML/KYC under Decree 13.
  • Insurers: reconciling policy clauses against the amended Insurance Business Law.
  • Enterprise HR: Labour Code + Decrees/Circulars on social insurance and PIT.
  • Public sector: sector-specific corpora (construction, land, public investment).

5. Warning: never let the LLM "be creative" on legal text

Temperature must be 0 or near it. The prompt must force the LLM to answer "not found" when retrieval lacks grounding — silence beats invented clauses. Hallucinated law is a real legal liability, not a UX bug.
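
Prompt instructions alone can be backed by a check in code after generation. The sketch below assumes the LLM is told to cite in a bracketed form such as `[DOC-1, Article 5]`; that format, the `enforce_citations` helper and the `NOT_FOUND` message are all hypothetical. Any draft with no citation, or with a citation that does not match a retrieved chunk, is replaced by the "not found" response.

```python
import re
from typing import Set, Tuple

# Assumed citation format in the draft answer: [<document number>, Article <N>]
CITE_RE = re.compile(r"\[([^\[\],]+),\s*Article\s+(\d+)\]")
NOT_FOUND = "Not found in the retrieved sources."

def enforce_citations(draft: str, retrieved: Set[Tuple[str, int]]) -> str:
    """Return the draft only if every citation points at a retrieved chunk.

    `retrieved` holds (document number, article number) pairs for the chunks
    that were actually passed to the LLM as context.
    """
    cited = {(doc.strip(), int(article)) for doc, article in CITE_RE.findall(draft)}
    if not cited or not cited <= retrieved:
        return NOT_FOUND
    return draft
```

For example, with `retrieved = {("DOC-1", 5)}`, a draft citing `[DOC-1, Article 5]` passes through unchanged, while a draft citing `[DOC-3, Article 9]` or citing nothing is replaced by `NOT_FOUND`. This only verifies that the cited Article was retrieved, not that the answer paraphrases it faithfully, so it complements the lawyer-reviewed evals rather than replacing them.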

DigiWorkHub Advisory
