Vietnamese legal RAG done right: citations, effective dates, and no hallucinations
Why generic RAG fails on Vietnamese law, the chunk-by-Article + hybrid-search + domain-tuned embeddings pipeline that's working in 2026, and the eval discipline that keeps the system trustworthy.

Querying Vietnamese law is a much harder RAG problem than it looks: layered legislation, constant amendments, and a specialised legal register. This article distills how to build a legal RAG that actually works — not a "law Q&A" demo that breaks after 10 questions.
1. Why Vietnamese legal RAG is hard
- One rule may live across a Law + Decree + Circular + guiding Official Letter — the answer must surface all four layers.
- Amendments don't delete the original — the RAG must know which clause is still in force.
- Citations must be exact down to Article/Clause/Point; one wrong character makes it legally worthless.
- Generic multilingual embeddings perform poorly on Vietnamese legal terminology.
2. The pipeline that's working at larger organisations
The architecture in use across bank legal teams and insurers in 2026:
- Chunking by Article/Clause, not by tokens. Each chunk carries metadata: document number, document type, issue date, effective and expiry dates, and any amending documents.
- Hybrid search: BM25 (for exact terms) + dense embeddings (for natural phrasing), reranked by a cross-encoder.
- Domain-tuned embeddings: fine-tuned on "business question → correct clause" pairs labelled by in-house lawyers (roughly 3,000 pairs is enough to produce a measurable improvement).
- Effective-date filter: every query carries an as-of date; clauses not in force on that date (expired, superseded, or not yet effective) are filtered out.
- Citation enforcement: the LLM is not allowed to answer unless it can cite an Article/Clause from a retrieved chunk.
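As a rough illustration of the chunking, hybrid-search and effective-date bullets, here is a minimal Python sketch. It rests on several assumptions that are not from the pipeline described above: Articles are assumed to start on their own line as "Điều N." (or "Article N."), reciprocal rank fusion is used as one common way to merge BM25 and dense rankings before the cross-encoder rerank (the merge method actually used is not stated), and every name here (Chunk, chunk_by_article, in_force, rrf_merge) is hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: each Article starts on its own line as "Điều 12." or "Article 12.".
ARTICLE_HEADING = re.compile(r"^(?:Điều|Article)\s+(\d+[a-z]?)\.", re.MULTILINE)


@dataclass(frozen=True)
class Chunk:
    doc_number: str                       # e.g. a Decree or Circular number
    doc_type: str                         # "Law", "Decree", "Circular", ...
    article: str
    text: str
    effective_from: date
    effective_to: Optional[date] = None   # None means still in force
    amended_by: tuple = ()                # numbers of amending documents


def chunk_by_article(raw, doc_number, doc_type, effective_from, effective_to=None):
    """Split one document into one chunk per Article, keeping metadata on each."""
    headings = list(ARTICLE_HEADING.finditer(raw))
    chunks = []
    for i, m in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(raw)
        chunks.append(Chunk(doc_number, doc_type, m.group(1),
                            raw[m.start():end].strip(),
                            effective_from, effective_to))
    return chunks


def in_force(chunk, as_of):
    """True if the chunk is effective on the query's as-of date."""
    if as_of < chunk.effective_from:
        return False
    return chunk.effective_to is None or as_of <= chunk.effective_to


def rrf_merge(bm25_ranked, dense_ranked, k=60):
    """Merge two ranked lists of chunk ids with reciprocal rank fusion."""
    scores = {}
    for ranking in (bm25_ranked, dense_ranked):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Applying in_force before fusion keeps out-of-force text away from the reranker. The sketch attaches one validity window per document for brevity; since amendments usually change individual Articles or Clauses, a real corpus would likely track the window at that finer level.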
3. Evaluation: no evals, no legal RAG
Build an eval set of 200–500 real business questions with lawyer-approved answers. Track three metrics: (i) recall@k: does the right document appear in the top-k results; (ii) citation accuracy: does the LLM cite the right Article/Clause; (iii) factual answer accuracy: does the answer agree with the lawyer-approved one. A serious system must exceed 90% on all three before internal rollout.
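The three metrics and the 90% gate can be sketched in a few lines of Python. This is only an illustration under assumptions: the EvalCase fields, the tuple shape used for a citation, and the default k are all hypothetical, and answer correctness is taken as a verdict already recorded by a lawyer rather than computed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical citation shape: (document number, article, clause, point).
Citation = tuple


@dataclass
class EvalCase:
    gold_chunk_ids: set                      # chunks a lawyer marked as the right source
    retrieved_ids: list                      # ranked ids returned by retrieval
    gold_citation: Citation
    predicted_citation: Optional[Citation]   # None if the model cited nothing
    answer_correct: bool                     # lawyer's verdict on the final answer


def score(cases, k=5):
    """Return the three metrics as fractions in [0, 1]."""
    if not cases:
        raise ValueError("eval set is empty")
    n = len(cases)
    # recall@k here means: at least one gold chunk appears in the top-k.
    hits = sum(1 for c in cases if c.gold_chunk_ids & set(c.retrieved_ids[:k]))
    # Citations must match exactly, down to the Point.
    cited = sum(1 for c in cases if c.predicted_citation == c.gold_citation)
    correct = sum(1 for c in cases if c.answer_correct)
    return {
        "recall_at_k": hits / n,
        "citation_accuracy": cited / n,
        "answer_accuracy": correct / n,
    }


def ready_for_rollout(metrics, threshold=0.90):
    """Gate: every metric must be strictly above the threshold."""
    return all(value > threshold for value in metrics.values())
```

Running score on every change to chunking, embeddings or prompts, and refusing to ship when ready_for_rollout returns False, is one way to turn the rule above into a regression check.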
4. Use cases with the clearest ROI in 2026
- Bank legal teams: SBV circulars, AML/KYC rules, and customer data handling under Decree 13/2023/ND-CP on personal data protection.
- Insurers: reconciling policy clauses against the amended Insurance Business Law.
- Enterprise HR: Labour Code + Decrees/Circulars on social insurance and personal income tax (PIT).
- Public sector: sector-specific corpora (construction, land, public investment).
5. Warning: never let the LLM "be creative" on legal text
Temperature must be 0 or near it. The prompt must force the LLM to answer "not found" when retrieval lacks grounding — silence beats invented clauses. Hallucinated law is a real legal liability, not a UX bug.
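One way to back the prompt with a hard check is a post-generation guard, sketched below in Python. Everything specific in it is an assumption made for illustration: the bracketed citation format, the wording of the prompt, the GENERATION_PARAMS dictionary (to be passed to whichever LLM client is in use) and the enforce_citations helper are hypothetical, not a particular library's API.

```python
import re

NOT_FOUND = "Not found in the retrieved legal texts."

# Hypothetical settings to pass to whichever LLM client is in use.
GENERATION_PARAMS = {"temperature": 0.0}
SYSTEM_PROMPT = (
    "Answer only from the provided excerpts. Cite every claim as "
    "[<document number> | Article <n>]. If the excerpts do not contain "
    f"the answer, reply exactly: {NOT_FOUND}"
)

# Matches the hypothetical citation format requested in SYSTEM_PROMPT.
CITATION = re.compile(r"\[([^\[\]|]+)\|\s*Article\s+(\d+[a-z]?)\]")


def enforce_citations(answer, retrieved):
    """Return the answer only if it cites something and everything it cites
    is among the retrieved (doc_number, article) pairs; otherwise NOT_FOUND."""
    cited = {(doc.strip(), art) for doc, art in CITATION.findall(answer)}
    if not cited or not cited <= set(retrieved):
        return NOT_FOUND
    return answer
```

This guard only verifies that each cited Article was actually retrieved; it does not verify that the cited text supports the claim, so it complements the citation-accuracy evals rather than replacing them.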


