Understanding RAG project costs in production
A RAG (Retrieval-Augmented Generation) system combines a vector database with a language model to answer questions over your internal documents. Total cost breaks down into three components: a one-time document indexing cost (embeddings), and two recurring per-query costs, namely the input tokens consumed (question + retrieved context) and the output tokens generated by the LLM (response). This calculator estimates all three in real time using official GPT-4o, GPT-5.5, Claude Sonnet 4.6, Claude Opus 4.6 and Mistral pricing.
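The three-part breakdown above can be written as a short formula. The following is a minimal Python sketch, not this calculator's actual implementation: the names `Pricing`, `indexing_cost` and `monthly_query_cost` are hypothetical, and the prices in the usage lines are placeholders chosen for easy arithmetic, not real vendor rates.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass
class Pricing:
    """Per-million-token prices in USD, supplied by the caller (hypothetical type)."""
    input_per_m: float
    output_per_m: float


def indexing_cost(corpus_tokens: int, embedding_price_per_m: float) -> float:
    """One-time cost of embedding the whole corpus."""
    return corpus_tokens / TOKENS_PER_MILLION * embedding_price_per_m


def monthly_query_cost(
    queries_per_month: int,
    question_tokens: int,
    context_tokens: int,
    answer_tokens: int,
    pricing: Pricing,
) -> float:
    """Recurring cost: input tokens (question + retrieved context) plus output tokens."""
    input_tokens = queries_per_month * (question_tokens + context_tokens)
    output_tokens = queries_per_month * answer_tokens
    return (
        input_tokens * pricing.input_per_m + output_tokens * pricing.output_per_m
    ) / TOKENS_PER_MILLION


# Placeholder prices (assumptions for illustration only, not real rates).
placeholder = Pricing(input_per_m=1.0, output_per_m=2.0)
monthly = monthly_query_cost(1_000, 100, 900, 200, placeholder)
one_time = indexing_cost(500_000, embedding_price_per_m=0.02)
print(f"monthly: ${monthly:.2f}, one-time indexing: ${one_time:.2f}")
```

Because the retrieved context is multiplied by the number of queries, it usually dominates the input side of the bill, which is why the amount of context injected per query matters so much.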
Which LLM to choose for a GDPR-compliant RAG?
For French and European companies subject to GDPR, choosing an LLM involves more than cost per token. Mistral AI, a French company, offers open-source models that can be self-hosted on sovereign infrastructure (OVHcloud, Scaleway), so data never leaves the EU. Mistral Large 3 at $0.50/1M input tokens is currently the most cost-effective option for high-volume RAG with EU hosting. Claude Sonnet 4.6 and GPT-4o remain relevant for complex use cases via API, but they involve data transfers to US servers, which must be covered contractually (DPA, Standard Contractual Clauses).
How to reduce LLM token costs for an enterprise RAG?
Several techniques can cut your LLM bill by 3x to 10x: adaptive chunking (256 to 512 tokens per chunk depending on document density), cosine similarity threshold filtering to inject only truly relevant passages, context compression via a lightweight model before the main LLM call, and intelligent routing that sends simple queries to Mistral Large 3 and complex ones to Claude Opus 4.6. A RAG architecture audit typically uncovers 50 to 80% savings without degrading answer quality.
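Two of these techniques, similarity threshold filtering and routing, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the function names, the 0.75 threshold, the cap of 5 chunks, and the word-count routing rule are all hypothetical choices, and the tier labels `"cheap"` and `"premium"` stand in for whichever models you actually route to.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_chunks(
    query_vec: list[float],
    chunks: list[tuple[str, list[float]]],
    threshold: float = 0.75,  # assumed value; tune on your own data
    max_chunks: int = 5,      # assumed cap on injected passages
) -> list[str]:
    """Keep only chunks whose similarity to the query meets the threshold,
    best match first, so fewer tokens are injected into the prompt."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:max_chunks]]


def route_model(
    question: str,
    context_chunks: list[str],
    max_simple_words: int = 20,   # assumed heuristic
    max_simple_chunks: int = 2,   # assumed heuristic
) -> str:
    """Naive router: short questions with little context go to the cheap tier."""
    is_simple = (
        len(question.split()) <= max_simple_words
        and len(context_chunks) <= max_simple_chunks
    )
    return "cheap" if is_simple else "premium"


# Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
chunks = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
selected = filter_chunks([1.0, 0.0], chunks)
print(selected, route_model("What is the leave policy?", selected))
```

In practice the routing rule is the part most worth replacing: a small classifier or a confidence score from the retriever will usually separate simple from complex queries better than word counts.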
Cost of a RAG chatbot on internal documents: ballpark figures
For an SMB with 50 users each sending 5 questions per day (roughly 7,500 queries over a 30-day month), monthly RAG cost with Claude Sonnet 4.6 is around $100 using this calculator's default assumptions. With Mistral Large 3 on sovereign hosting, token cost drops to about $14 per month, plus GPU infrastructure fees (~€30/month on OVHcloud). Initial indexing cost (embeddings) stays under one cent for a 500-page corpus and is a one-time expense.
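The query volume and the input side of the Mistral figure can be checked by hand. The sketch below uses the user count, question rate, and $0.50/1M input price quoted in this article; the context size (5 chunks of 512 tokens plus a 100-token question) is an assumption made for illustration and is not necessarily the calculator's default, so the result is not expected to match the $14 figure, which also includes output tokens.

```python
USERS = 50
QUESTIONS_PER_USER_PER_DAY = 5
DAYS_PER_MONTH = 30
MISTRAL_INPUT_PRICE_PER_M = 0.50  # USD per 1M input tokens, as quoted in the text

# Assumptions for illustration only (not the calculator's documented defaults).
QUESTION_TOKENS = 100
CHUNKS_PER_QUERY = 5
TOKENS_PER_CHUNK = 512


def queries_per_month(users: int, per_user_per_day: int, days: int) -> int:
    """Total monthly query volume."""
    return users * per_user_per_day * days


def monthly_input_cost(queries: int, tokens_per_query: int, price_per_m: float) -> float:
    """Input-token cost only; output tokens and hosting are billed on top."""
    return queries * tokens_per_query / 1_000_000 * price_per_m


queries = queries_per_month(USERS, QUESTIONS_PER_USER_PER_DAY, DAYS_PER_MONTH)
tokens_per_query = QUESTION_TOKENS + CHUNKS_PER_QUERY * TOKENS_PER_CHUNK
cost = monthly_input_cost(queries, tokens_per_query, MISTRAL_INPUT_PRICE_PER_M)

print(f"{queries} queries/month, {tokens_per_query} input tokens per query")
print(f"input cost: about ${cost:.1f} per month")
```

Under these assumptions the input tokens alone come to roughly $10 per month. Halving the number of injected chunks would roughly halve that amount, which shows how directly retrieval settings drive the bill.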