Local LLMs and GDPR: agentic coding without data leaks
June 5, 2026 · 7 min read · Guides
AI Engineer — UTT 4th year · LLM, RAG & GDPR compliance specialist · 15+ client projects
Cloud AI API prices are rising. Google tripled Gemini Flash pricing between versions 2.5 and 3.5, with marginal performance gains. OpenAI bills for every input token, including all the source code sent with each completion. For teams doing agentic coding (agents that read files, generate tests, refactor code), the bill can grow quickly.
Direct answer: a local LLM runs entirely on your machine or server, so no data is transmitted to an external provider. It is the most sovereign option. For cases requiring frontier-model performance, Mistral AI (a French company with EU infrastructure) is the GDPR-native cloud alternative. This guide covers both.
Why cloud agentic coding creates a GDPR problem
When you use Claude Code, GitHub Copilot, or Cursor in cloud mode, every request sends:
- The content of files read by the agent
- Conversation history (often 10k-100k tokens)
- Tool execution results (test outputs, logs, etc.)
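Chat-completion APIs are stateless, so in a typical agent loop the client resends the whole conversation on every turn. The amount of code and logs leaving your machine (and billed as input tokens) therefore grows faster than the session length. A minimal sketch, assuming a fixed starting context and a fixed number of tokens appended per turn; both are hypothetical simplifications that ignore history trimming, summarization, and prompt-caching discounts:

```python
def cumulative_input_tokens(initial_context: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens sent over an agent session.

    Hypothetical model: every turn resends the full history, and the history
    grows by a fixed number of tokens (tool output + model reply) per turn.
    """
    total = 0
    context = initial_context
    for _ in range(turns):
        total += context            # the whole current context goes out again
        context += growth_per_turn  # new tool output and reply are appended
    return total


if __name__ == "__main__":
    # Hypothetical session: 20k-token starting context, +5k tokens per turn, 10 turns
    print(cumulative_input_tokens(20_000, 5_000, 10))
```

With a 20k-token starting context growing by 5k tokens per turn, ten turns already send 425k input tokens, most of it repeated file content.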
This data is typically processed on US servers. Article 44 of the GDPR prohibits transfers of personal data outside the EU unless adequate safeguards are in place. The CLOUD Act (2018) allows US authorities to access data held by US companies, including data stored in their European data centers. If your code contains personal data (customer names, emails, database schemas), this is a real risk.
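Before deciding whether a repository can go to a cloud agent at all, a quick pre-flight scan shows where personal data sits. A minimal sketch, assuming email addresses are a reasonable first proxy for personal data; the regex, the suffix list, and the `find_emails` / `flag_files` helpers are illustrative, and a real audit would also cover names, phone numbers, fixtures, and database dumps:

```python
import re
from pathlib import Path

# Deliberately simple email pattern: it misses some addresses and flags some
# false positives. It is a first filter, not a data-loss-prevention tool.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> list[str]:
    """Return the email-like strings found in a piece of text."""
    return EMAIL_RE.findall(text)


def flag_files(root: str, suffixes: tuple[str, ...] = (".py", ".sql", ".json", ".csv")) -> dict[str, list[str]]:
    """Map each matching file under root to the email-like strings it contains."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            found = find_emails(path.read_text(encoding="utf-8", errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `flag_files("path/to/repo")` lists the files an agent should not ship to a non-EU endpoint without review.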
A local LLM eliminates this risk structurally: nothing leaves your infrastructure.
What hardware configuration is sufficient?
You do not need a dedicated GPU server to get started. A decent laptop is enough for 7B to 27B parameter models quantized in GGUF format.
| Configuration | Usable model | Use case |
|---|---|---|
| 16 GB RAM, modern CPU | Gemma 4 4B-it, Llama 3.2 3B | Testing, prototypes |
| 32 GB RAM, CPU/integrated GPU | Gemma 4 12B-it | Day-to-day coding |
| 64 GB RAM, AMD/Apple Silicon | Gemma 4 27B-it or 26B MoE | Production, agents |
| NVIDIA GPU 24 GB VRAM | Qwen2.5-Coder 32B | High performance |
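The table follows from simple arithmetic: quantized weights take roughly parameters × bits per weight ÷ 8 bytes. The sketch below uses two llama.cpp block formats whose sizes are fixed (Q8_0 packs 32 weights into 34 bytes, Q4_0 packs 32 weights into 18 bytes). It is an approximation that ignores tensors kept at higher precision, the KV cache, and the operating system's own memory:

```python
# Effective bits per weight for two llama.cpp block formats:
# Q8_0 stores 32 weights in 34 bytes, Q4_0 stores 32 weights in 18 bytes.
BITS_PER_WEIGHT = {"Q8_0": 34 * 8 / 32, "Q4_0": 18 * 8 / 32}


def weights_gb(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


if __name__ == "__main__":
    for params in (4e9, 12e9, 27e9):
        for quant in ("Q4_0", "Q8_0"):
            print(f"{params / 1e9:.0f}B {quant}: ~{weights_gb(params, quant):.1f} GB")
```

For a 27B model this gives about 15 GB at Q4_0 and about 29 GB at Q8_0, which is why 64 GB of RAM leaves comfortable headroom. For a Mixture of Experts model, all experts must sit in memory even though only a few are active per token, so the total parameter count is what matters here.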
For real-world agentic coding, the recommended model is Gemma 4 26B Mixture of Experts: it supports tool use, vision, and reasoning, with a good quality/resource ratio.
Setup with Ollama or LM Studio
Ollama option (terminal, scripts)
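```bash
# Installation
curl -fsSL https://ollama.com/install.sh | sh

# Download the model
ollama pull gemma4:27b
```

The model can then be called from Python through the `ollama` package (`pip install ollama`):

```python
import ollama


def coding_agent(task: str, file_content: str) -> str:
    """Send a coding task plus the file it concerns to the local model."""
    response = ollama.chat(
        model="gemma4:27b",
        messages=[
            {
                "role": "system",
                "content": "You are an expert coding assistant. Analyze and modify the provided code.",
            },
            {
                "role": "user",
                "content": f"Task: {task}\n\nCode:\n```\n{file_content}\n```",
            },
        ],
    )
    # The request goes to the local Ollama server (localhost:11434 by default)
    return response["message"]["content"]
```
LM Studio option (graphical interface)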
LM Studio is recommended for getting started: simple interface, integrated model management, one-click OpenAI-compatible server. Available at lmstudio.ai.
Once the local server is running (default port 1234), any OpenAI client works:
```python
from openai import OpenAI

# Client pointing to the local server; the key is a placeholder because the client requires one
client = OpenAI(base_url="http://localhost:1234/v1", api_key="local")

response = client.chat.completions.create(
    model="gemma-4-27b-it",
    messages=[{"role": "user", "content": "Refactor this function..."}],
    max_tokens=4096,
)
print(response.choices[0].message.content)
```
Critical configuration: context length and cache
Default parameters in most local LLM runners are too conservative for agentic coding.
```json
{
  "contextLength": 100000,
  "kCacheQuantization": "Q8_0",
  "vCacheQuantization": "Q4_0",
  "maxInputTokens": 64000,
  "maxOutputTokens": 16384
}
```
- Context length: raise it to at least 100k tokens. Coding agents send entire files, conversation histories, and test outputs.
- Cache quantization: setting the K-cache to Q8_0 and the V-cache to Q4_0 reduces VRAM usage by 30-40% with negligible quality impact.
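The effect of cache quantization can be estimated the same way. In a standard transformer, the KV cache stores one K and one V vector per layer, per KV head, and per token of context. A sketch, assuming full attention on every layer (models with sliding-window layers need less) and a hypothetical architecture of 48 layers, 8 KV heads, and head dimension 128, chosen for illustration only and not taken from a real model card:

```python
# Bytes per element for llama.cpp cache types (Q formats use 32-element blocks).
BYTES_PER_ELEMENT = {"F16": 2.0, "Q8_0": 34 / 32, "Q4_0": 18 / 32}


def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int, n_ctx: int,
                k_type: str = "F16", v_type: str = "F16") -> float:
    """Approximate KV-cache size in GB for a full-attention transformer."""
    elements_per_tensor = n_layers * n_kv_heads * head_dim * n_ctx  # K and V each
    k_bytes = elements_per_tensor * BYTES_PER_ELEMENT[k_type]
    v_bytes = elements_per_tensor * BYTES_PER_ELEMENT[v_type]
    return (k_bytes + v_bytes) / 1e9


if __name__ == "__main__":
    # Hypothetical dimensions, NOT the configuration of any specific Gemma model
    dims = dict(n_layers=48, n_kv_heads=8, head_dim=128, n_ctx=100_000)
    f16 = kv_cache_gb(**dims)
    mixed = kv_cache_gb(**dims, k_type="Q8_0", v_type="Q4_0")
    print(f"F16 cache: {f16:.1f} GB, Q8_0/Q4_0 cache: {mixed:.1f} GB")
```

With these illustrative dimensions and a 100k context, the cache shrinks from about 19.7 GB in F16 to about 8.0 GB with Q8_0 keys and Q4_0 values (a ratio of 1.625/4, roughly 0.41). How much that saves in total VRAM depends on how large the weights are next to the cache.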
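Without these adjustments, the first message from an agent (system prompt + tool definitions + file context) can reach 8k-20k tokens. That exceeds a default context window of a few thousand tokens, so depending on the runner, inference either fails or the prompt is truncated.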
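With Ollama specifically, the context window is a per-request option named `num_ctx` (the Python client accepts it as `options={"num_ctx": ...}` in `ollama.chat`). A minimal sketch against Ollama's `/api/chat` REST endpoint using only the standard library; port 11434 is Ollama's default, and the 100k value mirrors the recommendation above rather than a tested limit for any given model:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint


def build_chat_payload(model: str, prompt: str, num_ctx: int = 100_000) -> dict:
    """Chat request with an enlarged context window (Ollama's num_ctx option)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return a single JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def chat(model: str, prompt: str, num_ctx: int = 100_000) -> str:
    """Send the request to the local server and return the reply text."""
    data = json.dumps(build_chat_payload(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Call `chat("gemma4:27b", "...")` with the Ollama server running. To make the setting persistent instead, a Modelfile can declare `PARAMETER num_ctx 100000`. A larger `num_ctx` also enlarges the KV cache, so memory use grows accordingly.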
VS Code Copilot integration
VS Code Copilot supports custom endpoints since version 1.95. Configuration in settings.json:
```json
{
  "chat.extensionServiceUrl": "http://localhost:1234/v1",
  "chat.models": [
    {
      "id": "gemma-4-27b-it",
      "name": "Gemma 4 27B (local)",
      "maxInputTokens": 64000,
      "maxOutputTokens": 16384,
      "isDefault": true,
      "capabilities": {
        "vision": true,
        "toolCalling": true,
        "reasoning": true
      }
    }
  ]
}
```
Important note: the first prompt sent to Copilot is heavy (Copilot's system prompt plus all tool definitions). Expect 2-5 minutes for the first response while the model processes this long prompt. Subsequent exchanges are significantly faster.
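Declaring `"toolCalling": true` in the configuration does not by itself make the model support tools; the runner and model must actually return tool calls. A quick check, assuming the local server implements the OpenAI `tools` parameter of `/v1/chat/completions` on port 1234 as above; `read_file` is a hypothetical tool defined only for this test:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # local OpenAI-compatible server

# One hypothetical tool, described in the OpenAI function-calling format.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}


def build_request(model: str) -> dict:
    """A prompt that should push a tool-capable model to call read_file."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "Open README.md and summarize it."}],
        "tools": [READ_FILE_TOOL],
    }


def requested_tools(response: dict) -> list[str]:
    """Names of the tools the model asked to call (empty if it answered in text)."""
    calls = response["choices"][0]["message"].get("tool_calls") or []
    return [call["function"]["name"] for call in calls]


def check_tool_calling(model: str) -> list[str]:
    """Send the test request to the local server and report requested tools."""
    data = json.dumps(build_request(model)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return requested_tools(json.loads(resp.read()))
```

With the server running, `check_tool_calling("gemma-4-27b-it")` should return `["read_file"]` if the model chose to call the tool. An empty list means it answered in plain text, which is worth investigating before relying on Copilot's agent features.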
When local LLMs fall short
Local models remain less capable than frontier models (GPT-4o, Claude Opus, Gemini Ultra) on some tasks:
- Complex code generation: multi-file codebase refactoring with precise architectural constraints
- Long reasoning chains: problems requiring more than 10 chained reasoning steps
- Rare languages: Elixir, Haskell, advanced Rust, for which local models have seen less training data
For these cases, the most GDPR-solid cloud alternative is Mistral AI (La Plateforme): a French company, EU-hosted infrastructure, outside the reach of the US CLOUD Act. Mistral Large and Codestral are competitive with frontier models on coding tasks. This is the option to favor when you need cloud performance while staying within a robust GDPR framework.
For less sensitive code, OpenRouter in zero-data-retention mode remains an option: data is not stored or used for training, though hosting remains outside the EU.
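The choice between the three options in this section can be written down as a routing rule. A sketch, assuming two yes/no inputs and an OpenAI-compatible client for every backend; the `Backend` table, the model identifiers, and the decision to send non-sensitive frontier tasks to OpenRouter rather than Mistral are one possible reading of this guide, not a fixed policy:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    base_url: str
    model: str


# Hypothetical backend table; URLs and model identifiers must match your own setup.
LOCAL = Backend("local", "http://localhost:1234/v1", "gemma-4-27b-it")
MISTRAL = Backend("mistral", "https://api.mistral.ai/v1", "codestral-latest")
OPENROUTER_ZDR = Backend("openrouter-zdr", "https://openrouter.ai/api/v1", "<model-id>")


def choose_backend(contains_personal_data: bool, needs_frontier: bool) -> Backend:
    """Encode this guide's rule of thumb as a function."""
    if not needs_frontier:
        return LOCAL           # default: nothing leaves the machine
    if contains_personal_data:
        return MISTRAL         # EU-hosted cloud for sensitive code
    return OPENROUTER_ZDR      # less sensitive code, zero-data-retention mode
```

Each `Backend` only changes the `base_url`, `model`, and API key passed to the `OpenAI` client from the LM Studio example, so the agent code itself stays the same.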
TL;DR
A local LLM (Ollama + Gemma 4 27B) remains the most sovereign solution for GDPR-compliant agentic coding: no data leaves your machine, zero marginal cost, functional on a good laptop with 64 GB RAM.
When local performance falls short, Mistral AI (Codestral, Mistral Large) is the GDPR-native cloud alternative: EU infrastructure, outside the reach of the US CLOUD Act.
The minimum viable local setup: LM Studio + Gemma 4 27B + 100k context + VS Code integration.
Working on an agentic coding project with data sovereignty requirements? Let's discuss your architecture.
About the author
Pierre Kasparian
4th-year engineering student at UTT (University of Technology of Troyes) and AI integration freelancer. He deploys LLMs, RAG pipelines, and AI agents for French and European companies, with strong expertise in GDPR compliance and European hosting. 15+ client projects, including Pretto and LiveSession.