⬡ SOVEREIGN AI INFRASTRUCTURE STANDARD ⬡
Drop-in sovereign replacement for public AI platforms — air-gapped, OASA-compliant, OpenAI-compatible.
# 1. One-command install (auto-detects GPU, RAM, TPM)
curl -sSL https://install.sovereignstack.ai | bash
# 2. Download a local model
curl -Lo playground/models/model.gguf \
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf
# 3. Launch the stack
docker compose up --build -d
# 4. Chat (OpenAI-compatible API — just change the base URL)
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer mock-valid-token" \
-d '{
"model":"Qwen/Qwen2.5-7B-Instruct",
"messages":[{"role":"user","content":"What is digital sovereignty?"}],
"oasa_compliance_lock":true
}'
# Or from any OpenAI client, just change three lines:
import openai
openai.api_key = "mock-valid-token" # was: sk-...
openai.base_url = "http://localhost:8080/v1" # was: https://api.openai.com/v1
openai.default_headers = {"oasa_compliance_lock": "true"} # was: nothing
That's it. Zero data leaves your network. No cloud API keys. No cloud dependency.
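For clients that do not use the OpenAI SDK, the same request can be assembled with the Python standard library. The following is a minimal sketch, assuming the gateway from the quick start is listening on localhost:8080 and accepts the mock token. `build_chat_request` is a hypothetical helper written for illustration, not part of SovereignStack.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, prompt: str,
                       model: str = "Qwen/Qwen2.5-7B-Instruct") -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the local gateway."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Same body flag as in the curl example.
        "oasa_compliance_lock": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8080/v1", "mock-valid-token",
                             "What is digital sovereignty?")
    print(req.get_method(), req.full_url)
    # Sending the request requires the stack to be running:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it keeps the example usable for inspection even when the stack is not up.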
| Model | Quantization | VRAM | Tokens/sec | TTFT | Hardware |
|---|---|---|---|---|---|
| Llama 3.1 8B | INT4 AWQ | 8 GB | 142 tok/s | 45ms | RTX 4090 |
| Llama 3.1 70B | INT4 AWQ | 28 GB | 39 tok/s | 120ms | 2x RTX 6000 |
| Mistral 7B | INT4 GGUF | 6 GB | 68 tok/s | 55ms | RTX 3090 |
| Qwen 2.5 7B | INT4 AWQ | 8 GB | 134 tok/s | 48ms | RTX 4090 |
| Phi-3 Mini | INT4 GGUF | 4 GB | 22 tok/s | 95ms | CPU-only (M3) |
| DeepSeek-Coder 33B | INT4 AWQ | 18 GB | 56 tok/s | 88ms | A100 40GB |
Benchmarks run with tools/benchmark.py on isolated hardware. See Benchmarking Guide.
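The TTFT and tokens/sec columns can both be derived from the arrival times of streamed tokens. The sketch below shows one common way to compute them. It is an illustration under that assumption and does not claim to reproduce what tools/benchmark.py does internally; `summarize_stream` is a hypothetical name.

```python
from typing import Sequence


def summarize_stream(request_start: float, token_times: Sequence[float]) -> dict:
    """Compute TTFT (ms) and decode throughput (tok/s) from timestamps in seconds."""
    if not token_times:
        raise ValueError("no tokens received")
    # Time to first token: delay between sending the request and the first token.
    ttft_ms = (token_times[0] - request_start) * 1000.0
    # Decode throughput: tokens produced after the first one, per second.
    decode_window = token_times[-1] - token_times[0]
    tok_per_s = (len(token_times) - 1) / decode_window if decode_window > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tokens_per_sec": tok_per_s}


if __name__ == "__main__":
    print(summarize_stream(0.0, [0.5, 1.0, 1.5, 2.0, 2.5]))
```

Separating TTFT from decode throughput matters because prompt processing and token generation stress the hardware differently.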
| Scenario | Public Cloud | SovereignStack | Savings |
|---|---|---|---|
| 10 users, GPT-4 class | $360K | $12K (RTX 4090) | 97% |
| 50 users, GPT-4 class | $1.8M | $45K (2x A100) | 97.5% |
| 200 users, mixed models | $7.2M | $150K (4-node cluster) | 98% |
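The Savings column follows from the two cost columns as 1 - (SovereignStack cost / public cloud cost). A short check of that arithmetic, using only the figures in the table (`savings_pct` is a hypothetical helper):

```python
def savings_pct(cloud_cost: float, sovereign_cost: float) -> float:
    """Percentage saved relative to the public cloud cost."""
    return (1.0 - sovereign_cost / cloud_cost) * 100.0


# (public cloud cost, SovereignStack cost) in USD, taken from the table.
scenarios = {
    "10 users, GPT-4 class": (360_000, 12_000),
    "50 users, GPT-4 class": (1_800_000, 45_000),
    "200 users, mixed models": (7_200_000, 150_000),
}

for name, (cloud, local) in scenarios.items():
    print(f"{name}: {savings_pct(cloud, local):.1f}%")
```

To one decimal place the results are 96.7%, 97.5%, and 97.9%, which the table rounds to 97%, 97.5%, and 98%.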
                 ┌─────────────────────────────────┐
                 │       CLIENT APPLICATION        │
                 │ OpenAI SDK / LangChain / Custom │
                 └────────────────┬────────────────┘
                                  │
                      ┌───────────▼───────────┐
                      │   SOVEREIGN GATEWAY   │
                      │  :8080 - OIDC + OPA   │
                      │ Auth → Policy → Audit │
                      └───────────┬───────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          │                       │                       │
┌─────────▼──────────┐  ┌─────────▼──────────┐  ┌─────────▼──────────┐
│ vLLM ENGINE        │  │ MEMORY SERVICE     │  │ INGEST SERVICE     │
│ PagedAttention     │  │ TurboMemory        │  │ pdf2struct         │
│ INT4/AWQ/FP8       │  │ AES-256 Vector DB  │  │ PDF/DOCX → JSON    │
│ FlashAttention     │  │ KV Cache Isolation │  │ VOLATILE RAM Only  │
└─────────┬──────────┘  └─────────┬──────────┘  └─────────┬──────────┘
          │                       │                       │
          └───────────────────────┼───────────────────────┘
                                  │
                      ┌───────────▼───────────┐
                      │  IDENTITY & ACCESS    │
                      │  Keycloak (OIDC)      │
                      │  Open Policy Agent    │
                      │  OpenTelemetry        │
                      │  Prometheus           │
                      └───────────────────────┘
When local compute fails, the oasa_compliance_lock ensures that the gateway returns 503 Service Unavailable rather than silently forwarding data to external APIs. A 503 is inconvenient; a GDPR fine of up to 4% of global annual turnover is catastrophic.
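The fail-closed behavior described here can be sketched as a small routing function. This is an illustration only, not the gateway's actual code: `route_request`, `LocalInferenceError`, and the `run_local` / `forward_external` callables are hypothetical names, and treating a missing lock flag as locked is an assumption made for the sketch.

```python
from http import HTTPStatus
from typing import Callable, Optional


class LocalInferenceError(RuntimeError):
    """Raised when the local engine cannot serve a request (hypothetical)."""


def route_request(
    payload: dict,
    run_local: Callable[[dict], dict],
    forward_external: Optional[Callable[[dict], dict]] = None,
) -> tuple[int, dict]:
    """Try local inference first; never fall back externally while the lock is set."""
    try:
        return int(HTTPStatus.OK), run_local(payload)
    except LocalInferenceError as exc:
        # Assumption: an absent flag is treated as locked (fail closed).
        locked = payload.get("oasa_compliance_lock", True)
        if locked or forward_external is None:
            return int(HTTPStatus.SERVICE_UNAVAILABLE), {"error": str(exc)}
        return int(HTTPStatus.OK), forward_external(payload)
```

The important property is that the external path is unreachable whenever the lock is set, regardless of how the local engine fails.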
SovereignStack implements the Open Architecture for Sovereign AI (OASA), a three-tier conformance certification program (L1, L2, L3).
See CONFORMANCE.md for the full certification specification.
# Get a token from Keycloak
curl -X POST http://localhost:8083/realms/sovereign/protocol/openid-connect/token \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "client_id=sovereign-gateway" \
-d "username=sovereign-admin" \
-d "password=admin123" \
-d "grant_type=password"
# Use the token
curl http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer <access_token>" \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen2.5-7B-Instruct","messages":[{"role":"user","content":"Hello"}],"oasa_compliance_lock":true}'
Role-based access: inference:write, inference:read, audit:read, all enforced at the gateway.
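The token exchange can also be scripted. The sketch below builds the same password-grant form as the curl command and reads `access_token` from the JSON response, which is the standard field name in an OpenID Connect token response. The helper names are hypothetical, and the credentials are the example values from this README.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "http://localhost:8083/realms/sovereign/protocol/openid-connect/token"


def build_token_request(username: str, password: str,
                        client_id: str = "sovereign-gateway") -> urllib.request.Request:
    """Build the form-encoded password-grant request shown in the curl example."""
    form = urllib.parse.urlencode({
        "client_id": client_id,
        "username": username,
        "password": password,
        "grant_type": "password",
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_access_token(response_body: bytes) -> str:
    """Pull the bearer token out of the token endpoint's JSON response."""
    return json.loads(response_body)["access_token"]


if __name__ == "__main__":
    req = build_token_request("sovereign-admin", "admin123")
    print(req.get_method(), req.full_url)
    # With Keycloak running:
    # with urllib.request.urlopen(req) as resp:
    #     token = extract_access_token(resp.read())
```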
Data Loss Prevention, prompt injection blocking, and role-based model budgets — all governed by Open Policy Agent Rego policies at policies/inference.rego.
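A service can ask Open Policy Agent for a decision through OPA's REST Data API, which takes a JSON document under `input` and returns the decision under `result`. The sketch below is an assumption-laden illustration: the OPA address (OPA's default port 8181), the rule path `sovereign/inference/allow`, and the shape of the input document are all hypothetical and are not taken from policies/inference.rego.

```python
import json
import urllib.request
from typing import Iterable

# Assumed OPA address and rule path; adjust to the package declared in the policy.
OPA_URL = "http://localhost:8181/v1/data/sovereign/inference/allow"


def build_opa_query(roles: Iterable[str], model: str, prompt: str) -> urllib.request.Request:
    """Build a Data API query asking whether this inference call is allowed."""
    payload = {"input": {"roles": sorted(roles), "model": model, "prompt": prompt}}
    return urllib.request.Request(
        OPA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(opa_response: dict) -> bool:
    """OPA omits "result" when the rule is undefined; treat that as a deny."""
    return opa_response.get("result") is True
```

Treating anything other than an explicit `true` as a denial keeps the caller fail-closed if the policy is missing or misnamed.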
- OpenTelemetry: Trace propagation across all services (x-trace-id, x-span-id)
- Prometheus: Metrics scraping at /metrics on all services
- Audit Log: Immutable append-only JSON log with jurisdiction tags
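An append-only JSON audit log with trace identifiers and a jurisdiction tag could look like the sketch below, which writes one JSON object per line. The field names and the `append_audit_record` helper are hypothetical. Opening a file in append mode only prevents accidental overwrites by this writer; real immutability would need storage-level controls that this sketch does not provide.

```python
import io
import json
import time
from typing import Optional, TextIO


def append_audit_record(log: TextIO, *, trace_id: str, span_id: str, actor: str,
                        action: str, jurisdiction: str,
                        now: Optional[float] = None) -> dict:
    """Append one audit event as a single JSON line and return the record."""
    record = {
        "timestamp": now if now is not None else time.time(),
        "trace_id": trace_id,      # mirrors the x-trace-id header
        "span_id": span_id,        # mirrors the x-span-id header
        "actor": actor,
        "action": action,
        "jurisdiction": jurisdiction,
    }
    log.write(json.dumps(record, sort_keys=True) + "\n")
    return record


if __name__ == "__main__":
    # In-memory demo; a real deployment would use open("audit.jsonl", "a").
    buf = io.StringIO()
    append_audit_record(buf, trace_id="t1", span_id="s1", actor="alice",
                        action="inference:write", jurisdiction="EU")
    print(buf.getvalue(), end="")
```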
# Docker Compose (all traffic on internal: true bridge)
docker compose up --build -d
# Kubernetes with Helm (strict NetworkPolicies, gVisor sandboxing)
helm install sovereign-stack ./charts/sovereignstack \
--namespace sovereign-stack --create-namespace \
--set vllm.model.name="Qwen/Qwen2.5-7B-Instruct" \
--set global.air_gapped=true
| Document | Description |
|---|---|
| ARCHITECTURE.md | System topology, layers, subsystems, trust boundaries |
| CONFORMANCE.md | OASA certification specification (L1/L2/L3) |
| OASA.md | Full OASA protocol specification |
| Architecture Guide | Trust boundaries, data flow, identity flow |
| Deployment Guide | Deployment profiles, Docker Compose, Helm |
| Deployment Profiles | Personal, Edge, Air-Gapped, Datacenter |
| Docker Compose | Local stack with Keycloak, vLLM, OTel, Prometheus |
| Helm Chart | Kubernetes deployment with NetworkPolicies & gVisor |
| Threat Model | STRIDE threat catalogue, attack surface, compliance mapping |
| RFCs | Standards evolution (Runtime Spec, RFC Process) |
| Specifications | Formal subsystem and protocol specifications |
| Governance | Project roles, decision-making, release model |
| Roadmap | Development phases and milestones |
| Contributing | How to contribute |
| Security | Vulnerability disclosure, threat model, compliance |
| Code of Conduct | Community standards |
# VRAM estimation
python tools/vram_calculator.py --params 70B --quant INT4 --context 8192
# Compliance validation
python tools/sovereign_stack.py validate sovereign-stack.yaml --audit-host
# Performance benchmarking
python tools/benchmark.py --url http://localhost:8080/v1 --model sovereign-llama3
# Runtime exfiltration watchdog
python tools/runtime_shield.py --interval 10
# Compliance report generator
python tools/generate_compliance_report.py --level L2 --output report.md
| Regulation | Jurisdiction | Coverage |
|---|---|---|
| GDPR | EU | Zero exfiltration, jurisdictional routing, immutable audit logs |
| HIPAA | US | AES-256-GCM encryption, air-gapped compute, access logging |
| NIS2 | EU | Hardware security (TPM), immutable audit trail, incident response isolation |
| EU AI Act | EU | Local model control, transparency logging, human oversight |
| DORA | EU | Operational resilience via air-gapped orchestration |
| SOX | US | Deterministic ingestion, tamper-evident logs, financial data isolation |
SovereignStack is structured for enterprise adoption:
- Enterprise Support (SLA) — 24/7 incident response, deployment audits, custom integration
- SovereignNode Appliances — Turnkey air-gapped hardware with K3s, vLLM, encrypted Qdrant
- OASA Certification — Compliance badges and third-party audit reports
- Dedicated Training — On-site workshops for regulated deployments
See ROADMAP.md for the full phased roadmap through 2027+.
| Phase | Highlights |
|---|---|
| 2026.1 ✅ | Helm chart, vLLM, OPA, CI/CD, Keycloak OIDC, Threat Model |
| 2026.2 🚧 | RFC Process, Architecture Docs, Deployment Profiles, Governance |
| 2026.3 📅 | Merkle-Tree Auditing, Federated Memory, Mesh Networking |
| 2027.1 📅 | Hardware Enclaves, SBOM/Cosign, SPIFFE/SPIRE, Sovereign Node OS |
| 2027.2 📅 | Agent Orchestration, Multi-Model Routing, Federated Agents |
| 2027.3+ 🔮 | Autonomous Infrastructure, Certification Program, Enterprise Platform |
See CONTRIBUTING.md and our Code of Conduct.
Apache 2.0 — see LICENSE.