- About
- Requirements
- Features
- Configuration
- CLI Usage
- Installation
- Compatibility
- FAQ
- Community & Support
- License
- Disclaimer
AI Code Assistant (Offline) is a local-first, privacy-focused code completion and refactoring tool for developers. Powered by a quantized 7B LLM running entirely on your machine, it provides intelligent suggestions without cloud dependency. Ideal for air-gapped environments, sensitive codebases, or latency-sensitive workflows.
- Windows 10/11 (64-bit)
- 16GB RAM (32GB recommended for optimal performance)
- 8GB free disk space (SSD preferred)
- NVIDIA GPU with CUDA 12.2+ (optional but highly recommended)
- No internet required after initial download
- Local LLM Inference: Runs a quantized 7B parameter model entirely offline with 4-bit precision. No API calls, no telemetry.
- Multi-Language Support: Native support for Python, JavaScript, TypeScript, Java, C++, Go, Rust, and C# with language-aware completions.
- Context-Aware Suggestions: Understands project structure, imports, and recent edits to provide relevant completions (up to 128k context window).
- Privacy-Preserving: All data stays on your machine. No cloud processing, no data logging.
- Custom Model Support: Import GGUF/GGML models or fine-tune with your own dataset (via CLI).
- IDE Integration: Plugins for VS Code, JetBrains (IntelliJ, PyCharm), and Neovim with real-time suggestions.
- Static Analysis: Detects anti-patterns, security vulnerabilities, and performance bottlenecks in real time.
- Offline Documentation: Bundled docs for Python, JavaScript, and C++ with instant lookup (no internet required).
Configure via config.json (auto-generated on first launch):
```json
{
  "model": {
    "path": "models/codellama-7b.Q4_K_M.gguf",
    "gpu_layers": 40,
    "context_length": 128000
  },
  "completion": {
    "max_tokens": 128,
    "temperature": 0.2,
    "top_p": 0.95
  },
  "ide": {
    "vscode": {
      "enabled": true,
      "hotkey": "Ctrl+Alt+C"
    },
    "jetbrains": {
      "enabled": false
    }
  }
}
```

```bash
# Start the local inference server (default port: 8080)
aica-server --model models/codellama-7b.Q4_K_M.gguf --gpu-layers 40

# Generate completions (REST API)
curl -X POST http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def fibonacci(n):", "max_tokens": 50}'
```

- Go to the Releases page and download the latest version.
- Extract the archive using 7-Zip or WinRAR.
- Run the executable as Administrator.
- Follow the on-screen setup steps (model download may take 5-10 minutes).
- Launch your IDE and enable the plugin.
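
Once the server is running, the completion request from the curl example can also be sent from a script. The following is a minimal Python sketch using only the standard library. It assumes the server is listening on localhost:8080 and accepts the same JSON body as the curl call; the helper names `build_completion_request` and `complete` are illustrative and not part of the tool, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust if the server uses another port.
DEFAULT_URL = "http://localhost:8080/v1/completions"


def build_completion_request(prompt, max_tokens=50, url=DEFAULT_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=50, url=DEFAULT_URL, timeout=30):
    """Send the request and return the decoded JSON response unchanged."""
    request = build_completion_request(prompt, max_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `complete("def fibonacci(n):")` returns whatever JSON the server sends back. Keeping request construction separate from sending makes the payload easy to inspect without a live server.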
| OS | Version | Status | Notes |
|---|---|---|---|
| Windows 11 | 22H2+ | Supported | Best performance |
| Windows 10 | 1909+ | Supported | Requires latest updates |
| Windows 7 | SP1 | Limited | Limited GPU support |
| Linux | WSL2 | Experimental | No GPU passthrough |
| macOS | - | Not supported | |
Q: Is this detectable by IDEs or anti-virus?
A: The tool runs as a local process with no network calls after setup. Detection risk is minimal if used responsibly (avoid automated mass refactoring in sensitive environments).

Q: How often are models updated?
A: New model versions are released quarterly. Check Releases for updates.

Q: I get "CUDA out of memory" errors. What should I do?
A: Reduce gpu_layers in config.json or use a smaller model (e.g., 3B parameter variant).
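
The first remedy in the answer above can be scripted. The following is a minimal sketch, assuming config.json has the layout shown in the configuration section (a `model` object with a numeric `gpu_layers` key). The function names and the step size of 8 are hypothetical choices for illustration, not part of the tool.

```python
import json
from pathlib import Path


def lower_gpu_layers(config, step=8):
    """Return a copy of config with model.gpu_layers reduced by step (never below 0)."""
    updated = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    current = updated["model"]["gpu_layers"]
    updated["model"]["gpu_layers"] = max(0, current - step)
    return updated


def lower_gpu_layers_in_file(path="config.json", step=8):
    """Rewrite the config file in place and return the new gpu_layers value."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = lower_gpu_layers(config, step)
    config_path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
    return updated["model"]["gpu_layers"]
```

With the sample configuration, one call changes gpu_layers from 40 to 32; the call can be repeated until the error no longer appears.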
MIT License. Copyright (c) 2026 AI Code Assistant (Offline) Contributors.
This tool is for educational and development purposes only. The developers are not responsible for any misuse or damage caused by this software. Use at your own risk.