Zeph is a Rust-native AI agent built for work that cannot fit into one chat window: coding sessions, operations, research loops, document RAG, scheduled jobs, and multi-agent workflows. It keeps short-term context sharp, persists long-term memory, builds a relationship graph from decisions and entities, and routes each task to the cheapest provider that can handle it.
Unlike single-session assistants, Zeph is designed to remember why a decision happened, not just the last messages around it.
| If you want... | Zeph gives you... |
|---|---|
| An agent that survives long projects | SQLite conversation history, semantic recall, graph memory, session digests, trajectory memory, and goal-aware compaction. |
| Lower infrastructure cost | A default SQLite vector backend, local Ollama defaults, feature-gated bundles, and provider routing for simple vs. hard tasks. |
| More than keyword memory | Typed graph facts, BFS recall, SYNAPSE spreading activation, MMR reranking, temporal decay, and write-quality gates. See graph memory concepts. |
| Provider freedom | Ollama, Claude, OpenAI, Gemini, Candle, any OpenAI-compatible endpoint, and distributed inference networks (Gonka, Cocoon TEE) for cost-sensitive or privacy-sensitive workloads. |
| Agent-grade safety | Age-encrypted vault secrets, sandboxed tool execution, MCP injection detection, SSRF guards, PII filtering, and exfiltration checks. |
| Daily operator ergonomics | CLI, TUI dashboard, MCP tools, plugins, skills, sub-agents, ACP for IDEs, A2A, scheduler, and JSON output modes. |
Install the latest release:

```sh
curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
```

Or install from crates.io:

```sh
cargo install zeph
```

Initialize and run:

```sh
zeph init
zeph doctor
zeph --tui
```

> [!IMPORTANT]
> Zeph requires Rust 1.95 or later when building from source. Pre-built binaries do not require a Rust toolchain.
For a local-first setup, run Ollama and pull the default lightweight models:

```sh
ollama pull qwen3:8b
ollama pull qwen3-embedding
zeph init
zeph
```

Long-running agents are the worst-case workload for centralized API providers: thousands of calls per session, rate limits that pause mid-task, and costs that compound across every tool loop, memory retrieval, and sub-agent spawn.
Distributed inference networks change the economics. Compute is supplied by independent nodes rather than a single data center — which means no shared rate ceiling, no single vendor dependency, and in hardware-attested networks, provable isolation of your prompts from the node operator.
Zeph treats distributed networks as first-class providers alongside Ollama and cloud APIs, participating in the same adaptive routing — you can send cheap extraction and embedding work to a distributed node while reserving TEE-isolated compute for steps that touch sensitive context.
| Network | Provider type | Characteristic |
|---|---|---|
| Gonka | `gonka` / `compatible` | High-capacity distributed nodes, signed transport, OpenAI-compatible gateway |
| Cocoon | `cocoon` | Hardware TEE isolation: node operators cannot read prompts or weights |
Both plug into the standard provider declaration:

```toml
[[llm.providers]]
name = "distributed"
type = "gonka" # or "cocoon", or "compatible" for gateway mode
model = "qwen3-235b"
default = true
```

Run `zeph init` to configure either network interactively through the setup wizard.
Most agents treat messaging apps as a thin input channel — user sends text, agent replies. Zeph's Telegram integration flips that model: the messenger becomes a coordination layer where agents serve public audiences, accept tasks from orchestrators, and talk to other bots.
Guest Mode removes the assumption that every user is a registered Telegram account. A transparent local proxy intercepts Bot API 10.0 guest queries and routes them to the agent without opening a second getUpdates connection (no 409 conflicts). The agent responds via answerGuestQuery: one call, no extra infrastructure. This makes it practical to deploy public-facing agents that handle anonymous or unauthenticated requests.
Bot-to-Bot communication lets Zeph register as a managed bot via setManagedBotAccessSettings and accept tasks from other bots in a controlled chain. Consecutive bot replies are tracked per-chat, depth is capped at max_bot_chain_depth, and each inbound bot is validated against an allowlist — so the agent participates in multi-agent pipelines without becoming a relay for arbitrary bots.
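The depth-cap and allowlist behavior described above can be sketched roughly as follows. This is an illustrative model only; the type and field names (`BotChainGuard`, `depth_per_chat`) are hypothetical, not Zeph's actual internals:

```rust
use std::collections::HashMap;

/// Hypothetical guard for inbound bot-to-bot messages: enforces an
/// allowlist and caps consecutive bot replies per chat.
struct BotChainGuard {
    allowed_bots: Vec<String>,
    max_bot_chain_depth: u32,
    depth_per_chat: HashMap<i64, u32>,
}

impl BotChainGuard {
    fn accept(&mut self, chat_id: i64, sender_bot: &str) -> bool {
        // Reject senders that are not explicitly allowlisted.
        if !self.allowed_bots.iter().any(|b| b == sender_bot) {
            return false;
        }
        // Track consecutive bot replies in this chat and cap the depth,
        // so the agent never becomes a relay for unbounded bot chains.
        let depth = self.depth_per_chat.entry(chat_id).or_insert(0);
        if *depth >= self.max_bot_chain_depth {
            return false;
        }
        *depth += 1;
        true
    }

    /// A human message resets the chain depth for the chat.
    fn reset(&mut self, chat_id: i64) {
        self.depth_per_chat.insert(chat_id, 0);
    }
}

fn main() {
    let mut guard = BotChainGuard {
        allowed_bots: vec!["orchestrator_bot".into()],
        max_bot_chain_depth: 2,
        depth_per_chat: HashMap::new(),
    };
    assert!(guard.accept(1, "orchestrator_bot"));  // depth 1
    assert!(guard.accept(1, "orchestrator_bot"));  // depth 2
    assert!(!guard.accept(1, "orchestrator_bot")); // chain capped
    assert!(!guard.accept(1, "random_bot"));       // not allowlisted
}
```

The key property is that both checks run before the message reaches the agent loop, so a disallowed or over-deep chain costs no model calls.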
Voice input via Cocoon STT. The Telegram adapter detects voice and audio messages, downloads the file, and passes it to the configured speech-to-text provider. With type = "cocoon" and stt_model set, transcription runs inside a hardware TEE — audio bytes never leave the isolated enclave unencrypted. This makes voice-driven agentic workflows practical for sensitive use cases: a voice note becomes a task, without the audio touching a third-party transcription API.
Configurable streaming interval (stream_interval_ms, default 3 s, minimum 500 ms) fixes a silent data-loss bug in the original hardcoded delay: responses that completed within a single interval window were discarded before Telegram saw them. Now the agent flushes on completion regardless of the timer.
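A minimal sketch of the corrected flush policy: buffer chunks, emit on the interval timer, and always flush once the stream completes, even when the interval never elapsed. Illustrative only, not Zeph's actual Telegram adapter code:

```rust
use std::time::{Duration, Instant};

/// Illustrative flush policy for streamed message edits.
struct Flusher {
    interval: Duration,
    last_flush: Instant,
    buffer: String,
}

impl Flusher {
    /// Buffer a streamed chunk; emit only when the interval has elapsed.
    fn push(&mut self, chunk: &str, now: Instant) -> Option<String> {
        self.buffer.push_str(chunk);
        if now.duration_since(self.last_flush) >= self.interval {
            self.last_flush = now;
            return Some(std::mem::take(&mut self.buffer));
        }
        None
    }

    /// Completion flush: whatever remains goes out unconditionally.
    /// Without this, a response finishing inside one interval window
    /// would be silently dropped (the bug described above).
    fn finish(&mut self) -> Option<String> {
        (!self.buffer.is_empty()).then(|| std::mem::take(&mut self.buffer))
    }
}

fn main() {
    let start = Instant::now();
    let mut f = Flusher {
        interval: Duration::from_millis(1500),
        last_flush: start,
        buffer: String::new(),
    };
    // Response completes within a single interval window:
    assert!(f.push("Hello, ", start).is_none());
    assert!(f.push("world!", start).is_none());
    // The completion flush still delivers the text.
    assert_eq!(f.finish().as_deref(), Some("Hello, world!"));
}
```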
```toml
[telegram]
guest_mode = true
bot_to_bot = true
allowed_bots = ["orchestrator_bot", "scheduler_bot"]
max_bot_chain_depth = 3
stream_interval_ms = 1500

[[llm.providers]]
name = "stt"
type = "cocoon"
stt_model = "whisper-large-v3" # transcribes Telegram voice messages inside TEE
```

See the Telegram guide for full configuration and Bot API 10.0 details.
Zeph combines several memory layers instead of treating recall as a side feature:
| Layer | Purpose |
|---|---|
| Working context | Keeps the current task coherent under context pressure. See context budgets. |
| Semantic memory | Stores conversations, tool outputs, documents, and summaries for retrieval. See semantic memory guide. |
| Graph memory | Records entities, decisions, relationships, causality, temporal links, and hierarchy. See graph memory. |
| Episodic memory | Preserves session-level scenes, digests, goals, and trajectories. |
| Quality gates | Reject noisy writes, validate compaction, and log retrieval failures for later improvement. See quality self-check. |
Ask "Why did we choose PostgreSQL?" and Zeph can traverse decision edges instead of searching raw chat text.
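As an illustration of graph-based recall, here is a minimal BFS over typed edges. The data shapes are hypothetical (Zeph's real graph store is richer), but the idea is the same: answer "why" questions by walking decision and causality edges rather than searching raw text:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical typed fact: (subject, relation, object), e.g.
// ("backend", "decided_on", "PostgreSQL").
type Fact = (&'static str, &'static str, &'static str);

/// BFS from a starting entity across typed edges, up to `max_hops`,
/// collecting the facts reached in traversal order.
fn bfs_recall(facts: &[Fact], start: &str, max_hops: usize) -> Vec<Fact> {
    // Index facts by subject for O(1) edge lookup.
    let mut adj: HashMap<&str, Vec<Fact>> = HashMap::new();
    for &f in facts {
        adj.entry(f.0).or_default().push(f);
    }
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut queue = VecDeque::from([(start, 0usize)]);
    while let Some((node, hops)) = queue.pop_front() {
        if hops >= max_hops || !seen.insert(node) {
            continue;
        }
        for &f in adj.get(node).map(Vec::as_slice).unwrap_or(&[]) {
            out.push(f);
            queue.push_back((f.2, hops + 1));
        }
    }
    out
}

fn main() {
    let facts = [
        ("backend", "decided_on", "PostgreSQL"),
        ("PostgreSQL", "because", "transactional JSONB requirements"),
        ("frontend", "uses", "TypeScript"), // unrelated, never visited
    ];
    // Two hops from "backend" recover the decision *and* its rationale.
    let recalled = bfs_recall(&facts, "backend", 3);
    assert_eq!(recalled.len(), 2);
}
```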
Zeph does not require a heavyweight stack to be useful:
- The default vector backend is embedded SQLite.
- Qdrant is optional for larger semantic and graph workloads.
- The default local chat model is `qwen3:8b` through Ollama.
- Feature bundles let you build only what you need: `desktop`, `ide`, `server`, `chat`, `ml`, or `full`.
- Release builds are optimized for small native binaries.
Declare providers once in `[[llm.providers]]`, then route work by complexity, cost, latency, and reliability:

```toml
[[llm.providers]]
name = "fast"
type = "ollama"
model = "qwen3:8b"
embedding_model = "qwen3-embedding"
embed = true

[[llm.providers]]
name = "quality"
type = "claude"
model = "claude-sonnet-4-6"
default = true

[llm]
routing = "bandit"
```

Use local models for extraction, embeddings, routing, and summarization. Keep expensive models for planning, code generation, and expert reasoning.
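The `"bandit"` routing mode suggests multi-armed-bandit provider selection. A toy epsilon-greedy sketch of the general idea (not Zeph's actual router; randomness is passed in explicitly to keep the example deterministic):

```rust
/// One routing "arm" per provider, with running reward statistics.
/// Reward could be success weighted by cost and latency.
struct Arm {
    name: &'static str,
    pulls: u32,
    reward_sum: f64,
}

/// Epsilon-greedy: with probability `epsilon` explore an arbitrary
/// provider; otherwise exploit the best mean reward observed so far.
/// `roll` stands in for a random draw in [0, 1).
fn pick<'a>(arms: &'a [Arm], epsilon: f64, roll: f64, explore_idx: usize) -> &'a str {
    if roll < epsilon {
        // Explore: try a provider regardless of current estimates.
        arms[explore_idx % arms.len()].name
    } else {
        // Exploit: highest mean reward so far.
        arms.iter()
            .max_by(|a, b| {
                let ma = a.reward_sum / a.pulls.max(1) as f64;
                let mb = b.reward_sum / b.pulls.max(1) as f64;
                ma.partial_cmp(&mb).unwrap()
            })
            .map(|a| a.name)
            .unwrap_or("")
    }
}

fn main() {
    let arms = [
        Arm { name: "fast", pulls: 10, reward_sum: 7.0 },    // mean 0.7
        Arm { name: "quality", pulls: 10, reward_sum: 9.0 }, // mean 0.9
    ];
    assert_eq!(pick(&arms, 0.1, 0.50, 0), "quality"); // exploit branch
    assert_eq!(pick(&arms, 0.1, 0.05, 0), "fast");    // explore branch
}
```

The exploration term is what keeps a cheap provider in rotation: the router periodically re-tests it instead of locking onto the early winner.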
Secrets live in the Zeph age vault, not in .env files or shell profiles. Tool execution goes through trust gates, command filters, sandboxing, audit logs, and redaction paths. MCP tools are discovered and exposed without dropping the injection and authorization checks.
```sh
zeph init                     # generate config through the wizard
zeph doctor                   # run preflight checks
zeph --tui                    # launch the dashboard
zeph ingest ./docs            # ingest documents into semantic memory
zeph skill list               # inspect installed skills
zeph plugin list --overlay    # inspect plugin config overlays
zeph router stats             # inspect adaptive provider routing
zeph memory export dump.json  # export memory snapshot
zeph project purge --dry-run  # preview local state cleanup
```

Install script:

```sh
curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh
```

From crates.io:

```sh
cargo install zeph
cargo install zeph --features desktop
```

Docker:

```sh
docker pull ghcr.io/bug-ops/zeph:latest
```

From source:

```sh
git clone https://github.com/bug-ops/zeph.git
cd zeph
cargo build --release --features full
./target/release/zeph init
```

| Area | Highlights |
|---|---|
| Memory | SQLite/PostgreSQL history, embedded SQLite vectors or Qdrant, graph memory, SYNAPSE, SleepGate, APEX-MEM write-quality gates, BeliefMem probabilistic edge layer, MemCoT Zoom-In/Out recall views, document RAG. |
| Context | Goal-aware compaction, TypedPage assembler pipeline, TACO output compression, tool-output archive, session recap, active-goal injection. |
| Skills | SKILL.md registry, hot reload, BM25 + embedding matching, trust levels, self-learning skill improvement. |
| Providers | Ollama, Claude, OpenAI, Gemini, OpenAI-compatible APIs, Gonka native inference, Cocoon decentralized TEE inference, Candle local inference, adaptive routing. |
| Tools | Shell, file, web, MCP, tool quotas, approval gates, audit trail, sandboxing, output compression, speculative dispatch, ShadowSentinel safety probes, TrajectorySentinel capability governance. |
| Interfaces | CLI, TUI, Telegram (with Guest Mode and Bot-to-Bot), Discord, Slack, ACP, A2A, HTTP gateway, scheduler daemon. |
| Code intelligence | Tree-sitter indexing, semantic repo map, LSP diagnostics and hover context through MCP. |
| Observability | Debug dumps, JSONL mode, Prometheus metrics, OpenTelemetry traces, profiling builds. |
See the architecture overview and crates reference for full details.
```
zeph
  src/                        CLI, bootstrap, init wizard, command handlers
  crates/zeph-core            agent loop and runtime orchestration
  crates/zeph-config          TOML schema, migration, provider registry
  crates/zeph-llm             provider abstraction and model backends
  crates/zeph-memory          semantic, graph, episodic, and document memory
  crates/zeph-skills          skill registry, matching, trust, learning
  crates/zeph-tools           tool executors, sandboxing, policy, audit
  crates/zeph-mcp             MCP client and tool lifecycle
  crates/zeph-tui             ratatui dashboard
  crates/zeph-acp             IDE integration through Agent Client Protocol
  crates/zeph-a2a             agent-to-agent protocol support
  crates/zeph-subagent        sub-agent definitions, spawning, transcripts
  crates/zeph-orchestration   DAG planning, scheduling, verification
```
- Full documentation
- Installation guide
- Configuration recipes
- Memory concepts — graph, semantic, episodic layers
- Graph memory
- Adaptive inference routing
- MCP integration
- Security model
- Feature flags
- CLI reference
Zeph draws from published work on parallel tool execution, temporal knowledge graphs, agentic memory linking, failure-driven compression, retrieval quality, and multi-model routing. See References & Inspirations for the full list.
See CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

