Zeph

A memory-first AI agent for long-running work on local, cloud, and decentralized inference.


Zeph is a Rust-native AI agent built for work that cannot fit into one chat window: coding sessions, operations, research loops, document RAG, scheduled jobs, and multi-agent workflows. It keeps short-term context sharp, persists long-term memory, builds a relationship graph from decisions and entities, and routes each task to the cheapest provider that can handle it.

Unlike single-session assistants, Zeph is designed to remember why a decision happened, not just the last messages around it.

Why Try Zeph

| If you want... | Zeph gives you... |
| --- | --- |
| An agent that survives long projects | SQLite conversation history, semantic recall, graph memory, session digests, trajectory memory, and goal-aware compaction. |
| Lower infrastructure cost | A default SQLite vector backend, local Ollama defaults, feature-gated bundles, and provider routing for simple vs. hard tasks. |
| More than keyword memory | Typed graph facts, BFS recall, SYNAPSE spreading activation, MMR reranking, temporal decay, and write-quality gates. See graph memory concepts. |
| Provider freedom | Ollama, Claude, OpenAI, Gemini, Candle, any OpenAI-compatible endpoint, and distributed inference networks (Gonka, Cocoon TEE) for cost-sensitive or privacy-sensitive workloads. |
| Agent-grade safety | Age-encrypted vault secrets, sandboxed tool execution, MCP injection detection, SSRF guards, PII filtering, and exfiltration checks. |
| Daily operator ergonomics | CLI, TUI dashboard, MCP tools, plugins, skills, sub-agents, ACP for IDEs, A2A, scheduler, and JSON output modes. |

Quick Start

Install the latest release:

curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh

Or install from crates.io:

cargo install zeph

Initialize and run:

zeph init
zeph doctor
zeph --tui

Important

Zeph requires Rust 1.95 or later when building from source. Pre-built binaries do not require a Rust toolchain.

For a local-first setup, run Ollama and pull the default lightweight models:

ollama pull qwen3:8b
ollama pull qwen3-embedding
zeph init
zeph

Distributed Inference

Long-running agents are the worst-case workload for centralized API providers: thousands of calls per session, rate limits that pause mid-task, and costs that compound across every tool loop, memory retrieval, and sub-agent spawn.

Distributed inference networks change the economics. Compute is supplied by independent nodes rather than a single data center — which means no shared rate ceiling, no single vendor dependency, and in hardware-attested networks, provable isolation of your prompts from the node operator.

Zeph treats distributed networks as first-class providers alongside Ollama and cloud APIs, participating in the same adaptive routing — you can send cheap extraction and embedding work to a distributed node while reserving TEE-isolated compute for steps that touch sensitive context.

| Network | Provider type | Characteristic |
| --- | --- | --- |
| Gonka | gonka / compatible | High-capacity distributed nodes, signed transport, OpenAI-compatible gateway |
| Cocoon | cocoon | Hardware TEE isolation — node operators cannot read prompts or weights |

Both plug into the standard provider declaration:

[[llm.providers]]
name = "distributed"
type = "gonka"   # or "cocoon", or "compatible" for gateway mode
model = "qwen3-235b"
default = true

Run zeph init to configure either network interactively through the setup wizard.

Messenger as Agent Infrastructure

Most agents treat messaging apps as a thin input channel — user sends text, agent replies. Zeph's Telegram integration flips that model: the messenger becomes a coordination layer where agents serve public audiences, accept tasks from orchestrators, and talk to other bots.

Guest Mode removes the assumption that every user is a registered Telegram account. A transparent local proxy intercepts Bot API 10.0 guest queries and routes them to the agent without opening a second getUpdates connection (so no 409 conflicts). The agent responds via answerGuestQuery: one call, no extra infrastructure. This makes it practical to deploy public-facing agents that handle anonymous or unauthenticated requests.

Bot-to-Bot communication lets Zeph register as a managed bot via setManagedBotAccessSettings and accept tasks from other bots in a controlled chain. Consecutive bot replies are tracked per-chat, depth is capped at max_bot_chain_depth, and each inbound bot is validated against an allowlist — so the agent participates in multi-agent pipelines without becoming a relay for arbitrary bots.

Voice input via Cocoon STT. The Telegram adapter detects voice and audio messages, downloads the file, and passes it to the configured speech-to-text provider. With type = "cocoon" and stt_model set, transcription runs inside a hardware TEE — audio bytes never leave the isolated enclave unencrypted. This makes voice-driven agentic workflows practical for sensitive use cases: a voice note becomes a task, without the audio touching a third-party transcription API.

Configurable streaming interval (stream_interval_ms, default 3 s, minimum 500 ms) fixes a silent data-loss bug in the original hardcoded delay: responses that completed within a single interval window were discarded before Telegram saw them. Now the agent flushes on completion regardless of the timer.

[telegram]
guest_mode          = true
bot_to_bot          = true
allowed_bots        = ["orchestrator_bot", "scheduler_bot"]
max_bot_chain_depth = 3
stream_interval_ms  = 1500
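
The flush-on-completion behavior can be sketched with a small buffer type. Everything here is an illustrative invention, not Zeph's actual internals; only the `stream_interval_ms` semantics (a 500 ms floor and a final flush regardless of the timer) come from the description above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical sketch: accumulates streamed chunks and decides when to flush.
struct StreamBuffer {
    buf: String,
    last_flush: Instant,
    interval: Duration,
}

impl StreamBuffer {
    fn new(stream_interval_ms: u64) -> Self {
        Self {
            buf: String::new(),
            last_flush: Instant::now(),
            // Enforce the documented 500 ms minimum.
            interval: Duration::from_millis(stream_interval_ms.max(500)),
        }
    }

    /// Append a chunk; returns buffered text only once the interval elapses.
    fn push(&mut self, chunk: &str) -> Option<String> {
        self.buf.push_str(chunk);
        if self.last_flush.elapsed() >= self.interval {
            self.last_flush = Instant::now();
            return Some(std::mem::take(&mut self.buf));
        }
        None
    }

    /// On stream completion, flush whatever is buffered regardless of the
    /// timer: the fix for responses finishing within one interval window.
    fn finish(&mut self) -> Option<String> {
        if self.buf.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buf))
        }
    }
}

fn main() {
    let mut sb = StreamBuffer::new(1500);
    let _ = sb.push("Hello, ");
    let _ = sb.push("world");
    // The stream ends well before 1500 ms have passed; finish() still flushes.
    assert_eq!(sb.finish().as_deref(), Some("Hello, world"));
}
```

With the old hardcoded delay, `finish()` did not exist as a separate path, so a response that arrived and completed inside one interval was dropped.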

[[llm.providers]]
name      = "stt"
type      = "cocoon"
stt_model = "whisper-large-v3"   # transcribes Telegram voice messages inside TEE

See the Telegram guide for full configuration and Bot API 10.0 details.

What Makes It Different

Memory is the product

Zeph combines several memory layers instead of treating recall as a side feature:

| Layer | Purpose |
| --- | --- |
| Working context | Keeps the current task coherent under context pressure. See context budgets. |
| Semantic memory | Stores conversations, tool outputs, documents, and summaries for retrieval. See the semantic memory guide. |
| Graph memory | Records entities, decisions, relationships, causality, temporal links, and hierarchy. See graph memory. |
| Episodic memory | Preserves session-level scenes, digests, goals, and trajectories. |
| Quality gates | Reject noisy writes, validate compaction, and log retrieval failures for later improvement. See quality self-check. |

Ask "Why did we choose PostgreSQL?" and Zeph can traverse decision edges instead of searching raw chat text.
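
That kind of traversal can be sketched as a BFS over typed edges. The edge type, graph representation, and node names below are hypothetical illustrations; Zeph's real graph memory is SQLite-backed and uses its own schema.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical typed edge for illustration only.
#[derive(Clone, Copy, PartialEq)]
enum Edge {
    DecidedBecause,
    RelatesTo,
}

type Graph = HashMap<&'static str, Vec<(Edge, &'static str)>>;

/// BFS from a decision node, following only `DecidedBecause` edges,
/// collecting the chain of reasons instead of searching raw chat text.
fn why(graph: &Graph, start: &str) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([start.to_string()]);
    let mut reasons = Vec::new();
    while let Some(node) = queue.pop_front() {
        if !seen.insert(node.clone()) {
            continue; // already visited
        }
        for &(edge, target) in graph.get(node.as_str()).into_iter().flatten() {
            if edge == Edge::DecidedBecause {
                reasons.push(target.to_string());
                queue.push_back(target.to_string());
            }
        }
    }
    reasons
}

fn main() {
    let mut g = Graph::new();
    g.insert("choose PostgreSQL", vec![
        (Edge::DecidedBecause, "need JSONB queries"),
        (Edge::RelatesTo, "database migration"),
    ]);
    g.insert("need JSONB queries", vec![
        (Edge::DecidedBecause, "schema evolves weekly"),
    ]);
    let reasons = why(&g, "choose PostgreSQL");
    assert_eq!(reasons, ["need JSONB queries", "schema evolves weekly"]);
}
```

The point of the sketch: the answer to "why" is a path through decision edges, so recall quality does not depend on the original wording ever appearing in one message.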

Built for low-resource setups

Zeph does not require a heavyweight stack to be useful:

  • The default vector backend is embedded SQLite.
  • Qdrant is optional for larger semantic and graph workloads.
  • The default local chat model is qwen3:8b through Ollama.
  • Feature bundles let you build only what you need: desktop, ide, server, chat, ml, or full.
  • Release builds are optimized for small native binaries.

Multi-model by design

Declare providers once in [[llm.providers]], then route work by complexity, cost, latency, and reliability:

[[llm.providers]]
name = "fast"
type = "ollama"
model = "qwen3:8b"
embedding_model = "qwen3-embedding"
embed = true

[[llm.providers]]
name = "quality"
type = "claude"
model = "claude-sonnet-4-6"
default = true

[llm]
routing = "bandit"

Use local models for extraction, embeddings, routing, and summarization. Keep expensive models for planning, code generation, and expert reasoning.
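
The routing idea can be sketched as a tiny multi-armed-bandit loop: try each provider, record a reward blending success, cost, and latency, then favor the best observed mean. This is an illustrative sketch only; `Router`, `Arm`, and the reward values are invented, and Zeph's actual `routing = "bandit"` policy is not shown here.

```rust
/// One provider "arm" with its observed reward history (hypothetical names).
struct Arm {
    name: &'static str,
    pulls: u32,
    total_reward: f64,
}

struct Router {
    arms: Vec<Arm>,
}

impl Router {
    fn new(names: &[&'static str]) -> Self {
        Self {
            arms: names
                .iter()
                .map(|&n| Arm { name: n, pulls: 0, total_reward: 0.0 })
                .collect(),
        }
    }

    /// Explore any untried provider first; otherwise exploit the best mean.
    fn pick(&self) -> usize {
        if let Some(i) = self.arms.iter().position(|a| a.pulls == 0) {
            return i;
        }
        self.arms
            .iter()
            .enumerate()
            .max_by(|(_, a), (_, b)| {
                let ma = a.total_reward / a.pulls as f64;
                let mb = b.total_reward / b.pulls as f64;
                ma.partial_cmp(&mb).unwrap()
            })
            .map(|(i, _)| i)
            .unwrap()
    }

    /// Reward could blend task success, cost, and latency.
    fn record(&mut self, i: usize, reward: f64) {
        self.arms[i].pulls += 1;
        self.arms[i].total_reward += reward;
    }
}

fn main() {
    let mut r = Router::new(&["fast", "quality"]);
    let a = r.pick();
    r.record(a, 0.4); // "fast": cheap but mediocre on this task class
    let b = r.pick();
    r.record(b, 0.9); // "quality": expensive but strong
    assert_eq!(r.arms[r.pick()].name, "quality");
}
```

A real policy would also keep exploring occasionally and condition on task complexity, which is what lets cheap extraction land on local models while planning goes to the expensive provider.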

Tools without loose secrets

Secrets live in the Zeph age vault, not in .env files or shell profiles. Tool execution goes through trust gates, command filters, sandboxing, audit logs, and redaction paths. MCP tools are discovered and exposed without bypassing the injection and authorization checks.

Demo

Zeph TUI Dashboard

Common Commands

zeph init                    # generate config through the wizard
zeph doctor                  # run preflight checks
zeph --tui                   # launch the dashboard
zeph ingest ./docs           # ingest documents into semantic memory
zeph skill list              # inspect installed skills
zeph plugin list --overlay   # inspect plugin config overlays
zeph router stats            # inspect adaptive provider routing
zeph memory export dump.json # export memory snapshot
zeph project purge --dry-run # preview local state cleanup

Installation Options

Pre-built Binary

curl -fsSL https://github.com/bug-ops/zeph/releases/latest/download/install.sh | sh

Cargo

cargo install zeph
cargo install zeph --features desktop

Docker

docker pull ghcr.io/bug-ops/zeph:latest

From Source

git clone https://github.com/bug-ops/zeph.git
cd zeph
cargo build --release --features full
./target/release/zeph init

Feature Highlights

| Area | Highlights |
| --- | --- |
| Memory | SQLite/PostgreSQL history, embedded SQLite vectors or Qdrant, graph memory, SYNAPSE, SleepGate, APEX-MEM write-quality gates, BeliefMem probabilistic edge layer, MemCoT Zoom-In/Out recall views, document RAG. |
| Context | Goal-aware compaction, TypedPage assembler pipeline, TACO output compression, tool-output archive, session recap, active-goal injection. |
| Skills | SKILL.md registry, hot reload, BM25 + embedding matching, trust levels, self-learning skill improvement. |
| Providers | Ollama, Claude, OpenAI, Gemini, OpenAI-compatible APIs, Gonka native inference, Cocoon decentralized TEE inference, Candle local inference, adaptive routing. |
| Tools | Shell, file, web, MCP, tool quotas, approval gates, audit trail, sandboxing, output compression, speculative dispatch, ShadowSentinel safety probes, TrajectorySentinel capability governance. |
| Interfaces | CLI, TUI, Telegram (with Guest Mode and Bot-to-Bot), Discord, Slack, ACP, A2A, HTTP gateway, scheduler daemon. |
| Code intelligence | Tree-sitter indexing, semantic repo map, LSP diagnostics and hover context through MCP. |
| Observability | Debug dumps, JSONL mode, Prometheus metrics, OpenTelemetry traces, profiling builds. |

Architecture

See the architecture overview and crates reference for full details.

zeph
  src/                    CLI, bootstrap, init wizard, command handlers
  crates/zeph-core        agent loop and runtime orchestration
  crates/zeph-config      TOML schema, migration, provider registry
  crates/zeph-llm         provider abstraction and model backends
  crates/zeph-memory      semantic, graph, episodic, and document memory
  crates/zeph-skills      skill registry, matching, trust, learning
  crates/zeph-tools       tool executors, sandboxing, policy, audit
  crates/zeph-mcp         MCP client and tool lifecycle
  crates/zeph-tui         ratatui dashboard
  crates/zeph-acp         IDE integration through Agent Client Protocol
  crates/zeph-a2a         agent-to-agent protocol support
  crates/zeph-subagent    sub-agent definitions, spawning, transcripts
  crates/zeph-orchestration DAG planning, scheduling, verification

Documentation

Zeph draws from published work on parallel tool execution, temporal knowledge graphs, agentic memory linking, failure-driven compression, retrieval quality, and multi-model routing. See References & Inspirations for the full list.

Contributing

See CONTRIBUTING.md, CODE_OF_CONDUCT.md, and SECURITY.md.

License

MIT
