
release: v0.15.0 (BM25 symbol search / AppContext split / multi-backend embedding / Health extension / invariant CI) #8

Merged: juice094 merged 29 commits into main from fix/project-health-cleanup on May 11, 2026


@juice094
Owner

v0.15.0 fully delivered, including documentation consistency fixes (commit 684e18a).

Verification:
- cargo check --workspace: 0 errors
- cargo test --workspace: 503 passed / 0 failed / 4 ignored

Change summary:
- Cargo.toml / workspace.package version bump 0.14.3 → 0.15.0
- README badge updated; Roadmap table reordered chronologically
- AGENTS.md version field synced
- Add RELEASE_NOTES_v0.15.0.md
- Archive stale status reports to docs/_audit/

juice094 added 29 commits May 6, 2026 20:58
…rmissions

softprops/action-gh-release@v1 consistently fails on this repository with 403 / "Too many retries". Switch to the native gh release create CLI, which uses the same GITHUB_TOKEN but has better error handling.

Also add explicit permissions: contents: write.
…branches

Phase 1: AGENTS.md version hallucination fix
- v0.16.1 -> v0.14.3, commit hash 5928499 -> 2867811
- Test counts: 456 -> 490+, integration 9/11 -> 11/11
- Clippy: 1 warning -> 0 warnings

Phase 2: Remove dead code KnowledgeRepository::generate_report
- Zero callers confirmed by grep across src/
- The actual devkit_knowledge_report implementation uses oplog_analytics::generate_report

Phase 3: Remote feature branches already absent (cleaned up previously)

feat(cli): add --json to scan/workflow list, knowledge-report command

CLI parity fixes (Agent-driven testing):
- scan: add --json flag (underlying run_json already existed)
- workflow list: add --json flag with {success, count, workflows[]} output
- knowledge-report: new CLI subcommand matching MCP devkit_knowledge_report

Health query performance optimization:
- Add EnvVersionCache (30s TTL) to AppContext; eliminates 5 sequential
  subprocess spawns on cached calls
- Parallel subprocess spawns via tokio::join! on cold cache
- Batch get_health_batch() query replacing N+1 individual SELECTs
- Dedup repo.head() call in analyze_repo -> calc_ahead_behind

Benchmarks (Windows):
- health --json median: 122ms -> 68ms (-44%)
- Agent loop median: 168ms -> 114ms (-32%)
- 408 tests pass, 0 regressions
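The 30s-TTL cache above can be pictured as a snapshot plus a timestamp that is reused until it ages out. This is a minimal sketch assuming a plain `HashMap` of tool versions; `VersionCache`, `get_or_refresh`, and the `probe` closure are hypothetical names, and the real refresh path (parallel subprocess spawns via `tokio::join!`) is stood in for by a plain closure.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for EnvVersionCache: a snapshot of tool versions
/// plus the instant it was taken.
struct VersionCache {
    snapshot: Option<(Instant, HashMap<String, String>)>,
    ttl: Duration,
}

impl VersionCache {
    fn new(ttl: Duration) -> Self {
        VersionCache { snapshot: None, ttl }
    }

    /// Return the cached versions while the snapshot is younger than `ttl`;
    /// otherwise run `probe` (standing in for the subprocess spawns) once
    /// and cache its result.
    fn get_or_refresh<F>(&mut self, probe: F) -> &HashMap<String, String>
    where
        F: FnOnce() -> HashMap<String, String>,
    {
        let stale = match &self.snapshot {
            Some((taken_at, _)) => taken_at.elapsed() >= self.ttl,
            None => true,
        };
        if stale {
            self.snapshot = None;
        }
        let (_, versions) = self
            .snapshot
            .get_or_insert_with(|| (Instant::now(), probe()));
        versions
    }
}

fn main() {
    let mut cache = VersionCache::new(Duration::from_secs(30));
    let mut probes = 0;
    for _ in 0..3 {
        let versions = cache.get_or_refresh(|| {
            probes += 1; // counts simulated subprocess batches
            HashMap::from([("git".to_string(), "2.45.0".to_string())])
        });
        assert_eq!(versions["git"], "2.45.0");
    }
    println!("probes run: {probes}");
}
```

Only the first (cold) call pays for the probes; the next two are served from memory, which is the effect the benchmark numbers above attribute to the cache.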
…le batch

- Add EmbeddingProvider::encode_batch() trait method for future GPU/ONNX providers
- Implement encode_batch_with_candle using true batch forward (pad + single pass)
- Revert generate_and_save_embeddings to rayon par_iter single encoding:
  a Candle CPU BERT forward pass at batch=32 takes ~1.7s vs ~10ms for a single input; total 88s vs 16s
- Add --skip-embeddings CLI flag: index drops from ~16s to ~250ms
- Keep the unified AST walk (extract_symbols_and_calls) from the prior cleanup

BREAKING: run_index/run_index_with_progress signatures gain a skip_embeddings: bool parameter
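A "true batch forward" of the kind encode_batch_with_candle describes needs every sequence padded to a common length plus an attention mask, so the batch can be stacked into one `[batch, max_len]` tensor. The sketch below shows only that padding step on plain `Vec`s; `pad_batch` is a hypothetical helper, not devbase's code, and the pad id 0 assumes a bert-base-uncased-style vocabulary.

```rust
/// Pad variable-length token-id sequences to a common length so they can be
/// stacked into one [batch, max_len] tensor for a single forward pass.
/// Returns (padded ids, attention mask); mask is 1 for real tokens, 0 for padding.
fn pad_batch(seqs: &[Vec<u32>], pad_id: u32) -> (Vec<Vec<u32>>, Vec<Vec<u32>>) {
    let max_len = seqs.iter().map(|s| s.len()).max().unwrap_or(0);
    let mut ids = Vec::with_capacity(seqs.len());
    let mut mask = Vec::with_capacity(seqs.len());
    for s in seqs {
        let mut row = s.clone();
        row.resize(max_len, pad_id); // right-pad with the pad token
        let mut m = vec![1u32; s.len()];
        m.resize(max_len, 0); // padded positions are masked out
        ids.push(row);
        mask.push(m);
    }
    (ids, mask)
}

fn main() {
    // Illustrative ids: in bert-base-uncased, 101 = [CLS], 102 = [SEP], 0 = [PAD].
    let (ids, mask) = pad_batch(&[vec![101, 7592, 2088, 102], vec![101, 102]], 0);
    println!("{ids:?}");
    println!("{mask:?}");
}
```

Padding makes every row pay for the longest sequence in the batch, which is one plausible contributor to the CPU slowdown reported above; the commit itself does not diagnose the cause.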
…pipeline reference

- docs/_audit/2026-04-26-embedding-research.md: add 2026-05-04 supplement
  with batch encoding failure data (88s vs 16s), --skip-embeddings path,
  and external distill-knowledge Skill pipeline reference
- AGENTS.md: cross-reference to audit doc for knowledge distillation spec
Based on architecture governance methodology from external research
(Kimi session e9f2965f-b949-46a5-9d7c-afd6d4d9232c):

- docs/architecture/adr-template.md: ADR template + completed ADR-001/002
- docs/architecture/invariants.md: global + tiered invariants + extraction drill checklist
- AGENTS.md: add Architecture Governance subsection linking to new docs
- docs/README.md: register new architecture docs in navigation
Replace 24 unwrap/expect occurrences across 7 production files:
- src/search.rs: 12× schema.get_field expect → ? (functions return TantivyError)
- src/workflow/scheduler.rs: 4× expect → ok_or_else + ? (anyhow::Result)
- src/search/hybrid.rs: 2× expect → Vec::remove(0) (len==1 verified)
- src/discovery_engine.rs: 2× expect → ok_or_else + ? (anyhow::Result)
- src/semantic_index/mod.rs: 2× expect/unwrap + signature changes
  - index_repo_full → anyhow::Result<(Vec<CodeSymbol>, Vec<CodeCall>)>
  - index_repo → anyhow::Result<Vec<CodeSymbol>>
  - callers: knowledge_engine/index.rs + semantic_index/mod.rs updated
- src/query.rs: 1× expect → ? (Option propagation)
- src/test_utils.rs: 1× expect removed, return type → anyhow::Result
- benches/semantic_index.rs: add .unwrap() to bench call (benchmark exempt)

Plan documented in plans/rf6-unwrap-audit-plan.md

All checks pass:
- cargo test --all-targets: 408 passed / 0 failed / 3 ignored
- cargo clippy --all-targets -- -D warnings: 0 warnings
- cargo fmt --check: 0 diff
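The two rewrite patterns in this audit (`expect` → `ok_or_else` + `?`, and `expect` → `Vec::remove(0)` after a length check) look roughly like the sketch below. It uses std-only Rust with `String` errors in place of `anyhow`/`TantivyError`, and the field map and function names are hypothetical.

```rust
use std::collections::HashMap;

/// Before: `fields.get(name).expect("field exists")` panics on a schema mismatch.
/// After: turn the Option into a Result and let the caller propagate it.
fn get_field(fields: &HashMap<&str, u32>, name: &str) -> Result<u32, String> {
    fields
        .get(name)
        .copied()
        .ok_or_else(|| format!("schema has no field `{name}`"))
}

fn lookup_pair(fields: &HashMap<&str, u32>) -> Result<(u32, u32), String> {
    // `?` returns early with the error instead of panicking.
    let name = get_field(fields, "name")?;
    let path = get_field(fields, "file_path")?;
    Ok((name, path))
}

/// Before: `v.into_iter().next().expect("one result")`.
/// After: check the length first, so `Vec::remove(0)` cannot panic.
fn take_single(mut v: Vec<String>) -> Option<String> {
    if v.len() == 1 {
        Some(v.remove(0))
    } else {
        None
    }
}

fn main() {
    let fields = HashMap::from([("name", 0u32), ("file_path", 1u32)]);
    println!("{:?}", lookup_pair(&fields));
    let partial = HashMap::from([("name", 0u32)]);
    println!("{:?}", lookup_pair(&partial)); // an Err value, not a panic
    println!("{:?}", take_single(vec!["only".to_string()]));
}
```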
- docs/README.md: version v0.13.0 -> v0.14.3, test count 389->408, add RF-6 and crates metrics
- AGENTS.md: update current phase description, version ref, completed milestones
- init_db() global paths: no external callers found, fully migrated to AppContext
- Feature flags: tui/watch/mcp/embedding all optional, --no-default-features compiles
…/go)

Add optional dependencies + feature flags for all 4 tree-sitter grammars:
- Cargo.toml: tree-sitter-{rust,python,typescript,go} -> optional
- New features: lang-rust, lang-python, lang-js-ts, lang-go
- Default features include all 4 for backward compatibility
- semantic_index/mod.rs: #[cfg] on Lang enum variants + from_ext/parser_language
- semantic_index/symbol.rs: #[cfg] on grammar match arms + test guard
- semantic_index/call_graph.rs: #[cfg] on lang match arms

This allows --no-default-features + selective languages to reduce
compile time by skipping unused grammar C compilation.
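Gating enum variants and match arms with `#[cfg(feature = "...")]` means a disabled grammar is never referenced, so its optional crate need not be built. The sketch uses the feature names from this commit, but the enum body is an assumption (in particular, mapping JS and TS to one variant). In the real crate each feature would also enable its optional dependency in Cargo.toml.

```rust
/// Each variant exists only when its Cargo feature is enabled.
/// Feature names follow the PR; the variant set is a hypothetical sketch.
#[derive(Debug, PartialEq)]
enum Lang {
    #[cfg(feature = "lang-rust")]
    Rust,
    #[cfg(feature = "lang-python")]
    Python,
    #[cfg(feature = "lang-js-ts")]
    TypeScript,
    #[cfg(feature = "lang-go")]
    Go,
}

impl Lang {
    /// Map a file extension to a language; arms for disabled features vanish.
    fn from_ext(ext: &str) -> Option<Lang> {
        match ext {
            #[cfg(feature = "lang-rust")]
            "rs" => Some(Lang::Rust),
            #[cfg(feature = "lang-python")]
            "py" => Some(Lang::Python),
            #[cfg(feature = "lang-js-ts")]
            "js" | "jsx" | "ts" | "tsx" => Some(Lang::TypeScript),
            #[cfg(feature = "lang-go")]
            "go" => Some(Lang::Go),
            _ => None,
        }
    }
}

fn main() {
    // With no lang-* features enabled (e.g. plain rustc), every gated arm is
    // compiled out and this prints None.
    println!("{:?}", Lang::from_ext("rs"));
}
```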

All checks pass:
- cargo test --all-targets: 408 passed / 0 failed / 3 ignored
- cargo clippy --all-targets -- -D warnings: 0 warnings
- cargo fmt --check: 0 diff
Feature-gated grammar crates: lang-rust/python/js-ts/go features added,
allowing selective compilation to reduce build time.
Extend repair_tantivy_consistency to detect the reverse gap:
SQLite entities missing from Tantivy index (silent search gap).

Changes:
- Add RepairResult { orphans, missing_from_index }
- Convert tantivy_ids to HashSet for O(1) lookup
- After orphan cleanup, iterate all SQLite repo IDs and warn
  for any not found in Tantivy index
- Update repair_tantivy_consistency_at signature
- Update test_repair_tantivy_consistency_detects_orphan

Does NOT trigger re-index at startup (too heavy); only logs
warnings for operator visibility.
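The two-direction check reduces to set differences between the ID sets of both stores. This sketch mirrors the `RepairResult { orphans, missing_from_index }` shape from the commit, but the field types and `check_consistency` are assumptions. It only collects orphans, whereas the commit describes the real function as cleaning them up.

```rust
use std::collections::HashSet;

/// Shape taken from the commit message; the String element type is assumed.
#[derive(Debug, PartialEq)]
struct RepairResult {
    orphans: Vec<String>,            // in Tantivy, not in SQLite
    missing_from_index: Vec<String>, // in SQLite, not in Tantivy (warn only)
}

fn check_consistency(sqlite_ids: &[String], tantivy_ids: &[String]) -> RepairResult {
    // HashSets give O(1) membership tests in both directions.
    let sqlite: HashSet<&str> = sqlite_ids.iter().map(String::as_str).collect();
    let tantivy: HashSet<&str> = tantivy_ids.iter().map(String::as_str).collect();

    let orphans: Vec<String> = tantivy_ids
        .iter()
        .filter(|id| !sqlite.contains(id.as_str()))
        .cloned()
        .collect();
    let missing_from_index: Vec<String> = sqlite_ids
        .iter()
        .filter(|id| !tantivy.contains(id.as_str()))
        .cloned()
        .collect();

    // No re-index here: just surface the gap to the operator.
    for id in &missing_from_index {
        eprintln!("warning: repo {id} is in SQLite but missing from the Tantivy index");
    }
    RepairResult { orphans, missing_from_index }
}

fn main() {
    let sqlite = vec!["repo-a".to_string(), "repo-b".to_string()];
    let tantivy = vec!["repo-b".to_string(), "repo-c".to_string()];
    println!("{:?}", check_consistency(&sqlite, &tantivy));
}
```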

All checks pass:
- cargo test --all-targets: 408 passed / 0 failed / 3 ignored
- cargo clippy --all-targets -- -D warnings: 0 warnings
- cargo fmt --check: 0 diff
Reverse consistency check landed in fe14c81.
Short-term detection gap closed; long-term transaction coordination still open.
When the system clock drifts backward or checked_at lies in the future,
signed_duration_since returns a negative value. Previously this
incorrectly satisfied elapsed < ttl_seconds, so a stale entry was
treated as fresh and the cache never refreshed.

Change elapsed < ttl_seconds to elapsed >= 0 && elapsed < ttl_seconds
to force a refresh when elapsed is negative.
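The fix is a single predicate change. In this sketch, `elapsed_secs` stands for `now - checked_at` in whole seconds, as chrono's `signed_duration_since(...).num_seconds()` would give. The function names are hypothetical, and the buggy version is kept only for contrast.

```rust
/// Fixed predicate: a negative elapsed time (clock went backward, or
/// checked_at is in the future) is treated as stale, forcing a refresh.
fn cache_is_fresh(elapsed_secs: i64, ttl_seconds: i64) -> bool {
    elapsed_secs >= 0 && elapsed_secs < ttl_seconds
}

/// Pre-fix predicate, shown only to illustrate the bug.
fn cache_is_fresh_buggy(elapsed_secs: i64, ttl_seconds: i64) -> bool {
    elapsed_secs < ttl_seconds
}

fn main() {
    // checked_at is 5 minutes in the future, e.g. after the clock jumped back.
    let elapsed = -300;
    // prints "buggy: true, fixed: false"
    println!(
        "buggy: {}, fixed: {}",
        cache_is_fresh_buggy(elapsed, 30),
        cache_is_fresh(elapsed, 30)
    );
}
```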
Replace SQLite LIKE fallback in keyword_search_symbols with Tantivy BM25
via a dedicated symbol_index.

- New src/search/symbol_index.rs: schema (repo_id, name, signature,
  file_path, line_start), add_symbols, search_symbols, delete_repo_symbols
- StorageBackend trait: add symbol_index_path() (default + Temp + 4 test impls)
- knowledge_engine/index.rs: index symbols into Tantivy after SQLite persist
- hybrid.rs: keyword_search_symbols now primary-path BM25, fallback to
  SQLite LIKE for repos without symbol index (backward compatible)
- search.rs: pub mod symbol_index

Tests: 410 passed / 0 failed / 4 ignored
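A LIKE '%name%' query can only filter. BM25 also ranks, favoring rarer terms and shorter fields. Tantivy computes this score internally, so devbase does not implement it. As far as we know, Tantivy's defaults are k1 = 1.2 and b = 0.75 with the IDF form below. The sketch only illustrates the per-term formula.

```rust
/// Okapi BM25 contribution of one query term to one document field.
/// k1 = 1.2 and b = 0.75 are the common defaults (assumed to match Tantivy's).
fn bm25_term_score(tf: f32, doc_len: f32, avg_doc_len: f32, doc_freq: u64, num_docs: u64) -> f32 {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    // Lucene/Tantivy-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)), always positive.
    let n = doc_freq as f32;
    let idf = (1.0 + (num_docs as f32 - n + 0.5) / (n + 0.5)).ln();
    // Length normalization: longer-than-average fields are penalized.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * (tf * (K1 + 1.0)) / (tf + norm)
}

fn main() {
    // Same term, same frequency: a 3-token signature vs. a 10-token one.
    let short = bm25_term_score(1.0, 3.0, 5.0, 10, 1000);
    let long = bm25_term_score(1.0, 10.0, 5.0, 10, 1000);
    println!("short field: {short:.3}, long field: {long:.3}");
}
```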
- New tools/invariant-checks/run-checks.ps1:
  - G5 (RF-6): detect new unwrap/expect/panic in production code via git diff
  - T11: detect direct rusqlite::Connection usage in mcp/tools
  - T12: detect write operations in tui/render production code
  - Module extraction drill: verify README.md + Cargo.toml presence
  - Known exceptions for legacy T11 violations
- CI integration: add 'Architecture Invariants' job to ci.yml
- Fix crates/devbase-embedding/src/lib.rs: encode_with_candle
  .unwrap() -> .ok_or_else() (RF-6 compliance)
- Add plans/appcontext-refactor-design.md (P2 design doc)

Tests: 410 passed / 0 failed / 4 ignored
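The G5 check inspects only what a change adds. The actual check is PowerShell in run-checks.ps1. This is a Rust re-expression of the idea under assumed rules: only `+` lines count, "production code" means paths under `src/`, and the three patterns are substring matches. `find_new_panics` is hypothetical. A substring scan cannot tell `#[cfg(test)]` modules apart, which is one reason a known-exceptions list is useful.

```rust
/// Return (file, line) for each line added by a unified diff that introduces
/// unwrap/expect/panic in a file under src/ (assumed definition of production code).
fn find_new_panics(diff: &str) -> Vec<(String, String)> {
    const PATTERNS: [&str; 3] = [".unwrap()", ".expect(", "panic!("];
    let mut current_file = String::new();
    let mut hits = Vec::new();
    for line in diff.lines() {
        if let Some(path) = line.strip_prefix("+++ b/") {
            // File header: remember which file the following hunks belong to.
            current_file = path.to_string();
        } else if let Some(added) = line.strip_prefix('+') {
            // Added line; context (' ') and removed ('-') lines are ignored.
            if current_file.starts_with("src/") && PATTERNS.iter().any(|p| added.contains(p)) {
                hits.push((current_file.clone(), added.trim().to_string()));
            }
        }
    }
    hits
}

fn main() {
    let diff = "+++ b/src/search.rs\n+    let f = schema.get_field(\"name\").unwrap();\n";
    for (file, line) in find_new_panics(diff) {
        println!("G5 violation in {file}: {line}");
    }
}
```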
…age.rs

Split AppContext's 6 Client trait implementations from storage.rs
into their respective domain modules (zero behavior change):

- ScanClient    -> scan.rs
- HealthClient  -> health.rs
- SyncClient    -> sync.rs
- DigestClient  -> digest.rs
- KnowledgeClient -> knowledge_engine/mod.rs
- RegistryClient  -> registry.rs

Result: storage.rs reduced from ~860 lines to ~430 lines (-50%).

Tests: 410 passed / 0 failed / 4 ignored
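Moving the impls out of storage.rs works because Rust allows any module in a crate to implement a crate-local trait for a crate-local type. This single-file sketch uses inline modules to stand in for storage.rs, health.rs, and scan.rs. The trait methods and the `db_path` field are hypothetical, and where the real traits are declared is not stated in this PR.

```rust
// Stands in for storage.rs: only the shared context type stays here.
mod storage {
    pub struct AppContext {
        pub db_path: String,
    }
}

// Stands in for health.rs: the domain module owns its client impl.
mod health {
    use crate::storage::AppContext;

    pub trait HealthClient {
        fn health_summary(&self) -> String;
    }

    impl HealthClient for AppContext {
        fn health_summary(&self) -> String {
            format!("health for {}", self.db_path)
        }
    }
}

// Stands in for scan.rs.
mod scan {
    use crate::storage::AppContext;

    pub trait ScanClient {
        fn scan_root(&self) -> String;
    }

    impl ScanClient for AppContext {
        fn scan_root(&self) -> String {
            format!("scan via {}", self.db_path)
        }
    }
}

// Callers import the traits they need; behavior is unchanged by the move.
use health::HealthClient;
use scan::ScanClient;

fn main() {
    let ctx = storage::AppContext { db_path: "registry.db".into() };
    println!("{}", ctx.health_summary());
    println!("{}", ctx.scan_root());
}
```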
- Move query_code_symbols SQL logic to registry::code_symbols with CodeSymbolRow
- Move query_dead_code SQL logic to registry::dead_code with DeadCodeRow
- Simplify RegistryClient impl to pure delegation + JSON wrapping
- Add unit tests for both new query functions using in-memory DB
- Preserve re-exports for backward compatibility with src/repository/symbol.rs

Zero behavioral change; SQL strings identical to pre-extraction.
- EnvVersionCache: add python, bun, zig, java fields
- refresh_env_cache: detect 9 tools in parallel (was 5)
- get_tool_version: fallback to stderr when stdout empty (Java)
- fmt_version: handle quoted versions (Java) and the Docker and Python version strings
- JSON/CLI output: all 9 tools displayed
- tests: 5 new fmt_version tests for new tools
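`java -version` writes its banner to stderr and quotes the version number, so both quirks need handling. The sketch follows the spirit of get_tool_version/fmt_version, but `pick_output` and `extract_version` are hypothetical helpers with an assumed parsing rule: take the first double-quoted token, or else the first whitespace token that starts with a digit.

```rust
/// Fall back to stderr when stdout is empty (as with `java -version`).
fn pick_output<'a>(stdout: &'a str, stderr: &'a str) -> &'a str {
    if stdout.trim().is_empty() { stderr } else { stdout }
}

/// Extract a version from the first line: prefer a double-quoted token
/// (`openjdk version "17.0.2" ...`), otherwise the first token starting with
/// a digit (`Python 3.12.1`, `Docker version 24.0.7, build ...`).
fn extract_version(raw: &str) -> Option<String> {
    let first = raw.lines().next()?;
    if let Some(start) = first.find('"') {
        let rest = &first[start + 1..];
        if let Some(end) = rest.find('"') {
            return Some(rest[..end].to_string());
        }
    }
    first
        .split_whitespace()
        .map(|t| t.trim_end_matches(',')) // Docker appends a comma
        .find(|t| t.starts_with(|c: char| c.is_ascii_digit()))
        .map(String::from)
}

fn main() {
    let java = pick_output("", "openjdk version \"17.0.2\" 2022-01-18\n");
    println!("{:?}", extract_version(java));
    println!("{:?}", extract_version("Docker version 24.0.7, build afdd53b"));
    println!("{:?}", extract_version("Python 3.12.1"));
}
```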
- devbase-embedding: add OllamaProvider (HTTP /api/embed via ureq)
- devbase-embedding: add create_provider(backend, model, base_url, timeout)
- Config default: model 'nomic-embed-text' -> 'all-minilm' (384-dim, candle-compatible)
- Config docs: explain candle vs ollama backend choice
- Tests: 3 create_provider tests (candle, ollama, unknown fallback)
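A factory keyed on a backend string returns a trait object, so callers never name a concrete provider. This sketch reuses the parameter list from the commit (backend, model, base_url, timeout). The trait's single method, the struct fields, and the choice of Candle as the unknown-backend fallback are assumptions, and the real function may return a Result.

```rust
use std::time::Duration;

trait EmbeddingProvider {
    fn name(&self) -> &str;
}

#[allow(dead_code)] // fields are unused in this sketch
struct CandleProvider {
    model: String,
}

#[allow(dead_code)] // fields are unused in this sketch
struct OllamaProvider {
    model: String,
    base_url: String,
    timeout: Duration,
}

impl EmbeddingProvider for CandleProvider {
    fn name(&self) -> &str {
        "candle"
    }
}

impl EmbeddingProvider for OllamaProvider {
    fn name(&self) -> &str {
        "ollama"
    }
}

/// Pick a backend by name; unknown names fall back to Candle (assumed policy).
fn create_provider(backend: &str, model: &str, base_url: &str, timeout: Duration) -> Box<dyn EmbeddingProvider> {
    match backend {
        "ollama" => Box::new(OllamaProvider {
            model: model.into(),
            base_url: base_url.into(),
            timeout,
        }),
        "candle" => Box::new(CandleProvider { model: model.into() }),
        other => {
            eprintln!("unknown embedding backend {other:?}, falling back to candle");
            Box::new(CandleProvider { model: model.into() })
        }
    }
}

fn main() {
    // 11434 is Ollama's default local port.
    let p = create_provider("ollama", "all-minilm", "http://localhost:11434", Duration::from_secs(30));
    println!("selected backend: {}", p.name());
}
```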
- src/embedding.rs: generate_query_embedding now reads EmbeddingConfig
  via OnceLock-cached provider (first call loads config, rest reuse)
- crates/devbase-embedding: remove generate_query_embedding (moved to
  main crate where Config is accessible)
- All 5 call sites (skill search, index, TUI search, MCP search x2)
  automatically use configured backend without code changes
- AGENTS.md: mark P1~P5 complete, update commit ref to e230b6b
- devbase-embedding/README.md: 50-word extraction drill doc
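With `std::sync::OnceLock`, only the first `generate_query_embedding` call reads the config; every later call reuses the same provider. This sketch hard-codes the "config" and returns a placeholder vector so it runs without a model. `Provider`, `load_provider_from_config`, and the load counter are hypothetical, and the embedding output is not real.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the (simulated) config load runs.
static CONFIG_LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical provider handle built from configuration.
struct Provider {
    backend: String,
}

fn load_provider_from_config() -> Provider {
    CONFIG_LOADS.fetch_add(1, Ordering::SeqCst);
    // Real code would read EmbeddingConfig here; the backend is hard-coded.
    Provider { backend: "candle".to_string() }
}

/// First caller initializes; concurrent and later callers get the same value.
fn provider() -> &'static Provider {
    static PROVIDER: OnceLock<Provider> = OnceLock::new();
    PROVIDER.get_or_init(load_provider_from_config)
}

fn generate_query_embedding(query: &str) -> Vec<f32> {
    let p = provider();
    // Placeholder instead of a model call, so the sketch is runnable.
    vec![query.len() as f32, p.backend.len() as f32]
}

fn main() {
    let a = generate_query_embedding("find config loader");
    let b = generate_query_embedding("tantivy schema");
    println!("{a:?} {b:?}");
    println!("config loads: {}", CONFIG_LOADS.load(Ordering::SeqCst));
}
```

One consequence of this design is that the config is frozen for the lifetime of the process, so a backend change takes effect only after a restart.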
P1~P5 all delivered:
- BM25 symbol search (Tantivy)
- Multi-backend embedding (Candle + Ollama)
- AppContext split, Phase 1/2
- Health toolchain extension (9 tools)
- Architecture invariant CI (G5/T11/T12)
- TTL negative-elapsed bugfix
- Cargo.toml: main crate & workspace.package version 0.14.3 → 0.15.0
- README: version badge 0.14.3 → 0.15.0, tests badge 390 → 490+
- README: roadmap table reordered chronologically; v0.15.0/v0.16.0/v0.14.3 statuses corrected
  - v0.15.0 description aligned with CHANGELOG (P1-P5), marked ✅ Current
  - v0.16.0 description aligned with docs/ROADMAP.md, marked 📋 In progress
  - v0.14.3 changed from ✅ Current to ✅ Released
- AGENTS.md: current-version description changed from v0.15.0-in-progress to v0.15.0 released
- Add RELEASE_NOTES_v0.15.0.md (release notes for v0.11.0~v0.14.x were missing)
- Archive stale status reports:
  PROJECT_STATUS_2026-04-29.md → docs/_audit/
  STAGE_REPORT_2026-04-09.md → docs/_audit/

Verification: cargo check --workspace 0 errors; cargo test --workspace 503 passed / 0 failed / 4 ignored
juice094 merged commit 0a6758d into main on May 11, 2026
6 checks passed