fix(scanner): wrap untrusted repo content in prompt isolation tags#226
fix(scanner): wrap untrusted repo content in prompt isolation tags#22621lakshh wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces prompt isolation in src/scanner/enricher.py by wrapping untrusted repository content inside <untrusted_code> tags, and adds comprehensive unit tests in tests/unit/test_enricher.py to verify this behavior. The review feedback highlights a high-severity vulnerability where untrusted content containing the literal </untrusted_code> tag can escape the isolation block, and recommends sanitizing inputs to prevent tag escaping. Additionally, the reviewer suggests adding a test case to cover this specific tag-escaping injection scenario.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
|
| Filename | Overview |
|---|---|
| src/scanner/enricher.py | Adds _escape_untrusted(), _allowlist(), tag constants, and allowlists; rewrites both prompt templates to isolate all repo-controlled fields inside untrusted_code blocks with sandwich-pattern reinforce instructions; updates _enrich_one_symbol and _enrich_one_file call sites to apply escaping and allowlisting before format(). |
| tests/unit/test_enricher.py | New test file with 30+ tests covering prompt isolation, tag-escape prevention, allowlist filtering, injection containment in both symbol and file enrichment paths, truncation, empty-LLM-response early-return, neo4j failure isolation, and enrich_repo stats/cap behaviour. |
Sequence Diagram
sequenceDiagram
participant MongoDB as MongoDB (Phase 1 data)
participant Enricher as Enricher
participant Escape as _escape_untrusted()
participant Allowlist as _allowlist()
participant Prompt as Prompt Builder
participant LLM as LLM
MongoDB->>Enricher: raw_code, docstring, signature, qualified_name, file_path, symbol_list
MongoDB->>Enricher: language, symbol_type (enum fields)
Enricher->>Escape: repo-controlled strings
Escape-->>Enricher: neutralised (close/open tags escaped)
Enricher->>Allowlist: language / symbol_type
Allowlist-->>Enricher: allowlisted value or safe default
Enricher->>Prompt: escaped values inside untrusted block + allowlisted enums in trusted preamble
Note over Prompt: Trusted preamble (symbol_type, language)<br/>Pre-instruction: treat block as inert<br/>untrusted_code block: qualified_name, signature, docstring, raw_code<br/>Reinforce instruction after closing tag
Prompt->>LLM: fully constructed prompt
LLM-->>Enricher: summary string
Enricher->>MongoDB: update_symbol_summary / update_file_summary
Reviews (4): Last reviewed commit: "fix(scanner): escape opening tag to clos..." | Re-trigger Greptile
|
@ishaanxgupta looks good you can merge it now |
|
Hi @21lakshh please have a look on the greptile suggestions once |
|
@ishaanxgupta done, thanks!! |
Summary
Fixes indirect prompt injection vulnerabilities in repository enrichment prompts by isolating untrusted repository content inside
<untrusted_code>tags and reinforcing model instructions before generation.Motivation / Problem
Repository-controlled content such as
raw_code,docstring, andsymbol_listcould inject instructions into enrichment prompts and influence downstream LLM behavior during indexing.This change adds structural prompt isolation protections to prevent repository content from being interpreted as executable instructions.
Closes #224
Changes
Added
_escape_untrusted()helper to neutralize embedded</untrusted_code>tag escape attemptsWrapped all repo-controlled fields inside
<untrusted_code>isolation blocks:raw_codedocstringsignaturequalified_namesymbol_listfile_pathUpdated both
_SYMBOL_PROMPTand_FILE_PROMPTMoved scanner-controlled metadata (
language,symbol_type,symbol_count) into trusted prompt contextAdded explicit pre-instructions telling the model to treat tagged content as inert data
Added reinforce instructions after untrusted content using a sandwich-pattern defense
Added prompt isolation tests for:
Added integration-style coverage for enrichment write paths and failure handling
Preserved repository fidelity without regex stripping or code mutation
Testing
pytest tests/unit)pytest tests/integration)Additional verification
Verified injection payloads in
raw_codeanddocstringremain fully contained inside<untrusted_code>tagsVerified
_SYMBOL_PROMPTand_FILE_PROMPTboth include reinforce instructionsVerified:
max_symbolscap handlingclose()delegationScreenshots / recordings (if UI change)
N/A
Checklist
fix(security): harden enrichment prompts against indirect injection)ruff check .andblack --check .locally with no errorsCHANGELOG.mdif this is a user-visible changeuv lockif I modifiedpyproject.toml@ishaanxguptaor@ved015