security: add Slack, JWT, Azure, and Discord patterns to sanitize.js #414
Open
shaun0927 wants to merge 2 commits into EvoMap:main
…voMap#409)

sanitize.js::redactString is the last line of defense before session log
excerpts and event reason strings are POSTed to the auto-issue tracker by
issueReporter.js. It already covers OpenAI sk-, GitHub gh[pousr]_, AWS AKIA,
npm_, generic Bearer, and a few others. It was missing four common formats:

- xoxb-... / xoxp-...    Slack bot and user tokens
- eyJ....eyJ....{sig}    JSON Web Tokens (header.payload.signature)
- AccountKey=...;        Azure storage connection strings (key field)
- 3-segment base64url    Discord bot tokens

These formats show up routinely in agent runtime logs: Slack webhooks, OIDC
id_tokens exchanged by SDKs, Azure Storage SDK emitted logs, and bot
frameworks. Without these patterns any of them landing in a gene's
outcome.reason or session log reaches GitHub in cleartext.

The REDACT_PATTERNS array gets four new regexes, and LEAK_SCANNERS gets four
matching entries so the non-destructive scanner reports the same categories.

Testing:

  $ node test/sanitize.test.js
  All sanitize tests passed (34 assertions)

  $ node -e "const {redactString} = require('./src/gep/sanitize'); [...4 samples above...].forEach(([n, s]) => console.log(n, '->', redactString(s) === s ? 'LEAKED' : 'REDACTED'));"
  slack-bot -> REDACTED
  jwt -> REDACTED
  azure -> REDACTED
  discord -> REDACTED

Note: EvoMap#107 attempted a similar hardening and was closed without merge.
Scope here is minimal (four additive regexes, no refactor, no behaviour
change outside the matched substrings) to keep the review surface tight.

Closes EvoMap#409
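The one-line check in the Testing section can be expanded into a small script. This is a minimal sketch, not the PR's test code: `reportLeaks`, `standInRedact`, the two samples, and the `[REDACTED]` placeholder are hypothetical, and the stand-in redactor deliberately knows only an assumed Slack pattern so that a missing format shows up as `LEAKED`. In the repository the real function would come from `require('./src/gep/sanitize')`.

```javascript
// Report, per named sample, whether a redactor changed the input.
// A sample that comes back unchanged would be posted as-is.
function reportLeaks(redactString, samples) {
  return samples.map(
    ([name, s]) => `${name} -> ${redactString(s) === s ? 'LEAKED' : 'REDACTED'}`
  );
}

// Hypothetical stand-in redactor that only knows an assumed Slack pattern.
// In the repository: const { redactString } = require('./src/gep/sanitize');
const standInRedact = (s) =>
  s.replace(/\bxox[bp]-[A-Za-z0-9-]{10,}/g, '[REDACTED]');

// Synthetic samples built from repeated characters, so nothing here
// resembles a real credential.
const samples = [
  ['slack-bot', 'token=xoxb-' + '1'.repeat(12)],
  ['jwt', 'id_token=eyJ' + 'a'.repeat(10) + '.eyJ' + 'b'.repeat(10) + '.' + 'c'.repeat(10)],
];

const report = reportLeaks(standInRedact, samples);
report.forEach((line) => console.log(line));
// slack-bot -> REDACTED
// jwt -> LEAKED
```

Passing the redactor in as an argument keeps the check usable against both the pre-patch and post-patch versions of the module.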
Follow-up on the initial patch in this PR. Two issues called out in
review:
1. REDACT_PATTERNS Discord regex was too broad:
/\b[A-Za-z0-9_-]{24,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{27,}\b/
This would also match lowercase dotted identifiers such as Python
module paths
(some_really_long_module_name.submod.another_really_long_module_name_here)
and similarly structured hostnames. Because REDACT_PATTERNS performs a
destructive replacement, false matches silently corrupt the sanitized
output.
2. LEAK_SCANNERS discord_token still used the old narrow [MN] prefix
and therefore did not detect the O-prefix tokens that the
destructive pattern was intended to catch. Destructive and
non-destructive paths reported inconsistent results.
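Both issues can be reproduced in a few lines. A sketch under stated assumptions: `BROAD` is the regex quoted in item 1, but `OLD_SCANNER` is a hypothetical reconstruction (only its `[MN]` prefix is stated above), and the `[REDACTED]` placeholder and the O-prefixed token are synthetic.

```javascript
// The broad pattern quoted in item 1.
const BROAD = /\b[A-Za-z0-9_-]{24,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{27,}\b/;

// Hypothetical stand-in for the old scanner: only the [MN] prefix is
// taken from the commit message, the rest of the pattern is assumed.
const OLD_SCANNER = /\b[MN][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/;

// Issue 1: a harmless module path is destroyed by the broad pattern.
const modulePath =
  'some_really_long_module_name.submod.another_really_long_module_name_here';
const redacted = `ImportError: cannot import ${modulePath}`.replace(BROAD, '[REDACTED]');
console.log(redacted);
// ImportError: cannot import [REDACTED]

// Issue 2: a synthetic O-prefixed token is caught by the destructive
// pattern but missed by the [MN]-only scanner.
const oToken = 'O' + 'a'.repeat(23) + '.' + 'b'.repeat(6) + '.' + 'c'.repeat(27);
const broadHit = BROAD.test(oToken);
const oldScannerHit = OLD_SCANNER.test(oToken);
console.log({ broadHit, oldScannerHit });
// { broadHit: true, oldScannerHit: false }
```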
Fix for both paths:
/\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/
- Leading [MNO] requires a Discord-style uppercase prefix, ruling
out lowercase dotted identifiers.
- Middle segment pinned to exactly 6 chars (matches the base64url
encoding of the 4-byte token timestamp Discord uses).
- Third segment unchanged (27+ chars HMAC signature).
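The effect of the tightened pattern can be checked directly. In this sketch the regex is the one given above; the helper names `redactDiscord` and `hasDiscordToken`, the `[REDACTED]` placeholder, and all sample strings are hypothetical.

```javascript
// Tightened pattern from the commit message, shared by both paths.
const DISCORD = /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/;

// Destructive path: replace every match (placeholder text is assumed).
function redactDiscord(text) {
  return text.replace(new RegExp(DISCORD.source, 'g'), '[REDACTED]');
}

// Non-destructive path: report whether a match exists.
function hasDiscordToken(text) {
  return DISCORD.test(text);
}

// 4 bytes encode to 8 base64 characters, two of them '=' padding;
// base64url drops the padding, which leaves the 6-character middle segment.
const middleLen = Buffer.alloc(4).toString('base64').replace(/=+$/, '').length;

// Synthetic inputs: one token-shaped string and two lookalikes.
const token = 'O' + 'a'.repeat(23) + '.' + 'b'.repeat(6) + '.' + 'c'.repeat(27);
const modulePath =
  'some_really_long_module_name.submod.another_really_long_module_name_here';
const hostname =
  'internal-build-cache-service-01.region.example-corporate-network-zone';

console.log(middleLen);                             // 6
console.log(redactDiscord(`token=${token} done`));  // token=[REDACTED] done
console.log(hasDiscordToken(token));                // true
console.log(redactDiscord(modulePath) === modulePath); // true (unchanged)
console.log(redactDiscord(hostname) === hostname);     // true (unchanged)
```

Because both helpers read the same `DISCORD` source, the destructive and non-destructive paths cannot disagree about what counts as a token.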
Verified:
discord-bot (should REDACT) -> redacted
lowercase module path (should NOT) -> unchanged
lowercase hostname (should NOT) -> unchanged
slack / jwt / azure -> still redacted
node test/sanitize.test.js
All sanitize tests passed (34 assertions)
Problem
`src/gep/sanitize.js::redactString` is the last line of defense before session log excerpts (up to 2,000 chars) and per-event
`reason` strings are posted to `config.repo` by `src/gep/issueReporter.js`. It already covers OpenAI `sk-`, GitHub `gh[pousr]_`, AWS `AKIA…`, npm `npm_`, generic `Bearer`, and a few others. It was missing four common formats.
See #409 for reproduction and the full rationale.
- Slack bot and user tokens: `xoxb-…`, `xoxp-…`
- JSON Web Tokens: `eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sflKxw…`
- Azure storage connection strings: `DefaultEndpointsProtocol=https;AccountName=…;AccountKey=…`
- Discord bot tokens: `ODY4…YPc-6Q…`

These formats show up routinely in agent runtime logs (Slack
webhooks, OIDC `id_token` responses, Azure Storage SDK debug logs,
bot frameworks), so without these patterns any of them landing in a
gene's `outcome.reason` or in a session log reaches GitHub in
cleartext.
Fix
- Four new regexes in `REDACT_PATTERNS` (destructive replace).
- Four matching entries in `LEAK_SCANNERS` (non-destructive detection) so the existing scanner reports the same categories.

Scope: `src/gep/sanitize.js`, +12 lines, no behaviour change outside the matched substrings.
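The relationship between the two arrays can be sketched as follows. This is an illustration, not the patch itself: only the Discord regex and the `discord_token` scanner name are taken from this PR; the Slack, JWT, and Azure regexes, the other scanner names, the entry shape, the `scanForLeaks` helper, and the `[REDACTED]` placeholder are all assumptions. Deriving both arrays from one table is a choice made for this sketch; the PR adds entries to each array directly.

```javascript
// One row per secret format. Only the Discord regex comes from this PR;
// the other three are hypothetical approximations of the formats.
const SECRET_FORMATS = [
  { name: 'slack_token', re: /\bxox[bp]-[A-Za-z0-9-]{10,}/ },
  { name: 'jwt', re: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
  { name: 'azure_account_key', re: /AccountKey=[A-Za-z0-9+\/=]{20,}/ },
  {
    name: 'discord_token',
    re: /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/,
  },
];

// Destructive path: global copies of each regex.
const REDACT_PATTERNS = SECRET_FORMATS.map((f) => new RegExp(f.re.source, 'g'));

// Non-destructive path: the same regexes, tagged with a category name.
const LEAK_SCANNERS = SECRET_FORMATS.map((f) => ({ name: f.name, re: f.re }));

function redactString(s) {
  return REDACT_PATTERNS.reduce((acc, re) => acc.replace(re, '[REDACTED]'), s);
}

// Hypothetical helper: list the categories found, without changing the input.
function scanForLeaks(s) {
  return LEAK_SCANNERS.filter((sc) => sc.re.test(s)).map((sc) => sc.name);
}

// Synthetic samples built from repeated characters.
const combined = [
  'xoxb-' + '1'.repeat(12),
  'eyJ' + 'a'.repeat(10) + '.eyJ' + 'b'.repeat(10) + '.' + 'c'.repeat(10),
  'DefaultEndpointsProtocol=https;AccountName=demo;AccountKey=' + 'A'.repeat(40) + '==;',
  'M' + 'a'.repeat(23) + '.' + 'b'.repeat(6) + '.' + 'c'.repeat(27),
].join(' ');

const found = scanForLeaks(combined);
const cleaned = redactString(combined);
console.log(found);
console.log(cleaned);
console.log(scanForLeaks(cleaned)); // [] once every category is redacted
```

Any input that the scanner flags is also changed by the redactor, and the cleaned output scans clean, which is the consistency property the second list item asks for.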
Testing
Reproduction (uses 4 synthetic secrets in the public test):
Before this PR:
After this PR:
Prior art
PR #107 (fix: harden sanitize patterns for token leakage prevention)
attempted a similar hardening and was closed without merge on
2026-02-26. This PR keeps the scope minimal (four additive regexes,
no refactor, no reorder of existing patterns) so reviewers can
approve or deny each pattern independently. Happy to split
further if that helps.
Closes #409.