security: add Slack, JWT, Azure, and Discord patterns to sanitize.js by shaun0927 · Pull Request #414 · EvoMap/evolver

shaun0927 · 2026-04-17T08:01:48Z

Problem

src/gep/sanitize.js::redactString is the last line of defense
before session log excerpts (up to 2,000 chars) and per-event
reason strings are posted to config.repo by
src/gep/issueReporter.js. It already covers OpenAI sk-, GitHub
gh[pousr]_, AWS AKIA…, npm npm_, generic Bearer, and a few
others. It was missing four common formats.

See #409 for reproduction and the full rationale.

Format	Example
Slack bot/user tokens	`xoxb-…`, `xoxp-…`
JWT (header.payload.signature)	`eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sflKxw…`
Azure storage connection strings	`DefaultEndpointsProtocol=https;AccountName=…;AccountKey=…`
Discord bot tokens	`ODY4…YPc-6Q…`
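For reviewers who want to check the shapes in the table, here is a rough sketch of one regex per format, run against the synthetic samples from the Testing section. Only the Discord expression is taken from this PR (the follow-up commit). The Slack, JWT, and Azure expressions are assumptions written for illustration. They are not necessarily the regexes added to src/gep/sanitize.js.

```javascript
// Illustrative patterns for the four formats in the table.
// ASSUMPTION: only `discord` is copied from this PR; slack/jwt/azure are
// hypothetical approximations, not the regexes in src/gep/sanitize.js.
const PATTERNS = {
  slack: /\bxox[bp]-[A-Za-z0-9-]{10,}/,                            // xoxb-/xoxp- tokens
  jwt: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/,   // header.payload.signature
  azure: /AccountKey=[A-Za-z0-9+\/=]+/,                            // key field of the connection string
  discord: /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/,
};

// Synthetic samples (same strings as the reproduction script).
const SAMPLES = {
  slack: 'xoxb-1234567890-1234567890123-AbCdEfGhIjKlMnOpQrStUvWx',
  jwt: 'eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c',
  azure: 'DefaultEndpointsProtocol=https;AccountName=acct;AccountKey=ABC123DEF456ghi789JKL==;EndpointSuffix=core.windows.net',
  discord: 'ODY4MzE2NTg4ODIwMTEyMzQ1.YPc-6Q.K3n1tfY9q9f4k_5vZl3Mw2X1AbCdefGhIjK',
};

// Each pattern should hit its own sample and none of the others.
for (const [name, re] of Object.entries(PATTERNS)) {
  const hits = Object.keys(SAMPLES).filter((k) => re.test(SAMPLES[k]));
  console.log(name.padEnd(8), hits.join(','));
}
```

The regexes are non-global on purpose. `RegExp.prototype.test` on a `/g` regex carries `lastIndex` between calls, which can make repeated scans skip matches.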

These formats show up routinely in agent runtime logs (Slack
webhooks, OIDC id_token responses, Azure Storage SDK debug logs,
bot frameworks), so without these patterns any of them landing in a
gene's outcome.reason or in a session log reaches GitHub in
cleartext.

Fix

Add four regexes to REDACT_PATTERNS (destructive replace).
Add four matching entries to LEAK_SCANNERS (non-destructive
detection) so the existing scanner reports the same categories.

Scope: src/gep/sanitize.js, +12 lines, no behaviour change outside
the matched substrings.
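The split between the two lists can be sketched as follows. This assumes REDACT_PATTERNS holds global regexes applied with `String.prototype.replace` and LEAK_SCANNERS holds `{ name, re }` entries that are only tested. The names REDACT_PATTERNS, LEAK_SCANNERS, redactString, and discord_token come from this PR. The entry shape, the `'[REDACTED]'` marker, the Slack regex, the `slack_token` name, and `scanForLeaks` are hypothetical.

```javascript
// Hypothetical sketch of the destructive and non-destructive paths.
const REDACT_PATTERNS = [
  // Discord pattern as given in this PR's follow-up commit.
  /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/g,
  // ASSUMPTION: approximate Slack pattern, for illustration only.
  /\bxox[bp]-[A-Za-z0-9-]{10,}/g,
];

const LEAK_SCANNERS = [
  { name: 'discord_token', re: /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/ },
  { name: 'slack_token', re: /\bxox[bp]-[A-Za-z0-9-]{10,}/ }, // name assumed
];

// Destructive path: every match is overwritten before the text is posted.
function redactString(s) {
  return REDACT_PATTERNS.reduce((out, re) => out.replace(re, '[REDACTED]'), s);
}

// Non-destructive path (hypothetical name): report categories, keep the text.
function scanForLeaks(s) {
  return LEAK_SCANNERS.filter(({ re }) => re.test(s)).map(({ name }) => name);
}

const line = 'posting to slack with xoxb-1234567890-1234567890123-AbCdEfGhIjKlMnOpQrStUvWx';
console.log(redactString(line)); // prints: posting to slack with [REDACTED]
console.log(scanForLeaks(line)); // prints: [ 'slack_token' ]
```

Because the two lists are maintained by hand, they can drift apart. The follow-up commit below fixes exactly that kind of drift for discord_token. Generating both lists from a single table of patterns would be one way to keep them aligned.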

Testing

$ node test/sanitize.test.js
All sanitize tests passed (34 assertions)

Reproduction (uses 4 synthetic secrets in the public test):

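// Run from the repository root; all four secrets are synthetic.
const { redactString } = require('./src/gep/sanitize');
const samples = [
  // Slack bot token
  'xoxb-1234567890-1234567890123-AbCdEfGhIjKlMnOpQrStUvWx',
  // JWT (header.payload.signature)
  'eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c',
  // Azure storage connection string
  'DefaultEndpointsProtocol=https;AccountName=acct;AccountKey=ABC123DEF456ghi789JKL==;EndpointSuffix=core.windows.net',
  // Discord bot token
  'ODY4MzE2NTg4ODIwMTEyMzQ1.YPc-6Q.K3n1tfY9q9f4k_5vZl3Mw2X1AbCdefGhIjK',
];
// An unchanged string means no pattern matched, so the secret would leak.
for (const s of samples) console.log(redactString(s) === s ? 'LEAKED' : 'REDACTED');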

Before this PR:

LEAKED
LEAKED
LEAKED
LEAKED

After this PR:

REDACTED
REDACTED
REDACTED
REDACTED

Prior art

PR #107 (fix: harden sanitize patterns for token leakage prevention) attempted a similar hardening and was closed without
merge on 2026-02-26. This PR keeps the scope minimal (four additive
regexes, no refactor, no reordering of existing patterns) so reviewers
can approve or deny each pattern independently. Happy to split
further if that helps.

Closes #409.

…voMap#409) sanitize.js::redactString is the last line of defense before session log excerpts and event reason strings are POSTed to the auto-issue tracker by issueReporter.js. It already covers OpenAI sk-, GitHub gh[pousr]_, AWS AKIA, npm_, generic Bearer, and a few others. It was missing four common formats:

xoxb-... / xoxp-...  Slack bot and user tokens
eyJ....eyJ....{sig}  JSON Web Tokens (header.payload.signature)
AccountKey=...;  Azure storage connection strings (key field)
3-segment base64url  Discord bot tokens

These formats show up routinely in agent runtime logs: Slack webhooks, OIDC id_tokens exchanged by SDKs, logs emitted by the Azure Storage SDK, and bot frameworks. Without these patterns, any of them landing in a gene's outcome.reason or session log reaches GitHub in cleartext.

The REDACT_PATTERNS array gets four new regexes, and LEAK_SCANNERS gets four matching entries so the non-destructive scanner reports the same categories.

Testing:

$ node test/sanitize.test.js
All sanitize tests passed (34 assertions)
$ node -e "const {redactString} = require('./src/gep/sanitize'); [...4 samples above...].forEach(([n, s]) => console.log(n, '->', redactString(s) === s ? 'LEAKED' : 'REDACTED'));"
slack-bot -> REDACTED
jwt -> REDACTED
azure -> REDACTED
discord -> REDACTED

Note: EvoMap#107 attempted a similar hardening and was closed without merge. Scope here is minimal (four additive regexes, no refactor, no behaviour change outside the matched substrings) to keep the review surface tight.

Closes EvoMap#409

Follow-up on the initial patch in this PR. Two issues called out in review:

1. REDACT_PATTERNS Discord regex was too broad:
   /\b[A-Za-z0-9_-]{24,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{27,}\b/
   This would also match lowercase dotted identifiers such as Python module paths (some_really_long_module_name.submod.another_really_long_module_name_here) and similarly structured hostnames. Because REDACT_PATTERNS performs a destructive replacement, false matches silently corrupt the sanitized output.
2. LEAK_SCANNERS discord_token still used the old narrow [MN] prefix and therefore did not detect the O-prefix tokens that the destructive pattern was intended to catch. Destructive and non-destructive paths reported inconsistent results.

Fix for both paths:

/\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/

- Leading [MNO] requires a Discord-style uppercase prefix, ruling out lowercase dotted identifiers.
- Middle segment pinned to exactly 6 chars (matches the base64url encoding of the 4-byte token timestamp Discord uses).
- Third segment unchanged (27+ chars HMAC signature).

Verified:

discord-bot (should REDACT) -> redacted
lowercase module path (should NOT) -> unchanged
lowercase hostname (should NOT) -> unchanged
slack / jwt / azure -> still redacted

$ node test/sanitize.test.js
All sanitize tests passed (34 assertions)
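The two review points can be reproduced in isolation with the regexes quoted in the follow-up commit. The sketch compares the broad and narrow patterns on the Discord sample, the module path from the review note, and a made-up hostname of the same 24+/6+/27+ shape. It also checks why the middle segment is exactly 6 characters: 4 bytes encoded as unpadded base64url take ceil(32 / 6) = 6 characters. The old scanner regex is an assumption. Only its [MN] prefix is documented, so the rest is copied from the narrow pattern to isolate the prefix change.

```javascript
// Regexes quoted from the two commits in this PR.
const BROAD = /\b[A-Za-z0-9_-]{24,}\.[A-Za-z0-9_-]{6,}\.[A-Za-z0-9_-]{27,}\b/;
const NARROW = /\b[MNO][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/;
// ASSUMPTION: only the [MN] prefix of the old scanner is documented; the
// remainder is copied from NARROW so the comparison isolates the prefix.
const OLD_SCANNER = /\b[MN][A-Za-z0-9_-]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}\b/;

const cases = [
  ['discord-bot', 'ODY4MzE2NTg4ODIwMTEyMzQ1.YPc-6Q.K3n1tfY9q9f4k_5vZl3Mw2X1AbCdefGhIjK'],
  ['module path', 'some_really_long_module_name.submod.another_really_long_module_name_here'],
  // Hypothetical hostname with the same segment lengths.
  ['hostname', 'build-artifacts-cache-primary.us-east.internal-service-discovery-endpoint'],
];

for (const [label, s] of cases) {
  console.log(label.padEnd(12),
    'broad:', BROAD.test(s),
    'narrow:', NARROW.test(s),
    'old scanner:', OLD_SCANNER.test(s));
}

// A 4-byte timestamp in unpadded base64url is always 6 characters
// (the 'base64url' encoding needs Node >= 15.7 or 14.18).
const ts = Buffer.alloc(4);
ts.writeUInt32BE(1700000000);
console.log(ts.toString('base64url').length); // 6
```

Expected pattern: the broad regex matches all three strings, the narrow one only the Discord sample, and the [MN]-only scanner misses the O-prefix sample, which is the inconsistency described in point 2.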

shaun0927 added 2 commits April 17, 2026 17:01

shaun0927 wants to merge 2 commits into EvoMap:main from shaun0927:security/sanitize-additional-token-patterns

shaun0927 commented Apr 17, 2026
