
feat(evaluators): add ATR threat detection evaluator#170

Open
eeee2345 wants to merge 2 commits into agentcontrol:main from eeee2345:feat/atr-evaluator

Conversation


@eeee2345 eeee2345 commented Apr 9, 2026

Summary

Add a contrib evaluator using ATR (Agent Threat Rules) for regex-based AI agent threat detection. No API keys required.

Resolves #169

Evaluator details

  • Name: atr.threat_rules
  • Auto-discovery: via agent_control.evaluators entry point
  • Config: ATRConfig with min_severity, categories, block_on_match, on_error
  • Rules: 20 ATR rules, 306 compiled patterns across 8 categories
  • Performance: <5ms evaluation, pure regex

Config options

ATRConfig(
    min_severity="medium",       # Filter rules below this severity
    categories=["prompt-injection"],  # Only load specific categories (empty = all)
    block_on_match=True,         # Set matched=True on detection
    on_error="allow",            # "allow" (fail-open) or "deny" (fail-closed)
)

Key design decisions

  • Multi-match: Returns ALL matching rules, not just first match. Metadata includes findings array + backward-compatible single-match fields.
  • _coerce_to_string: Scans all priority dict fields (content, input, output, text, message), not just the first.
  • Error handling: Respects on_error policy — fail-open returns error field, fail-closed returns matched=True.
  • Follows Cisco evaluator pattern: Same directory structure, pyproject.toml, Makefile, entry points.
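
The multi-field coercion decision above can be sketched roughly as follows. This is a minimal illustration: the priority-field list comes from the PR description, but the function name and signature are assumptions, not the actual implementation.

```python
def coerce_to_string(data) -> str:
    """Flatten evaluator input into a single scannable string.

    For dicts, concatenate every known priority field that is present,
    rather than stopping at the first one (illustrative sketch).
    """
    priority_fields = ("content", "input", "output", "text", "message")
    if data is None:
        return ""
    if isinstance(data, str):
        return data
    if isinstance(data, dict):
        parts = [str(data[f]) for f in priority_fields if data.get(f)]
        return "\n".join(parts)
    return str(data)
```

With this shape, a dict carrying both `content` and `output` yields both values joined by newlines, so a pattern hiding in any priority field is still scanned.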

Files

evaluators/contrib/atr/
  pyproject.toml              # hatchling build, entry point
  Makefile                    # test/lint/typecheck/build
  README.md                   # Usage docs
  src/.../threat_rules/
    config.py                 # ATRConfig
    evaluator.py              # ATREvaluator (222 lines)
    rules.json                # 20 rules, 306 patterns
  tests/
    test_evaluator.py         # 22 tests

Test plan

  • Rule loading and compilation
  • Known-bad inputs: prompt injection, jailbreak, reverse shell, credential exposure, system prompt override
  • Known-good inputs: normal text, code, URLs
  • Config: min_severity filtering, category filtering, block_on_match=False
  • Error handling: None/empty input, on_error deny/allow
  • Multi-match: content triggering multiple categories returns all findings
  • Dict input: scans all priority fields
  • make test / make lint / make typecheck

Panguard AI added 2 commits April 9, 2026 22:57
Add contrib evaluator using ATR (Agent Threat Rules) community rules:
- 20 regex rules covering OWASP Agentic Top 10
- Configurable severity threshold and category filtering
- Pure regex, no API keys, <5ms evaluation time
- Auto-discovered via entry points

Source: https://agentthreatrule.org (MIT)
- _match_rules now returns all matching rules, not just first match
- Add super().__init__(config) call
- _coerce_to_string scans all priority dict fields, not just first
- Add multi-match test and dict field scanning test
- Backward-compatible: single-match fields still in metadata
Contributor

@lan17 lan17 left a comment


Thanks for contributing this. This is a great start, and I really like the idea of having a local/no-key ATR evaluator available as a contrib package.

I left a few comments below. The main things I think we need to tighten before merging are preserving ATR's field/context semantics, avoiding secret leakage through metadata, and putting a harder bound around regex evaluation latency.

for rule in self._compiled_rules:
    for pattern_entry in rule["patterns"]:
        regex: re.Pattern[str] = pattern_entry["regex"]
        match = regex.search(text)

I think this loses an important part of the ATR model. The upstream rules are field/context scoped, but here every regex is run over one flattened text blob. For example, ATR-2026-00061 is intended for MCP tool args/tool responses, but this implementation will match something benign like "Please rotate the database credentials tomorrow." because the word "credentials" appears anywhere in the selected data. We should preserve the ATR fields/scan-target semantics, or at least make the broad free-text behavior explicit and opt-in.
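
One way to keep that scoping is to attach a field list to each compiled rule and only scan those fields. A rough sketch — the rule shape and field names here are illustrative, not the real ATR schema:

```python
import re

# Hypothetical compiled-rule shape: each rule declares the event fields
# it applies to, mirroring ATR's field/scan-target scoping.
RULES = [
    {
        "id": "ATR-2026-00061",
        "fields": ["tool_args", "tool_response"],
        "regex": re.compile(r"credential", re.IGNORECASE),
    },
]

def match_scoped(event: dict) -> list[str]:
    """Run each rule only against the fields it declares."""
    hits = []
    for rule in RULES:
        for field in rule["fields"]:
            value = event.get(field, "")
            if value and rule["regex"].search(value):
                hits.append(rule["id"])
                break  # one hit per rule is enough
    return hits
```

Under this shape, benign free text in a `content` field no longer trips a rule that is scoped to tool arguments.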

"title": rule["title"],
"severity": rule["severity"],
"category": rule["category"],
"matched_text": match.group()[:200],

This should not include the literal matched secret in result metadata. For credential rules, matched_text can be the AWS key/token/etc. that we just detected, and the SDK observability path copies result metadata into events by default. I'd redact, hash, or omit the matched text by default and only expose a safe snippet/offset if we really need debugging detail.
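
A redaction helper along these lines would keep metadata safe by default. This is a sketch; the function and key names are hypothetical:

```python
import hashlib

def redact_match(matched_text: str) -> dict:
    """Replace a literal regex match with a safe fingerprint.

    Emits only a truncated SHA-256 digest plus the match length, so
    metadata copied into observability events never carries the secret.
    """
    digest = hashlib.sha256(matched_text.encode("utf-8")).hexdigest()[:12]
    return {"match_sha256": digest, "match_len": len(matched_text)}
```

The digest still lets two events be correlated to the same underlying match without ever storing the matched credential itself.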

for rule in self._compiled_rules:
    for pattern_entry in rule["patterns"]:
        regex: re.Pattern[str] = pattern_entry["regex"]
        match = regex.search(text)

This can also block the event loop for longer than the evaluator timeout. evaluate() is async, but the work here is CPU-bound synchronous regex scanning, so asyncio.wait_for in the engine cannot interrupt it until this function yields. On my machine, benign no-match input took about 66ms for 10KB, 664ms for 100KB, and 6.6s for 1MB. We need a hard input-length cap and/or a different execution strategy before this is safe on latency-sensitive paths.
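
The two mitigations suggested here — a hard input cap plus moving the synchronous scan off the event loop — could be sketched like this (the cap value is illustrative, and the surrounding function names are assumptions):

```python
import asyncio
import re

MAX_SCAN_BYTES = 64 * 1024  # hard cap on scanned input; value is illustrative

async def scan(text: str, patterns: list[re.Pattern]) -> bool:
    """Run CPU-bound regex scanning without blocking the event loop.

    Truncates oversized input, then pushes the synchronous search into
    the default thread pool so asyncio.wait_for can regain control.
    """
    text = text[:MAX_SCAN_BYTES]
    loop = asyncio.get_running_loop()

    def _search() -> bool:
        return any(p.search(text) for p in patterns)

    return await loop.run_in_executor(None, _search)
```

Note that `run_in_executor` does not cancel the worker thread on timeout — the thread runs to completion in the background — so the input-length cap remains the real bound on worst-case work.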

metadata={
    "findings": all_findings,
    "count": len(all_findings),
    "max_severity": all_findings[0]["severity"] if all_findings else None,

max_severity is currently just the first finding in rules-file order, not the maximum severity. I hit a case where a high-severity rule appeared before a critical credential rule, and the metadata reported max_severity='high'. This should compute the max using the severity ordering, and the legacy single-match fields probably should point at the highest-severity finding too.
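
Computing the true maximum with an explicit severity ordering is straightforward. A sketch, with the ordering map assumed from the severities this PR mentions:

```python
# Assumed severity ranking; adjust if the rules file defines more levels.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def max_severity(findings: list[dict]):
    """Return the highest severity across findings, not the first in file order."""
    if not findings:
        return None
    top = max(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], -1))
    return top["severity"]
```

The same `max(...)` call can select the finding used to populate the legacy single-match fields, so they also point at the highest-severity hit.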

@echo " make lint-fix - run ruff check --fix"
@echo " make typecheck - run mypy"
@echo " make build - build package"


Since this is adding a new supported contrib package, I think we also need to wire it into the repo-level checks and release path. Right now the root test-extras target only runs Galileo, and the release/build scripts only publish the Galileo contrib evaluator. If ATR should be maintained here, it should have root atr-* targets, be included in contrib CI coverage, and be added to build/release/version metadata.

Contributor

lan17 commented Apr 26, 2026

Stepping back, I think the package/entry-point/test skeleton is a good start, but I’d rework the core evaluator before iterating on smaller fixes.

The main architectural issue is that this currently behaves like a flattened regex scanner, while ATR is really an event/field-aware rule format. I’d preserve ATR fields, scan targets, and condition logic in typed rule models, adapt Agent Control selector output into an explicit ATR event, and only run each condition against its intended field. That should also make it easier to keep metadata safe, cap runtime, and position this as complementary to yelp.detect_secrets rather than overlapping with it.

So my recommendation is: keep the contrib evaluator idea, but redesign the evaluator core around ATR semantics instead of patching the current data -> string -> all regexes pipeline.
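
The typed, field-aware rule model suggested above might look roughly like this — hypothetical dataclasses, not the upstream ATR schema:

```python
from dataclasses import dataclass, field
import re

@dataclass
class ATRCondition:
    """One condition, scoped to a single event field (illustrative shape)."""
    field_name: str        # which ATR event field this condition scans
    pattern: re.Pattern

@dataclass
class ATRRule:
    """A typed rule: id, severity, and field-scoped conditions."""
    rule_id: str
    severity: str
    conditions: list[ATRCondition] = field(default_factory=list)

    def matches(self, event: dict) -> bool:
        """Fire only if every condition matches its own declared field."""
        return all(
            bool(c.pattern.search(event.get(c.field_name, "")))
            for c in self.conditions
        )
```

An adapter would then map Agent Control selector output into the `event` dict, keeping the data -> string -> all regexes flattening out of the hot path entirely.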


codecov Bot commented Apr 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



Development

Successfully merging this pull request may close these issues.

feat(evaluators): ATR regex-based threat detection evaluator

2 participants