
[DRAFT] FEAT add Agent Threat Rules (ATR) adversarial payload dataset loader#1715

Draft
eeee2345 wants to merge 2 commits into microsoft:main from eeee2345:feat/atr-dataset-loader

Conversation

@eeee2345

Description

Draft PR implementing the dataset loader proposed in #1702, per @romanlutz's directional guidance (GitHub-hosted source, taxonomy as-is, scorer kept separate for a follow-up). Opening as draft so feedback can happen on actual code rather than spec — happy to iterate or close if direction shifts.

Note on file naming: the 5/10 issue comment cited pyrit/datasets/atr_threat.py and pyrit/datasets/README.md. After reading the codebase, the actual conventions are pyrit/datasets/seed_datasets/remote/<dataset>_dataset.py (matching _HarmBenchDataset, _PromptIntelDataset, etc.) and the dataset listing lives in doc/code/datasets/0_dataset.md (no pyrit/datasets/README.md exists). I followed those instead — easy to rename if a different layout is preferred.

What this PR adds

A new remote dataset loader at pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.py that surfaces the Agent Threat Rules (ATR) autoresearch adversarial-payload corpus as a PyRIT SeedDataset.

ATR is an open MIT-licensed detection standard for AI agent threats. The autoresearch corpus (data/autoresearch/adversarial-samples.json) contains 1,054 attack-payload entries across ten base rule scenarios in six of the ten ATR categories (prompt-injection, tool-poisoning, context-exfiltration, agent-manipulation, privilege-escalation, skill-compromise). Each payload carries an attack technique label (paraphrase, language_switch, encoding, role_play, and 17 others) and the agent surface it targets (user_input, content, tool_args, tool_name, tool_response, agent_output).
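For orientation, an entry in adversarial-samples.json carrying these fields might look like the following. The field names follow the metadata described here and in the implementation notes below; all values are invented placeholders, not real corpus content:

```python
# Hypothetical shape of one adversarial-samples.json entry.
# Field names match the upstream metadata described above; every value
# is an invented placeholder, not actual corpus content.
sample_entry = {
    "id": "atr-ar-0001",                     # upstream entry id (illustrative)
    "original_rule_id": "ATR-PI-001",        # base rule scenario (illustrative)
    "payload": "<adversarial text elided>",  # becomes SeedPrompt.value
    "technique": "paraphrase",               # attack technique label
    "detection_field": "user_input",         # agent surface targeted
    "variation_type": "role_play",           # variation axis (illustrative)
}
```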

Reference: https://github.com/Agent-Threat-Rule/agent-threat-rules
License: MIT

Files touched

  • pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.py (new, 294 lines) — the loader, three companion enums (ATRCategory, ATRDetectionField, ATRVariationType), and a _RULE_ID_TO_CATEGORY dict that resolves each rule_id to its ATR taxonomy category
  • pyrit/datasets/seed_datasets/remote/__init__.py — adds the import to trigger auto-registration via SeedDatasetProvider.__init_subclass__, plus four entries in __all__
  • tests/unit/datasets/test_agent_threat_rules_dataset.py (new, 208 lines) — 13 unit tests covering happy path, missing-key validation, unknown rule_id skip path, all four filter axes, and enum-validation errors
  • doc/code/datasets/0_dataset.md — one-line addition to the "Examples of built-in datasets" list
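The auto-registration behavior mentioned for `__init__.py` rests on Python's `__init_subclass__` hook: merely importing a module that defines a subclass is enough to record it on the base class. A minimal standalone sketch of that pattern (the class and method names here are illustrative stand-ins, not PyRIT's actual API):

```python
# Sketch of __init_subclass__-based provider registration.
# DatasetProviderBase and its registry are illustrative, not PyRIT's API.
class DatasetProviderBase:
    _registry: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Defining (i.e. importing) a subclass records it here, which is
        # why a single import line added to __init__.py suffices.
        DatasetProviderBase._registry[cls.__name__] = cls

    @classmethod
    def get_all_providers(cls):
        return list(cls._registry.values())


class AgentThreatRulesDataset(DatasetProviderBase):
    """Stand-in for the new loader; registration happens at class creation."""
```

This is also why the end-to-end dataset test can discover the new loader with no parametrization change.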

No PyRIT core code is modified. No new dependencies are added.

Implementation notes

  • Source URL is pinned to the ATR commit db793f9 (current main HEAD when this PR was authored). This mirrors HarmBench's pinning convention (c0423b9) for reproducibility; pass the raw URL on main or a different tag to track upstream.
  • Each row of adversarial-samples.json maps to one SeedPrompt. The payload becomes value. The four upstream metadata fields (original_rule_id, technique, detection_field, variation_type) plus the upstream entry id are preserved on SeedPrompt.metadata. harm_categories is set to a single-element list with the ATR taxonomy category resolved via the loader's _RULE_ID_TO_CATEGORY dict.
  • Optional categories, techniques, detection_fields, and variation_types arguments narrow the dataset client-side after fetch. Enum arguments are validated against their expected types via the inherited _validate_enums helper, matching the pattern in _PromptIntelDataset.
  • Entries whose original_rule_id is not in the loader's category mapping are skipped (not errored) with an aggregate warning. This handles upstream rule additions that land before the loader's mapping is extended — users get a working dataset minus the unmapped rules, not a runtime failure.
  • The loader extends _RemoteDatasetLoader, so caching, file-type inference, and the public_url/file switch are all inherited — no duplicated infrastructure.
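The row-to-SeedPrompt mapping, the skip-and-warn path for unmapped rules, and the client-side filtering described above can be sketched roughly as follows. The `SimpleSeedPrompt` dataclass, `build_prompts` helper, and one-entry category dict are simplified stand-ins for PyRIT's actual classes and the loader's full `_RULE_ID_TO_CATEGORY` mapping:

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

# Illustrative one-entry stand-in for the loader's full mapping.
_RULE_ID_TO_CATEGORY = {"ATR-PI-001": "prompt-injection"}


@dataclass
class SimpleSeedPrompt:
    """Simplified stand-in for PyRIT's SeedPrompt."""
    value: str
    harm_categories: list
    metadata: dict = field(default_factory=dict)


def build_prompts(rows, categories=None):
    prompts, skipped = [], []
    for row in rows:
        rule_id = row["original_rule_id"]
        category = _RULE_ID_TO_CATEGORY.get(rule_id)
        if category is None:
            skipped.append(rule_id)  # unmapped upstream rule: skip, don't error
            continue
        if categories and category not in categories:
            continue  # client-side filter applied after fetch
        prompts.append(
            SimpleSeedPrompt(
                value=row["payload"],
                harm_categories=[category],
                metadata={
                    "id": row["id"],
                    "original_rule_id": rule_id,
                    "technique": row["technique"],
                    "detection_field": row["detection_field"],
                    "variation_type": row["variation_type"],
                },
            )
        )
    if skipped:
        # One aggregate warning rather than one warning per skipped entry.
        logger.warning(
            "Skipped %d entries with unmapped rule ids: %s",
            len(skipped), sorted(set(skipped)),
        )
    return prompts
```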

What this PR does NOT include (per #1702 discussion)

  • No scorer. Per @romanlutz's guidance to keep that separate, the ATR taxonomy scorer is a follow-up after this loader lands.
  • No HuggingFace mirror. Source is GitHub-hosted per the initial direction; a HuggingFace sibling release is straightforward to add later if users want it.
  • No taxonomy mapping into other PyRIT category schemas. ATR's taxonomy is preserved on harm_categories as-is per the same guidance.

Optional context for PyRIT users

ATR was recently integrated into MISP at two layers (merged 2026-05-10 by Alexandre Dulaunoy, MISP project lead).

Mentioning this because PyRIT users who route red-team output into MISP-compatible threat-intel or CSIRT pipelines could benefit: the original_rule_id metadata on each SeedPrompt can resolve natively as MISP machine tags downstream, with no translation layer needed. Not required for the loader itself; just a downstream interop note.
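As an illustration of what that downstream resolution could look like: MISP machine tags take the form `namespace:predicate="value"`. A tiny sketch of formatting an original_rule_id into that shape, where the `atr`/`rule` namespace and predicate are placeholder guesses for illustration (the actual taxonomy entry names come from ATR's MISP integration, not from this PR):

```python
def to_misp_tag(original_rule_id: str,
                namespace: str = "atr",
                predicate: str = "rule") -> str:
    # MISP machine tags follow namespace:predicate="value".
    # The "atr"/"rule" pair here is a placeholder guess; the real
    # namespace/predicate come from ATR's upstream MISP taxonomy.
    return f'{namespace}:{predicate}="{original_rule_id}"'
```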

Tests and Documentation

13 new unit tests in tests/unit/datasets/test_agent_threat_rules_dataset.py:

$ python -m pytest tests/unit/datasets/test_agent_threat_rules_dataset.py -v

test_dataset_name PASSED
test_fetch_dataset_returns_seed_dataset PASSED
test_seed_prompt_fields_populated PASSED
test_fetch_dataset_missing_keys_raises PASSED
test_unknown_rule_id_is_skipped_with_warning PASSED
test_filter_by_categories PASSED
test_filter_by_techniques PASSED
test_filter_by_detection_fields PASSED
test_filter_by_variation_types PASSED
test_combined_filters PASSED
test_invalid_category_raises PASSED
test_invalid_detection_field_raises PASSED
test_invalid_variation_type_raises PASSED

13 passed in 15.35s

ruff check and ruff format --check both pass on the new files and the modified __init__.py.

A real-network fetch against the pinned upstream URL was verified locally: 1,054 seeds load with the expected category distribution (prompt-injection 496, context-exfiltration 186, skill-compromise 124, tool-poisoning 93, agent-manipulation 93, privilege-escalation 62).

The new loader will be picked up automatically by tests/end_to_end/test_all_datasets.py via SeedDatasetProvider.get_all_providers() discovery — no parametrization update needed there.

JupyText was not run because this PR does not touch any notebooks or doc/code/ .py files (only the markdown 0_dataset.md).

Adds a new remote dataset loader at pyrit/datasets/seed_datasets/remote/
agent_threat_rules_dataset.py that surfaces the ATR autoresearch corpus
(1,054 attack-payload entries across six ATR taxonomy categories) as a
PyRIT SeedDataset.

Implements proposal in microsoft#1702, per directional guidance in that issue:
- Source pinned to GitHub (not HuggingFace) for the initial cut
- ATR taxonomy preserved as-is on harm_categories
- Scorer kept separate as a follow-up after this loader lands
- No PyRIT core code modified

Adds 13 unit tests covering happy path, missing-key validation, the
unknown-rule_id skip path, all four filter axes (categories, techniques,
detection_fields, variation_types), and enum-validation errors.

Updates pyrit/datasets/seed_datasets/remote/__init__.py to register the
loader via SeedDatasetProvider.__init_subclass__, and adds a one-line
entry to doc/code/datasets/0_dataset.md.

ruff check + ruff format both clean. Real-network fetch verified locally
against the pinned upstream URL.
Comment thread on doc/code/datasets/0_dataset.md (Outdated)
- `harmbench`: Standard harmful behavior benchmarks
- `dark_bench`: Dark pattern detection examples
- `airt_*`: Various harm categories from AI Red Team
- `agent_threat_rules`: Agent Threat Rules (ATR) adversarial payloads for prompt injection, tool poisoning, and other AI-agent attack classes
Contributor


If you rerun the 1_loading_datasets notebook it will update the list there, too. This is just a small subset. In fact, I have a PR out for doing that: #1707

Author


Got it — dropped the 0_dataset.md line in 44dce8b. Once #1707 lands and the notebook is re-executed against main, agent_threat_rules will show up in the canonical list automatically via SeedDatasetProvider auto-registration. Thanks for the pointer.

@romanlutz pointed out the manual entry in 0_dataset.md is a small
hardcoded subset; the canonical list is generated by re-executing
1_loading_datasets.ipynb (which his microsoft#1707 handles). Dropping the
manual line; auto-registration via SeedDatasetProvider already
ensures agent_threat_rules appears in the regenerated notebook
output once microsoft#1707 lands.