
[DRAFT] FEAT add Agent Threat Rules (ATR) adversarial payload dataset loader#1715

Draft
eeee2345 wants to merge 2 commits into microsoft:main from eeee2345:feat/atr-dataset-loader

Conversation

@eeee2345

Description

Draft PR implementing the dataset loader proposed in #1702, per @romanlutz's directional guidance (GitHub-hosted source, taxonomy as-is, scorer kept separate for a follow-up). Opening as draft so feedback can happen on actual code rather than spec — happy to iterate or close if direction shifts.

Note on file naming: the 5/10 issue comment cited pyrit/datasets/atr_threat.py and pyrit/datasets/README.md. After reading the codebase, the actual conventions are pyrit/datasets/seed_datasets/remote/<dataset>_dataset.py (matching _HarmBenchDataset, _PromptIntelDataset, etc.) and the dataset listing lives in doc/code/datasets/0_dataset.md (no pyrit/datasets/README.md exists). I followed those instead — easy to rename if a different layout is preferred.

What this PR adds

A new remote dataset loader at pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.py that surfaces the Agent Threat Rules (ATR) autoresearch adversarial-payload corpus as a PyRIT SeedDataset.

ATR is an open MIT-licensed detection standard for AI agent threats. The autoresearch corpus (data/autoresearch/adversarial-samples.json) contains 1,054 attack-payload entries across ten base rule scenarios in six of the ten ATR categories (prompt-injection, tool-poisoning, context-exfiltration, agent-manipulation, privilege-escalation, skill-compromise). Each payload carries an attack technique label (paraphrase, language_switch, encoding, role_play, and 17 others) and the agent surface it targets (user_input, content, tool_args, tool_name, tool_response, agent_output).
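For orientation, an entry in adversarial-samples.json carrying these fields might look like the following. The field names follow the metadata described here and in the implementation notes below; all values are invented placeholders, not real corpus content:

```python
# Hypothetical shape of one adversarial-samples.json entry.
# Field names match the upstream metadata described above; every value
# is an invented placeholder, not actual corpus content.
sample_entry = {
    "id": "atr-ar-0001",                     # upstream entry id (illustrative)
    "original_rule_id": "ATR-PI-001",        # base rule scenario (illustrative)
    "payload": "<adversarial text elided>",  # becomes SeedPrompt.value
    "technique": "paraphrase",               # attack technique label
    "detection_field": "user_input",         # agent surface targeted
    "variation_type": "role_play",           # variation axis (illustrative)
}
```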

Reference: https://github.com/Agent-Threat-Rule/agent-threat-rules
License: MIT

Files touched

  • pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.py (new, 294 lines) — the loader, three companion enums (ATRCategory, ATRDetectionField, ATRVariationType), and a _RULE_ID_TO_CATEGORY dict that resolves each rule_id to its ATR taxonomy category
  • pyrit/datasets/seed_datasets/remote/__init__.py — adds the import to trigger auto-registration via SeedDatasetProvider.__init_subclass__, plus four entries in __all__
  • tests/unit/datasets/test_agent_threat_rules_dataset.py (new, 208 lines) — 13 unit tests covering happy path, missing-key validation, unknown rule_id skip path, all four filter axes, and enum-validation errors
  • doc/code/datasets/0_dataset.md — one-line addition to the "Examples of built-in datasets" list
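The auto-registration behavior mentioned for `__init__.py` rests on Python's `__init_subclass__` hook: merely importing a module that defines a subclass is enough to record it on the base class. A minimal standalone sketch of that pattern (the class and method names here are illustrative stand-ins, not PyRIT's actual API):

```python
# Sketch of __init_subclass__-based provider registration.
# DatasetProviderBase and its registry are illustrative, not PyRIT's API.
class DatasetProviderBase:
    _registry: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Defining (i.e. importing) a subclass records it here, which is
        # why a single import line added to __init__.py suffices.
        DatasetProviderBase._registry[cls.__name__] = cls

    @classmethod
    def get_all_providers(cls):
        return list(cls._registry.values())


class AgentThreatRulesDataset(DatasetProviderBase):
    """Stand-in for the new loader; registration happens at class creation."""
```

This is also why the end-to-end dataset test can discover the new loader with no parametrization change.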

No PyRIT core code is modified. No new dependencies are added.

Implementation notes

  • Source URL is pinned to the ATR commit db793f9 (current main HEAD when this PR was authored). This mirrors HarmBench's pinning convention (c0423b9) for reproducibility; pass the raw URL on main or a different tag to track upstream.
  • Each row of adversarial-samples.json maps to one SeedPrompt. The payload becomes value. The four upstream metadata fields (original_rule_id, technique, detection_field, variation_type) plus the upstream entry id are preserved on SeedPrompt.metadata. harm_categories is set to a single-element list with the ATR taxonomy category resolved via the loader's _RULE_ID_TO_CATEGORY dict.
  • Optional categories, techniques, detection_fields, and variation_types arguments narrow the dataset client-side after fetch. Enum arguments are validated against their expected types via the inherited _validate_enums helper, matching the pattern in _PromptIntelDataset.
  • Entries whose original_rule_id is not in the loader's category mapping are skipped (not errored) with an aggregate warning. This handles upstream rule additions that land before the loader's mapping is extended — users get a working dataset minus the unmapped rules, not a runtime failure.
  • The loader extends _RemoteDatasetLoader, so caching, file-type inference, and the public_url/file switch are all inherited — no duplicated infrastructure.
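The row-to-SeedPrompt mapping, the skip-and-warn path for unmapped rules, and the client-side filtering described above can be sketched roughly as follows. The `SimpleSeedPrompt` dataclass, `build_prompts` helper, and one-entry category dict are simplified stand-ins for PyRIT's actual classes and the loader's full `_RULE_ID_TO_CATEGORY` mapping:

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)

# Illustrative one-entry stand-in for the loader's full mapping.
_RULE_ID_TO_CATEGORY = {"ATR-PI-001": "prompt-injection"}


@dataclass
class SimpleSeedPrompt:
    """Simplified stand-in for PyRIT's SeedPrompt."""
    value: str
    harm_categories: list
    metadata: dict = field(default_factory=dict)


def build_prompts(rows, categories=None):
    prompts, skipped = [], []
    for row in rows:
        rule_id = row["original_rule_id"]
        category = _RULE_ID_TO_CATEGORY.get(rule_id)
        if category is None:
            skipped.append(rule_id)  # unmapped upstream rule: skip, don't error
            continue
        if categories and category not in categories:
            continue  # client-side filter applied after fetch
        prompts.append(
            SimpleSeedPrompt(
                value=row["payload"],
                harm_categories=[category],
                metadata={
                    "id": row["id"],
                    "original_rule_id": rule_id,
                    "technique": row["technique"],
                    "detection_field": row["detection_field"],
                    "variation_type": row["variation_type"],
                },
            )
        )
    if skipped:
        # One aggregate warning rather than one warning per skipped entry.
        logger.warning(
            "Skipped %d entries with unmapped rule ids: %s",
            len(skipped), sorted(set(skipped)),
        )
    return prompts
```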

What this PR does NOT include (per #1702 discussion)

  • No scorer. Per @romanlutz's guidance to keep that separate, the ATR taxonomy scorer is a follow-up after this loader lands.
  • No HuggingFace mirror. Source is GitHub-hosted per the initial direction; a HuggingFace sibling release is straightforward to add later if users want it.
  • No taxonomy mapping into other PyRIT category schemas. ATR's taxonomy is preserved on harm_categories as-is per the same guidance.

Optional context for PyRIT users

ATR was recently integrated into MISP at two layers (merged 2026-05-10 by Alexandre Dulaunoy, MISP project lead).

Mentioning this because PyRIT users who route red-team output into MISP-compatible threat-intel or CSIRT pipelines could benefit: the original_rule_id metadata on each SeedPrompt can resolve natively as MISP machine tags downstream, with no translation layer needed. Not required for the loader itself; just a downstream interop note.
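As an illustration of what that downstream resolution could look like: MISP machine tags take the form `namespace:predicate="value"`. A tiny sketch of formatting an original_rule_id into that shape, where the `atr`/`rule` namespace and predicate are placeholder guesses for illustration (the actual taxonomy entry names come from ATR's MISP integration, not from this PR):

```python
def to_misp_tag(original_rule_id: str,
                namespace: str = "atr",
                predicate: str = "rule") -> str:
    # MISP machine tags follow namespace:predicate="value".
    # The "atr"/"rule" pair here is a placeholder guess; the real
    # namespace/predicate come from ATR's upstream MISP taxonomy.
    return f'{namespace}:{predicate}="{original_rule_id}"'
```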

Tests and Documentation

13 new unit tests in tests/unit/datasets/test_agent_threat_rules_dataset.py:

$ python -m pytest tests/unit/datasets/test_agent_threat_rules_dataset.py -v

test_dataset_name PASSED
test_fetch_dataset_returns_seed_dataset PASSED
test_seed_prompt_fields_populated PASSED
test_fetch_dataset_missing_keys_raises PASSED
test_unknown_rule_id_is_skipped_with_warning PASSED
test_filter_by_categories PASSED
test_filter_by_techniques PASSED
test_filter_by_detection_fields PASSED
test_filter_by_variation_types PASSED
test_combined_filters PASSED
test_invalid_category_raises PASSED
test_invalid_detection_field_raises PASSED
test_invalid_variation_type_raises PASSED

13 passed in 15.35s

ruff check and ruff format --check both pass on the new files and the modified __init__.py.

A real-network fetch against the pinned upstream URL was verified locally: 1,054 seeds load with the expected category distribution (prompt-injection 496, context-exfiltration 186, skill-compromise 124, tool-poisoning 93, agent-manipulation 93, privilege-escalation 62).

The new loader will be picked up automatically by tests/end_to_end/test_all_datasets.py via SeedDatasetProvider.get_all_providers() discovery — no parametrization update needed there.

JupyText was not run because this PR does not touch any notebooks or doc/code/ .py files (only the markdown 0_dataset.md).

Adds a new remote dataset loader at pyrit/datasets/seed_datasets/remote/
agent_threat_rules_dataset.py that surfaces the ATR autoresearch corpus
(1,054 attack-payload entries across six ATR taxonomy categories) as a
PyRIT SeedDataset.

Implements proposal in microsoft#1702, per directional guidance in that issue:
- Source pinned to GitHub (not HuggingFace) for the initial cut
- ATR taxonomy preserved as-is on harm_categories
- Scorer kept separate as a follow-up after this loader lands
- No PyRIT core code modified

Adds 13 unit tests covering happy path, missing-key validation, the
unknown-rule_id skip path, all four filter axes (categories, techniques,
detection_fields, variation_types), and enum-validation errors.

Updates pyrit/datasets/seed_datasets/remote/__init__.py to register the
loader via SeedDatasetProvider.__init_subclass__, and adds a one-line
entry to doc/code/datasets/0_dataset.md.

ruff check + ruff format both clean. Real-network fetch verified locally
against the pinned upstream URL.
Comment thread on doc/code/datasets/0_dataset.md (Outdated)
- `harmbench`: Standard harmful behavior benchmarks
- `dark_bench`: Dark pattern detection examples
- `airt_*`: Various harm categories from AI Red Team
- `agent_threat_rules`: Agent Threat Rules (ATR) adversarial payloads for prompt injection, tool poisoning, and other AI-agent attack classes
Contributor


If you rerun the 1_loading_datasets notebook it will update the list there, too. This is just a small subset. In fact, I have a PR out for doing that: #1707

Author


Got it — dropped the 0_dataset.md line in 44dce8b. Once #1707 lands and the notebook is re-executed against main, agent_threat_rules will show up in the canonical list automatically via SeedDatasetProvider auto-registration. Thanks for the pointer.

@romanlutz pointed out the manual entry in 0_dataset.md is a small
hardcoded subset; the canonical list is generated by re-executing
1_loading_datasets.ipynb (which his microsoft#1707 handles). Dropping the
manual line; auto-registration via SeedDatasetProvider already
ensures agent_threat_rules appears in the regenerated notebook
output once microsoft#1707 lands.