[DRAFT] FEAT add Agent Threat Rules (ATR) adversarial payload dataset loader#1715
Draft
eeee2345 wants to merge 2 commits into
Draft
[DRAFT] FEAT add Agent Threat Rules (ATR) adversarial payload dataset loader#1715eeee2345 wants to merge 2 commits into
eeee2345 wants to merge 2 commits into
Conversation
Adds a new remote dataset loader at pyrit/datasets/seed_datasets/remote/ agent_threat_rules_dataset.py that surfaces the ATR autoresearch corpus (1,054 attack-payload entries across six ATR taxonomy categories) as a PyRIT SeedDataset. Implements proposal in microsoft#1702, per directional guidance in that issue: - Source pinned to GitHub (not HuggingFace) for the initial cut - ATR taxonomy preserved as-is on harm_categories - Scorer kept separate as a follow-up after this loader lands - No PyRIT core code modified Adds 13 unit tests covering happy path, missing-key validation, the unknown-rule_id skip path, all four filter axes (categories, techniques, detection_fields, variation_types), and enum-validation errors. Updates pyrit/datasets/seed_datasets/remote/__init__.py to register the loader via SeedDatasetProvider.__init_subclass__, and adds a one-line entry to doc/code/datasets/0_dataset.md. ruff check + ruff format both clean. Real-network fetch verified locally against the pinned upstream URL.
romanlutz
reviewed
May 11, 2026
| - `harmbench`: Standard harmful behavior benchmarks | ||
| - `dark_bench`: Dark pattern detection examples | ||
| - `airt_*`: Various harm categories from AI Red Team | ||
| - `agent_threat_rules`: Agent Threat Rules (ATR) adversarial payloads for prompt injection, tool poisoning, and other AI-agent attack classes |
Contributor
There was a problem hiding this comment.
if you rerun the 1_loading_datasets notebook it will update the list there, too. This is just a small subset. I have a pr out for doing that in fact #1707
Author
@romanlutz pointed out the manual entry in 0_dataset.md is a small hardcoded subset; the canonical list is generated by re-executing 1_loading_datasets.ipynb (which his microsoft#1707 handles). Dropping the manual line; auto-registration via SeedDatasetProvider already ensures agent_threat_rules appears in the regenerated notebook output once microsoft#1707 lands.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Draft PR implementing the dataset loader proposed in #1702, per @romanlutz's directional guidance (GitHub-hosted source, taxonomy as-is, scorer kept separate for a follow-up). Opening as draft so feedback can happen on actual code rather than spec — happy to iterate or close if direction shifts.
Note on file naming: the 5/10 issue comment cited
pyrit/datasets/atr_threat.pyandpyrit/datasets/README.md. After reading the codebase, the actual conventions arepyrit/datasets/seed_datasets/remote/<dataset>_dataset.py(matching_HarmBenchDataset,_PromptIntelDataset, etc.) and the dataset listing lives indoc/code/datasets/0_dataset.md(nopyrit/datasets/README.mdexists). I followed those instead — easy to rename if a different layout is preferred.What this PR adds
A new remote dataset loader at
pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.pythat surfaces the Agent Threat Rules (ATR) autoresearch adversarial-payload corpus as a PyRITSeedDataset.ATR is an open MIT-licensed detection standard for AI agent threats. The autoresearch corpus (
data/autoresearch/adversarial-samples.json) contains 1,054 attack-payload entries across ten base rule scenarios in six of the ten ATR categories (prompt-injection, tool-poisoning, context-exfiltration, agent-manipulation, privilege-escalation, skill-compromise). Each payload carries an attack technique label (paraphrase, language_switch, encoding, role_play, and 17 others) and the agent surface it targets (user_input,content,tool_args,tool_name,tool_response,agent_output).Reference: https://github.com/Agent-Threat-Rule/agent-threat-rules
License: MIT
Files touched
pyrit/datasets/seed_datasets/remote/agent_threat_rules_dataset.py(new, 294 lines) — the loader, three companion enums (ATRCategory,ATRDetectionField,ATRVariationType), and a_RULE_ID_TO_CATEGORYdict that resolves each rule_id to its ATR taxonomy categorypyrit/datasets/seed_datasets/remote/__init__.py— adds the import to trigger auto-registration viaSeedDatasetProvider.__init_subclass__, plus four entries in__all__tests/unit/datasets/test_agent_threat_rules_dataset.py(new, 208 lines) — 13 unit tests covering happy path, missing-key validation, unknown rule_id skip path, all four filter axes, and enum-validation errorsdoc/code/datasets/0_dataset.md— one-line addition to the "Examples of built-in datasets" listNo PyRIT core code is modified. No new dependencies are added.
Implementation notes
db793f9(current main HEAD when this PR was authored). This mirrors HarmBench's pinning convention (c0423b9) for reproducibility; pass the raw URL onmainor a different tag to track upstream.adversarial-samples.jsonmaps to oneSeedPrompt. Thepayloadbecomesvalue. The four upstream metadata fields (original_rule_id,technique,detection_field,variation_type) plus the upstream entry id are preserved onSeedPrompt.metadata.harm_categoriesis set to a single-element list with the ATR taxonomy category resolved via the loader's_RULE_ID_TO_CATEGORYdict.categories,techniques,detection_fields, andvariation_typesarguments narrow the dataset client-side after fetch. Enum arguments are validated against their expected types via the inherited_validate_enumshelper, matching the pattern in_PromptIntelDataset.original_rule_idis not in the loader's category mapping are skipped (not errored) with an aggregate warning. This handles upstream rule additions that land before the loader's mapping is extended — users get a working dataset minus the unmapped rules, not a runtime failure._RemoteDatasetLoader, so caching, file-type inference, and thepublic_url/fileswitch are all inherited — no duplicated infrastructure.What this PR does NOT include (per #1702 discussion)
harm_categoriesas-is per the same guidance.Optional context for PyRIT users
ATR was recently integrated into MISP at two layers (merged 2026-05-10 by Alexandre Dulaunoy, MISP project lead):
cve/owasp_llm/mitre_atlascross-references per clusterMentioning since PyRIT users routing red-team output into MISP-compatible threat-intel or CSIRT pipelines could benefit from the
original_rule_idmetadata on eachSeedPromptresolving natively as MISP machine tags downstream — no translation layer needed. Not required for the loader itself; just a downstream interop note.Tests and Documentation
13 new unit tests in
tests/unit/datasets/test_agent_threat_rules_dataset.py:ruff checkandruff format --checkboth pass on the new files and the modified__init__.py.A real-network fetch against the pinned upstream URL was verified locally: 1,054 seeds load with the expected category distribution (prompt-injection 496, context-exfiltration 186, skill-compromise 124, tool-poisoning 93, agent-manipulation 93, privilege-escalation 62).
The new loader will be picked up automatically by
tests/end_to_end/test_all_datasets.pyviaSeedDatasetProvider.get_all_providers()discovery — no parametrization update needed there.JupyText was not run because this PR does not touch any notebooks or
doc/code/.pyfiles (only the markdown0_dataset.md).