Summary
The library already ships excellent agent-oriented primitives (get_llm_guide, profile_panel, practitioner_next_steps, BusinessReport, etc.), but LLM agents that import diff_diff often don't discover them naturally. When an agent runs dir(diff_diff), these entrypoints appear alongside 240+ other names and don't stand out as the recommended first calls. Improving their discoverability inside the library — without adding new functionality — would meaningfully improve agent adherence to the intended workflow.
Background
Cold-start LLM-agent dry-pass experiments at igerber/causal-llm-eval (see writeups/dry_pass_2026-05-16.md for the methodology) measured how Anthropic Claude agents engage with diff_diff versus statsmodels on a staggered-adoption ATT estimation task. The harness captures three layers of telemetry (stream-JSON transcript, in-process Python instrumentation, subprocess stderr) so we can see exactly which library entrypoints the agent actually invokes.
Across a 4-condition × 2-model × 2-library-arm matrix, we observed:
- Agents that decide to use diff_diff typically follow this discovery loop: pip list → import diff_diff → dir(diff_diff) → help(SomeClass) → call.
- The top-level __doc__ is helpful when reached (especially the "For AI agents" section), but the agent workflow comes a few paragraphs into the docstring.
- get_llm_guide, profile_panel, and practitioner_next_steps are top-level exports, but they appear alphabetically interleaved with 240+ other names in dir(diff_diff).
- Even when an experiment prompt explicitly instructed agents to "consult any installed library's llms.txt if present," some agents used shell find + cat to locate the file rather than calling the existing diff_diff.get_llm_guide(level=...) Python entrypoint.
- Agents practically never called profile_panel or practitioner_next_steps on their own, despite both being directly designed for the agent workflow.
Proposed changes
Items 1 and 3 are docs-side only; item 2 adds one thin convenience wrapper over existing functions. None of them changes or breaks the existing API.
1. Make the agent workflow the FIRST thing in diff_diff.__doc__
The current __doc__ is good but the "For AI agents" section comes after the library overview. Lift the agent workflow to the top of the docstring so help(diff_diff) opens with it:
"""diff-diff: Difference-in-Differences causal inference with sklearn-like API.

Recommended workflow (especially for LLM agents):

  1. Describe the data shape:
       diff_diff.profile_panel(df, unit=..., time=..., treatment=..., outcome=...)

  2. Get a methodology recommendation:
       diff_diff.get_llm_guide("autonomous")    # estimator-support matrix + reasoning
       diff_diff.get_llm_guide("practitioner")  # Baker et al. (2025) 8-step recipe

  3. Fit an estimator and report:
       diff_diff.<Estimator>(...).fit(...)
       diff_diff.practitioner_next_steps(results)  # context-aware follow-up
       diff_diff.BusinessReport(results)           # structured stakeholder narrative

For the full guide: diff_diff.get_llm_guide("full")

(rest of current __doc__ follows)
"""
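One way to keep the workflow from drifting back down the docstring is a small check that the agent-facing names appear within its first few lines. The sketch below is a hypothetical helper: names_in_head and the 15-line cutoff are assumptions, not part of diff_diff, and it runs against a stand-in string. In the library's own test suite it would presumably be pointed at diff_diff.__doc__ instead.

```python
# Hypothetical regression check: do the agent-facing names show up early
# in a module docstring? SAMPLE_DOC is a stand-in for diff_diff.__doc__.
SAMPLE_DOC = """diff-diff: Difference-in-Differences causal inference with sklearn-like API.

Recommended workflow (especially for LLM agents):
  1. Describe the data shape:
       diff_diff.profile_panel(df, unit=..., time=..., treatment=..., outcome=...)
  2. Get a methodology recommendation:
       diff_diff.get_llm_guide("autonomous")
  3. Fit an estimator and report:
       diff_diff.practitioner_next_steps(results)
       diff_diff.BusinessReport(results)
"""

AGENT_NAMES = (
    "profile_panel",
    "get_llm_guide",
    "practitioner_next_steps",
    "BusinessReport",
)


def names_in_head(doc, names, max_lines=15):
    """Map each name to True if it occurs in the first max_lines lines of doc."""
    head = "\n".join(doc.splitlines()[:max_lines])
    return {name: name in head for name in names}


found = names_in_head(SAMPLE_DOC, AGENT_NAMES)
missing = [name for name, ok in found.items() if not ok]
print("missing from docstring head:", missing)
```

The cutoff is a judgment call: it should be small enough that the workflow is visible on the first screen of help() output.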
2. Consider an agent_workflow() convenience entrypoint
Bundles the recommended steps into one obvious name:
def agent_workflow(df, *, unit, time, treatment, outcome, fit=False):
    """One-call agent-oriented workflow: profile + recommend + (optionally) fit + report.

    Returns a structured object with the panel profile, recommended estimator(s),
    fitted result (if fit=True), and a business-style narrative.
    """
This is the kind of name agents naturally try (`.agent_workflow`, `.help`, `.recommend`). One obvious entrypoint that runs the whole canonical workflow would shrink the agent's decision surface and give it a recognizable answer to "where should I start with this library?"
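To make the shape of such a wrapper concrete, here is a minimal sketch of the composition. It is not the library's implementation: WorkflowResult, agent_workflow_sketch, and the injected profile_fn / guide_fn / fit_fn / report_fn parameters are all hypothetical, introduced so the example runs without diff_diff installed. In the real function these would presumably be the existing profile_panel, get_llm_guide, an estimator's fit, and BusinessReport.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WorkflowResult:
    """Hypothetical container for what the wrapper hands back."""
    profile: Any
    guide: Any
    results: Optional[Any] = None
    report: Optional[Any] = None


def agent_workflow_sketch(df, *, unit, time, treatment, outcome, fit=False,
                          profile_fn, guide_fn, fit_fn=None, report_fn=None):
    """Profile the panel, fetch guidance, and optionally fit and report.

    The *_fn arguments are stand-ins for existing library calls (assumption).
    """
    profile = profile_fn(df, unit=unit, time=time,
                         treatment=treatment, outcome=outcome)
    guide = guide_fn("autonomous")
    out = WorkflowResult(profile=profile, guide=guide)
    if fit:
        if fit_fn is None:
            raise ValueError("fit=True requires fit_fn")
        out.results = fit_fn(df, profile)
        if report_fn is not None:
            out.report = report_fn(out.results)
    return out


# Usage with toy stand-ins; none of these lambdas mirror real diff_diff behavior.
demo = agent_workflow_sketch(
    [{"unit": 1, "period": 0}],
    unit="unit", time="period", treatment="treated", outcome="outcome",
    profile_fn=lambda df, **cols: {"n_rows": len(df), **cols},
    guide_fn=lambda level: f"guide:{level}",
)
print(demo.profile["n_rows"], demo.guide, demo.results)
```

Keeping fit=False as the default means the cheap steps (profile and guidance) always run, while estimation stays an explicit opt-in.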
3. (Optional) Make dir(diff_diff) slightly more agent-friendly
The current alphabetic export list (Alert, BJS, BUSINESS_REPORT_SCHEMA_VERSION, …) puts utility constants and internal-ish names at the top. Consider organizing __all__ so the agent-facing entrypoints (get_llm_guide, profile_panel, agent_workflow, BusinessReport, …) come first.
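(Note that dir() always returns names sorted alphabetically, so reordering __all__ does not change dir(diff_diff) output. It only helps agents that read diff_diff.__all__ or the package source directly. This is a minor polish; the main lever is items 1 and 2.)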
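How dir() treats __all__ can be checked directly. The sketch below uses a stand-in module built with types.ModuleType (demo_pkg is a made-up name, not diff_diff) and a deliberately non-alphabetical __all__.

```python
import types

# Stand-in module (not diff_diff) with a deliberately non-alphabetical __all__.
demo = types.ModuleType("demo_pkg")
demo.__all__ = ["profile_panel", "get_llm_guide", "BusinessReport", "Alert", "BJS"]
for name in demo.__all__:
    setattr(demo, name, object())

# dir() sorts its result, so the __all__ order is not reflected here.
public_names = [n for n in dir(demo) if not n.startswith("_")]
print(public_names)

# The curated order is only visible to code that reads __all__ directly.
print(demo.__all__)
```

Uppercase names sort before lowercase ones, which is why constants and classes lead the dir() listing regardless of how __all__ is arranged.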
Reproducibility
The behavior is reproducible without running the full causal-llm-eval harness:
- Create a fresh Python venv:
python -m venv /tmp/test-venv && source /tmp/test-venv/bin/activate
- Install diff-diff:
pip install diff-diff
- Spawn an LLM agent (Claude Code, or any agent framework) with a cold-start prompt like:
You have a panel dataset at data.parquet with columns unit, period, outcome,
first_treat, treated. Estimate the average treatment effect on the treated (ATT).
Save your solution to solution.py and print your point estimate.
- Observe the agent's Bash commands. The agent will (with positive probability that varies by model capability):
  - Discover diff_diff in pip list
  - Run import diff_diff and dir(diff_diff)
  - Pick an estimator class WITHOUT first calling get_llm_guide or profile_panel
The hypothesis under the proposed changes: when __doc__ opens with the recommended workflow, more agents will call profile_panel and/or get_llm_guide before picking an estimator. Quantifying the effect would require re-running the same dry pass after the change.
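For the before/after comparison, the quantity of interest can be reduced to one number per batch of runs: the share of runs in which a guidance call precedes the first estimator call. The sketch below assumes a hypothetical log format (each run is an ordered list of called names) and a placeholder estimator name, SomeEstimator; neither reflects the actual causal-llm-eval telemetry schema.

```python
# Hypothetical adherence metric over per-run call sequences.
GUIDE_CALLS = {"get_llm_guide", "profile_panel"}


def guided_before_estimator(calls, estimator_names):
    """True if a guidance call occurs before the first estimator call."""
    for name in calls:
        if name in GUIDE_CALLS:
            return True
        if name in estimator_names:
            return False
    return False  # neither a guidance call nor an estimator call was seen


def adherence_rate(runs, estimator_names):
    """Fraction of runs in which guidance preceded the estimator choice."""
    if not runs:
        return 0.0
    hits = sum(guided_before_estimator(run, estimator_names) for run in runs)
    return hits / len(runs)


# Toy data: placeholder names only, not observed results.
example_runs = [
    ["dir", "SomeEstimator"],
    ["profile_panel", "SomeEstimator"],
    ["SomeEstimator", "get_llm_guide"],
    ["get_llm_guide"],
]
rate = adherence_rate(example_runs, {"SomeEstimator"})
print(f"adherence rate: {rate:.2f}")
```

Computing the same rate on the dry pass before and after the __doc__ change would give a direct read on the hypothesis.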
Effort estimate
- Item 1 (rewrite __doc__): ~30 minutes
- Item 2 (agent_workflow() function): ~2-3 hours including tests
- Item 3 (__all__ ordering): ~30 minutes if pursued
Related
writeups/dry_pass_2026-05-16.md