
Surface get_llm_guide / profile_panel / practitioner_next_steps more prominently for agent discovery #460

@igerber

Summary

The library already ships excellent agent-oriented primitives (get_llm_guide, profile_panel, practitioner_next_steps, BusinessReport, etc.), but LLM agents that import diff_diff often don't discover them on their own. When an agent runs dir(diff_diff), these entrypoints appear alongside 240+ other names and don't stand out as the recommended first calls. Making them more discoverable inside the library, without changing any existing functionality, would likely improve how closely agents follow the intended workflow.

Background

Cold-start LLM-agent dry-pass experiments at igerber/causal-llm-eval (see writeups/dry_pass_2026-05-16.md for the methodology) measured how Anthropic Claude agents engage with diff_diff versus statsmodels on a staggered-adoption ATT estimation task. The harness captures three layers of telemetry (stream-JSON transcript, in-process Python instrumentation, subprocess stderr) so we can see exactly which library entrypoints the agent actually invokes.
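The harness's own instrumentation code isn't shown here. The sketch below shows one generic way in-process call logging can work: it patches module attributes with wrappers that record each call's name before delegating. The helper names `instrument` and `_logged` are hypothetical, and a stand-in module replaces diff_diff so the sketch runs anywhere. It targets plain functions only. Wrapping a class such as BusinessReport in a function would break `isinstance` checks.

```python
import functools
import types
from typing import Any, Callable, List


def _logged(name: str, original: Callable[..., Any], log: List[str]) -> Callable[..., Any]:
    # Factory so each wrapper closes over its own (name, original) pair.
    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        log.append(name)  # record the call before running it
        return original(*args, **kwargs)

    return wrapper


def instrument(module: types.ModuleType, names: List[str], log: List[str]) -> None:
    """Replace the module-level functions in `names` with call-logging wrappers.

    Hypothetical sketch: only calls made through the patched attributes are seen,
    so it must run before the agent's code does `from module import name`.
    """
    for name in names:
        setattr(module, name, _logged(name, getattr(module, name), log))


# Demo on a stand-in module (not diff_diff).
demo = types.ModuleType("demo")
demo.profile_panel = lambda df: {"n_units": len(df)}
calls: List[str] = []
instrument(demo, ["profile_panel"], calls)
result = demo.profile_panel([1, 2, 3])
print(calls)  # ['profile_panel']
```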

Across a 4-condition × 2-model × 2-library-arm matrix, we observed:

  • Agents that decide to use diff_diff typically follow this discovery loop: pip list → import diff_diff → dir(diff_diff) → help(SomeClass) → call.
  • The top-level __doc__ is helpful when reached (especially the "For AI agents" section), but the agent workflow comes a few paragraphs into the docstring.
  • get_llm_guide, profile_panel, and practitioner_next_steps are top-level exports, but they appear alphabetically interleaved with 240+ other names in dir(diff_diff).
  • Even when an experiment prompt explicitly instructed agents to "consult any installed library's llms.txt if present," some agents used shell find + cat to locate the file rather than calling the existing diff_diff.get_llm_guide(level=...) Python entrypoint.
  • Agents practically never called profile_panel or practitioner_next_steps on their own, despite both being directly designed for the agent workflow.

Proposed changes

Items 1 and 3 are docs-side only. Item 2 adds one optional convenience function. There are no changes to existing APIs and no breakage.

1. Make the agent workflow the FIRST thing in diff_diff.__doc__

The current __doc__ is good, but the "For AI agents" section comes after the library overview. Lift the agent workflow to the top of the docstring so that help(diff_diff) opens with it:

```python
"""diff-diff: Difference-in-Differences causal inference with sklearn-like API.

Recommended workflow (especially for LLM agents):

    1. Describe the data shape:
           diff_diff.profile_panel(df, unit=..., time=..., treatment=..., outcome=...)

    2. Get a methodology recommendation:
           diff_diff.get_llm_guide("autonomous")     # estimator-support matrix + reasoning
           diff_diff.get_llm_guide("practitioner")   # Baker et al. (2025) 8-step recipe

    3. Fit an estimator and report:
           diff_diff.<Estimator>(...).fit(...)
           diff_diff.practitioner_next_steps(results)   # context-aware follow-up
           diff_diff.BusinessReport(results)            # structured stakeholder narrative

For the full guide: diff_diff.get_llm_guide("full")

(rest of current __doc__ follows)
"""
```

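A small regression check could stop the workflow from drifting back down the docstring in later edits. This is a minimal sketch. The helper name `workflow_leads_docstring`, the marker string (taken from the proposed text), and the five-line threshold are all assumptions, not existing diff-diff test utilities.

```python
from typing import Optional


def workflow_leads_docstring(
    doc: Optional[str], marker: str = "Recommended workflow", max_lines: int = 5
) -> bool:
    """Return True if `marker` appears within the first `max_lines` non-blank lines of `doc`."""
    if not doc:
        return False  # a module's __doc__ can be None
    lines = [line.strip() for line in doc.splitlines() if line.strip()]
    return any(marker in line for line in lines[:max_lines])


# Abbreviated copy of the proposed docstring opening.
PROPOSED_DOC = """diff-diff: Difference-in-Differences causal inference with sklearn-like API.

Recommended workflow (especially for LLM agents):

    1. Describe the data shape:
"""
print(workflow_leads_docstring(PROPOSED_DOC))  # True

# In the diff-diff test suite this might become, e.g.:
#     import diff_diff
#     assert workflow_leads_docstring(diff_diff.__doc__)
```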
2. Consider an agent_workflow() convenience entrypoint

Bundles the recommended steps into one obvious name:

```python
def agent_workflow(df, *, unit, time, treatment, outcome, fit=False):
    """One-call agent-oriented workflow: profile + recommend + (optionally) fit + report.

    Returns a structured object with the panel profile, recommended estimator(s),
    fitted result (if fit=True), and a business-style narrative.
    """
```

This is the kind of name agents naturally try (`.agent_workflow`, `.help`, `.recommend`). A single obvious entrypoint that runs the whole canonical workflow would shrink the agent's decision surface and give it a recognizable answer to "where should I start with this library?"
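One way to read item 2 is as plain composition of the existing steps. The sketch below is hypothetical throughout. The `AgentWorkflowResult` dataclass is an assumption. So is the `recommend` hook, because the issue names no existing function that maps a panel profile to an estimator. Each step is passed in as a callable so the sketch runs without diff_diff installed. Inside the library, `profile` would presumably be `profile_panel`, `next_steps` would be `practitioner_next_steps`, and `report` would be `BusinessReport`, bound directly rather than injected.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AgentWorkflowResult:
    """Hypothetical container for the proposed agent_workflow() output."""
    profile: Any
    recommendation: Any
    results: Optional[Any] = None
    next_steps: Optional[Any] = None
    report: Optional[Any] = None


def agent_workflow(
    df: Any,
    *,
    unit: str,
    time: str,
    treatment: str,
    outcome: str,
    fit: bool = False,
    profile: Callable[..., Any],
    recommend: Callable[[Any], Any],
    fit_estimator: Optional[Callable[[Any, Any], Any]] = None,
    next_steps: Optional[Callable[[Any], Any]] = None,
    report: Optional[Callable[[Any], Any]] = None,
) -> AgentWorkflowResult:
    # Step 1: describe the panel (in diff_diff, presumably profile_panel).
    panel_profile = profile(df, unit=unit, time=time, treatment=treatment, outcome=outcome)
    # Step 2: map the profile to an estimator choice (hypothetical hook).
    out = AgentWorkflowResult(profile=panel_profile, recommendation=recommend(panel_profile))
    if not fit:
        return out
    if fit_estimator is None:
        raise ValueError("fit=True requires a fit_estimator callable")
    # Step 3: fit, then attach follow-up guidance and a stakeholder narrative.
    out.results = fit_estimator(df, out.recommendation)
    if next_steps is not None:
        out.next_steps = next_steps(out.results)
    if report is not None:
        out.report = report(out.results)
    return out


# Usage with placeholder stand-ins for each step.
wf = agent_workflow(
    {"rows": 120},
    unit="unit", time="period", treatment="first_treat", outcome="outcome",
    fit=True,
    profile=lambda df, **cols: {"n_rows": df["rows"], **cols},
    recommend=lambda prof: "placeholder-estimator",
    fit_estimator=lambda df, rec: {"estimator": rec, "att": 0.0},
    next_steps=lambda res: ["placeholder follow-up"],
    report=lambda res: f"ATT estimate: {res['att']}",
)
print(wf.report)  # ATT estimate: 0.0
```

Keeping `fit=False` as the default mirrors the proposed signature: the cheap profile-and-recommend path runs by default, and fitting stays an explicit opt-in.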

3. (Optional) Make dir(diff_diff) slightly more agent-friendly

The current alphabetic export list (Alert, BJS, BUSINESS_REPORT_SCHEMA_VERSION, …) puts utility constants and internal-ish names at the top. Consider organizing __all__ so the agent-facing entrypoints (get_llm_guide, profile_panel, agent_workflow, BusinessReport, …) come first.

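(Reordering __all__ alone will not change dir(diff_diff). dir() does not read __all__, and it always returns names sorted alphabetically. A module-level __dir__ function (PEP 562) can narrow which names dir() lists, for example to just __all__, but the result is still sorted. This is a minor polish; the main levers are items 1 and 2.)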
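Python's dir() never consults __all__, and the builtin always sorts the names it collects. The sketch below checks this on a throwaway stand-in module. The module is not diff_diff; its names only echo diff_diff's exports. The sketch also shows that a module-level __dir__ hook (PEP 562, Python 3.7+) changes which names are listed but not their order.

```python
import types

# Throwaway stand-in module whose __all__ is deliberately non-alphabetical.
demo = types.ModuleType("demo")
demo.profile_panel = lambda: None
demo.Alert = type("Alert", (), {})
demo.__all__ = ["profile_panel", "Alert"]

# dir() ignores __all__ and sorts; dunder names are filtered out for brevity.
before = [name for name in dir(demo) if not name.startswith("__")]
print(before)  # ['Alert', 'profile_panel']

# A module-level __dir__ changes WHICH names are listed, not their order.
demo.__dir__ = lambda: list(demo.__all__)
after = dir(demo)
print(after)  # ['Alert', 'profile_panel']
```

If item 3 is pursued, a `__dir__` that returns `list(__all__)` is what actually affects dir(diff_diff) output, and it works by trimming the listing rather than reordering it.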

Reproducibility

The behavior is reproducible without running the full causal-llm-eval harness:

  1. Create a fresh Python venv: python -m venv /tmp/test-venv && source /tmp/test-venv/bin/activate
  2. Install diff-diff: pip install diff-diff
  3. Spawn an LLM agent (Claude Code, or any agent framework) with a cold-start prompt like:
    You have a panel dataset at data.parquet with columns unit, period, outcome,
    first_treat, treated. Estimate the average treatment effect on the treated (ATT).
    Save your solution to solution.py and print your point estimate.
    
  4. Observe the agent's Bash commands. With some probability that varies by model capability, the agent will:
    • Discover diff_diff in pip list
    • Run import diff_diff and dir(diff_diff)
    • Pick an estimator class WITHOUT first calling get_llm_guide or profile_panel

The hypothesis under the proposed changes: when __doc__ opens with the recommended workflow, more agents will call profile_panel and/or get_llm_guide before picking an estimator. Quantifying the effect would require re-running the same dry pass after the change.
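Quantifying the effect comes down to a per-run question: did the agent call a guidance entrypoint before its first estimator fit? This is a minimal sketch that assumes each run yields an ordered list of called names, for example from in-process instrumentation. The label set, the `.fit` suffix convention, and the helper names are hypothetical. The example logs are made up for illustration and are not dry-pass data. Computing this rate before and after the docstring change would test the hypothesis directly.

```python
from typing import Iterable, Sequence

# Hypothetical labels for the "guidance" entrypoints in a call log.
GUIDANCE_CALLS = {"get_llm_guide", "profile_panel"}


def guidance_before_fit(calls: Sequence[str]) -> bool:
    """True if a guidance call occurs before the first `<Estimator>.fit` call in one run."""
    for name in calls:
        if name in GUIDANCE_CALLS:
            return True
        if name.endswith(".fit"):
            return False
    return False  # the run never consulted guidance


def guidance_rate(runs: Iterable[Sequence[str]]) -> float:
    """Fraction of runs that consulted guidance before fitting."""
    runs = list(runs)
    if not runs:
        raise ValueError("need at least one run")
    return sum(guidance_before_fit(r) for r in runs) / len(runs)


# Illustrative call logs only; these are not results from the dry pass.
example_runs = [
    ["SomeEstimator.fit"],
    ["profile_panel", "get_llm_guide", "SomeEstimator.fit"],
]
print(guidance_rate(example_runs))  # 0.5
```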

Effort estimate

  • Item 1 (rewrite __doc__): ~30 minutes
  • Item 2 (agent_workflow() function): ~2-3 hours including tests
  • Item 3 (__all__ ordering): ~30 minutes if pursued

Metadata

  • Assignees: none
  • Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)
  • Projects, milestone, relationships, development: none