igerber · May 17, 2026 · May 16, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -128,13 +128,13 @@ Deferred items from PR reviews that were not addressed before merge.
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
 | `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
-| `SpilloverDiD(event_study=True)` per-event-time × ring decomposition (Butts Section 5 / Table 2 `S^k_{it}` / `Ring^k_{it,j}`). Currently raises `NotImplementedError`. The implementation adds event-time dummies × ring covariates to the stage-2 design and emits a MultiIndex on `spillover_effects`. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Medium |
 | `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
 | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
 | Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
 | `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
+| `SpilloverDiDResults` in `DiagnosticReport` dispatch tables. Wave C event-study emits a TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias that `plot_event_study` consumes via the new `reference_period` attribute fallback in `_extract_plot_data`, but `SpilloverDiDResults` is NOT registered in `DiagnosticReport`'s `_APPLICABILITY` / `_PT_METHOD` tables — so `DiagnosticReport(spillover_result)` doesn't currently route to event-study diagnostics. Registering requires (a) deciding which diagnostics apply (parallel trends, pre-trends power, heterogeneity, design-effect) AND (b) adding an end-to-end test. | `diff_diff/diagnostic_report.py::_APPLICABILITY`, `_PT_METHOD` | follow-up (Wave C) | Low |
 
 #### Performance
 

diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
@@ -477,8 +477,8 @@ SpilloverDiD(
     cluster: str | None = None,
     alpha: float = 0.05,
     anticipation: int = 0,
-    event_study: bool = False,           # Deferred: raises NotImplementedError if True
-    horizon_max: int | None = None,      # Deferred (event-study mode)
+    event_study: bool = False,           # Wave C: per-event-time × ring decomposition (Butts Table 2)
+    horizon_max: int | None = None,      # Bin event-times outside [-H,+H] into endpoint pools (event-study mode); H>=1 or None — H=0 rejected (use event_study=False for aggregate spec)
     rank_deficient_action: str = "warn",
 )
 ```
@@ -502,8 +502,7 @@ sp.fit(
 
 - `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
 - `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
-- `event_study=True` raises `NotImplementedError` (planned: per-event-time × ring decomposition per Butts Table 2)
-- `horizon_max=` raises `NotImplementedError` (used only with event_study)
+- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias is also emitted for `plot_event_study` consumption (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}`, mirroring `two_stage.py:1355-1389`); `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. DiagnosticReport integration is NOT yet wired (queued as a follow-up). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block; no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are still emitted, to keep the output rectangular, with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped; an intentional divergence from TwoStageDiD, which filters, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError`, because silently floor-shifting the reference would change identification. Since `ref_period <= -1` for any `anticipation >= 0`, the second check also rejects `horizon_max=0`. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` would NOT reduce to Wave B (binning would collapse pre-treatment K values to `k=0`, making `D^0 = D_i` the ever-treated indicator rather than `D_it`), which is why it is rejected; use `event_study=False` for the aggregate spec. Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
 - Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
 - Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
 

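The `horizon_max` binning and the share-weighted scalar `att` described in the `llms-full.txt` hunk above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names `bin_event_time` and `share_weighted_att` are hypothetical, the inputs are plain dicts rather than DataFrames, and `total` is assumed to be the sum of `n_treated_at_k` over post-treatment `k` so that the weights sum to one.

```python
from math import sqrt


def bin_event_time(k, horizon_max):
    """Pool event-times outside [-H, +H] into the endpoint bins (no rows dropped).

    Hypothetical helper: horizon_max=None leaves k untouched.
    """
    if horizon_max is None:
        return k
    return max(-horizon_max, min(horizon_max, k))


def share_weighted_att(tau, n_treated, vcov):
    """Aggregate post-treatment tau_k (k >= 0) into a scalar att and its SE.

    tau:       dict k -> point estimate tau_k
    n_treated: dict k -> number of treated rows at event-time k
    vcov:      dict (k, l) -> Cov(tau_k, tau_l) on the post-treatment block
    Assumption: weights are normalised over post-treatment k only.
    """
    post = sorted(k for k in tau if k >= 0)
    total = sum(n_treated[k] for k in post)
    w = {k: n_treated[k] / total for k in post}
    att = sum(w[k] * tau[k] for k in post)
    # Linear-combination variance: w' V w on the post-treatment block.
    var = sum(w[k] * vcov[(k, l)] * w[l] for k in post for l in post)
    return att, sqrt(var)


# Toy inputs (made up for illustration only).
tau = {-2: 0.1, 0: 1.0, 1: 2.0}
n_treated = {-2: 10, 0: 30, 1: 10}
vcov = {(0, 0): 0.04, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.16}

att, se = share_weighted_att(tau, n_treated, vcov)
binned = [bin_event_time(k, 3) for k in (-7, -1, 0, 5)]
```

With these toy numbers the weights are 0.75 and 0.25, so `att` is 1.25 and the variance is `0.75**2 * 0.04 + 0.25**2 * 0.16 = 0.0325`; the pre-treatment `k = -2` estimate does not enter the aggregate.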
diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -408,37 +408,102 @@ class SpilloverDiDResults(DiDResults):
     event_study: Optional[bool] = field(default=None)
     stage1_n_obs: Optional[int] = field(default=None)
     anticipation: Optional[int] = field(default=None)
+    # Wave C event-study fields (None when event_study=False).
+    # att_dynamic: per-event-time direct effects DataFrame indexed by integer k.
+    # Columns: ["coef", "se", "t_stat", "p_value", "ci_low", "ci_high", "n_obs"].
+    # Includes the reference period row with coef=0.0, se=0.0, n_obs=0.
+    att_dynamic: Optional[pd.DataFrame] = field(default=None)
+    # event_study_effects: TwoStageDiD-compatible alias for ``att_dynamic``,
+    # consumable by ``plot_event_study`` (wired in Wave C via the
+    # ``reference_period`` attribute fallback in ``_extract_plot_data``).
+    # ``DiagnosticReport`` routing is NOT yet wired; registering
+    # ``SpilloverDiDResults`` in its applicability/method tables is a planned
+    # follow-up (see TODO.md). Schema mirrors ``two_stage.py:1355-1389``:
+    #   {k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}
+    # Reference row uses ``conf_int = (0.0, 0.0)`` (TwoStageDiD parity).
+    event_study_effects: Optional[Dict[int, Dict[str, Any]]] = field(default=None)
+    # Binning horizon H and the reference event time (``-1 - anticipation``).
+    horizon_max: Optional[int] = field(default=None)
+    reference_period: Optional[int] = field(default=None)
 
     def summary(self, alpha: Optional[float] = None) -> str:
-        """Extended summary with ATT row plus per-ring rows."""
+        """Extended summary with ATT row, per-event-time direct block, and
+        per-(ring, event-time) spillover block."""
         base = super().summary(alpha=alpha)
-        if self.spillover_effects is None or self.spillover_effects.empty:
+        insert_blocks: List[str] = []
+
+        # Wave C event-study: per-event-time direct effects block.
+        if self.att_dynamic is not None and not self.att_dynamic.empty:
+            insert_blocks.append("")
+            insert_blocks.append("Dynamic Direct Effects by Event Time".center(70))
+            insert_blocks.append("-" * 70)
+            insert_blocks.append(
+                f"{'k':<15} {'Estimate':>12} {'Std. Err.':>12} "
+                f"{'t-stat':>10} {'P>|t|':>10} {'':>5}"
+            )
+            insert_blocks.append("-" * 70)
+            for k, row in self.att_dynamic.iterrows():
+                coef = row.get("coef", np.nan)
+                se = row.get("se", np.nan)
+                t_stat = row.get("t_stat", np.nan)
+                p_value = row.get("p_value", np.nan)
+                stars = _get_significance_stars(p_value)
+                k_str = f"{int(k):+d}"
+                insert_blocks.append(
+                    f"{k_str:<15} {coef:>12.4f} {se:>12.4f} "
+                    f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}"
+                )
+            insert_blocks.append("-" * 70)
+
+        # Spillover block (per-ring OR per-(ring, k) under MultiIndex).
+        # When the index is a MultiIndex (event-study mode), the ring and `k`
+        # are rendered as separate columns so distinct horizons within the same
+        # ring remain visually distinguishable. The non-MultiIndex aggregate
+        # path retains the single `Ring` column for Wave B compatibility.
+        if self.spillover_effects is not None and not self.spillover_effects.empty:
+            insert_blocks.append("")
+            insert_blocks.append("Spillover Effects (ring-indicator, Butts 2021)".center(70))
+            insert_blocks.append("-" * 70)
+            is_multi = isinstance(self.spillover_effects.index, pd.MultiIndex)
+            if is_multi:
+                header = (
+                    f"{'Ring':<15} {'k':>5} {'Estimate':>12} {'Std. Err.':>12} "
+                    f"{'t-stat':>10} {'P>|t|':>10} {'':>5}"
+                )
+            else:
+                header = (
+                    f"{'Ring':<15} {'Estimate':>12} {'Std. Err.':>12} "
+                    f"{'t-stat':>10} {'P>|t|':>10} {'':>5}"
+                )
+            insert_blocks.append(header)
+            insert_blocks.append("-" * len(header.rstrip()))
+            for label, row in self.spillover_effects.iterrows():
+                coef = row.get("coef", np.nan)
+                se = row.get("se", np.nan)
+                t_stat = row.get("t_stat", np.nan)
+                p_value = row.get("p_value", np.nan)
+                stars = _get_significance_stars(p_value)
+                if is_multi and isinstance(label, tuple):
+                    ring_str = str(label[0])[:15]
+                    k_str = f"{int(label[1]):+d}"
+                    insert_blocks.append(
+                        f"{ring_str:<15} {k_str:>5} {coef:>12.4f} {se:>12.4f} "
+                        f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}"
+                    )
+                else:
+                    label_str = str(label)[:15]
+                    insert_blocks.append(
+                        f"{label_str:<15} {coef:>12.4f} {se:>12.4f} "
+                        f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}"
+                    )
+            insert_blocks.append("-" * len(header.rstrip()))
+
+        if not insert_blocks:
             return base
         lines = base.split("\n")
-        # Find the closing separator line and inject ring rows before it.
-        ring_rows = ["", "Spillover Effects (ring-indicator, Butts 2021)".center(70), "-" * 70]
-        header = (
-            f"{'Ring':<15} {'Estimate':>12} {'Std. Err.':>12} "
-            f"{'t-stat':>10} {'P>|t|':>10} {'':>5}"
-        )
-        ring_rows.append(header)
-        ring_rows.append("-" * 70)
-        for label, row in self.spillover_effects.iterrows():
-            coef = row.get("coef", np.nan)
-            se = row.get("se", np.nan)
-            t_stat = row.get("t_stat", np.nan)
-            p_value = row.get("p_value", np.nan)
-            stars = _get_significance_stars(p_value)
-            label_str = str(label) if not isinstance(label, tuple) else f"{label[0]} k={label[1]}"
-            ring_rows.append(
-                f"{label_str[:15]:<15} {coef:>12.4f} {se:>12.4f} "
-                f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}"
-            )
-        ring_rows.append("-" * 70)
-        # Insert ring block before the final "==..." line (last row of base).
         for idx in range(len(lines) - 1, -1, -1):
             if lines[idx].startswith("="):
-                lines = lines[:idx] + ring_rows + lines[idx:]
+                lines = lines[:idx] + insert_blocks + lines[idx:]
                 break
         return "\n".join(lines)
 
@@ -460,6 +525,14 @@ def to_dict(self) -> Dict[str, Any]:
                 "event_study": self.event_study,
                 "stage1_n_obs": self.stage1_n_obs,
                 "anticipation": self.anticipation,
+                "att_dynamic": (
+                    self.att_dynamic.reset_index().to_dict(orient="records")
+                    if self.att_dynamic is not None
+                    else None
+                ),
+                "event_study_effects": self.event_study_effects,
+                "horizon_max": self.horizon_max,
+                "reference_period": self.reference_period,
             }
         )
         return base
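
The two spillover row layouts in `summary()` above (a separate signed `k` column under a MultiIndex, a single `Ring` column otherwise) can be illustrated with a small standalone sketch. The helper name `format_spillover_row` is hypothetical and not part of the PR, and the significance stars are passed in directly rather than computed by `_get_significance_stars`; the format specs are copied from the diff.

```python
def format_spillover_row(label, coef, se, t_stat, p_value, stars=""):
    """Render one spillover row (hypothetical helper mirroring the diff's f-strings).

    Tuple labels (ring, k) get a separate signed event-time column;
    plain labels use the single Ring column of the aggregate layout.
    """
    stats = (
        f"{coef:>12.4f} {se:>12.4f} "
        f"{t_stat:>10.3f} {p_value:>10.4f} {stars:>5}"
    )
    if isinstance(label, tuple):
        ring_str = str(label[0])[:15]      # truncate long ring labels
        k_str = f"{int(label[1]):+d}"      # explicit sign: +2, -1, +0
        return f"{ring_str:<15} {k_str:>5} {stats}"
    label_str = str(label)[:15]
    return f"{label_str:<15} {stats}"


# Made-up values, for illustration only.
event_row = format_spillover_row(("0-10km", 2), 0.5, 0.1, 5.0, 0.0001, "***")
aggregate_row = format_spillover_row("0-10km", 0.5, 0.1, 5.0, 0.0001, "***")
print(event_row)
print(aggregate_row)
```

The event-study row is six characters wider than the aggregate row (the five-character `k` column plus one separating space), which is why distinct horizons within one ring stay distinguishable without changing the aggregate layout.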