feat: Add per-column metrics to summary #34

EgeKaraismailogluQC wants to merge 20 commits into main from
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff            @@
##               main       #34   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           10        11     +1
  Lines          930      1014    +84
=========================================
+ Hits           930      1014    +84
```
Pull request overview
Adds optional per-column numeric aggregation metrics to DataFrameComparison.summary() output (and JSON), with presets exposed via a new diffly.metrics module and a repeatable CLI flag. This addresses requests for “how much” numeric columns differ, not just “where”.
Changes:
- Introduce `diffly.metrics` with preset metric callables and a `quantile(q)` factory.
- Thread a new optional `metrics` mapping through `summary()`, testing helpers, and the CLI (`--metric`).
- Add unit + golden/fixture tests and update docs to describe/auto-document metrics.
Reviewed changes
Copilot reviewed 142 out of 142 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_metrics.py | New unit tests validating metric preset computations (mean/std/quantile/etc.). |
| tests/summary/test_summary.py | Extends summary JSON-shape assertions to include per-column metrics and adds a metrics-enabled scenario. |
| tests/summary/fixtures/metrics_presets_many/test_metrics_presets_many.py | Fixture generator for many preset metrics in summary output. |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “many presets” rendering when perfect matches hidden (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering when perfect matches hidden (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering when perfect matches hidden (plain). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain, no top changes). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain, no top changes). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain, full header). |
| tests/summary/fixtures/metrics_presets_many/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “many presets” rendering (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/test_metrics_presets_few.py | Fixture generator for a smaller set of preset metrics. |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI, no top changes). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI, no top changes). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI, full header, no top changes). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_True_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (pretty ANSI, full header, no top changes). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering without Top Changes (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns and no top changes (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns and no top changes (plain). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns and no top changes (plain, full header). |
| tests/summary/fixtures/metrics_presets_few/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for “few presets” rendering w/ imperfect-only columns and no top changes (plain, full header). |
| tests/summary/fixtures/metrics_long_labels/test_metrics_long_labels.py | Fixture generator to stress-test rendering with long metric labels. |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for long-label metrics rendering (plain). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for long-label metrics rendering (plain). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for long-label metrics rendering (plain, full header). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for long-label metrics rendering (plain, full header). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for long-label metrics rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for long-label metrics rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for long-label metrics rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_long_labels/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for long-label metrics rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_custom/test_metrics_custom.py | Fixture generator demonstrating a quantile metric + a user lambda metric. |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (pretty ANSI). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_True_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (pretty ANSI, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_True_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering without Top Changes (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_True_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_True_slim_False_sample_rows_True_sample_pk_True.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_True_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_False_slim_True_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (plain). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_True_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (plain, full header). |
| tests/summary/fixtures/metrics_custom/gen/pretty_False_perfect_False_top_False_slim_False_sample_rows_False_sample_pk_False.txt | Updated golden output for custom metrics rendering w/ imperfect-only columns and no Top Changes (plain, full header). |
| docs/index.md | Adds a “Per-column metrics” bullet to the landing page feature list. |
| docs/api/summary.rst | Documents the metrics API and auto-documents diffly.metrics presets/factory. |
| diffly/testing.py | Adds metrics passthrough to assert_frame_equal / assert_collection_equal and documents it. |
| diffly/metrics.py | New metrics module: metric type alias, presets, and quantile() factory. |
| diffly/comparison.py | Adds metrics parameter to DataFrameComparison.summary() and passes it into Summary. |
| diffly/cli.py | Adds repeatable --metric option and wires presets into summary rendering. |
Hi EgeKaraismailogluQC, looks very nice, I'd like to try it out!
Eric

Hi Eric Brahmann (@EED85)! You can find the development guide here.
Suggested change (rename the CLI parameter):

```diff
-    metric: Annotated[
+    metrics: Annotated[
```
On `def _format_metric_value(value: Any) -> str:`:

Why don't we just add the case

```python
if isinstance(value, float):
    return _yellow(f"{value:.4g}")
```

to `_format_value()` above? I think that would make it more consistent and the difference between the functions `_format_value()` and `_format_metric_value()` less confusing.
Thanks for this comment, I was also not very content with these two methods. The drawback of your suggestion is that it would format all floats with `.4g`, not just metrics, and I'm not sure we want that. I solved it differently by making `_format_value` accept a `float_format` argument and setting it only in `_format_metric_value`, PTAL.
On:

```python
if not metrics:
    return {}
```

This is redundant, as we pass `metrics_resolved: dict[str, Metric] = dict(metrics or {})`. As a side note, I find this LLM behavior very annoying: they add multiple safeguards so you no longer know what to assume 😅
Thanks, removed.

> As a side note, I find this LLM behavior very annoying: they add multiple safeguards so you no longer know what to assume 😅

Yes, I totally agree. I had already deleted several other guards like this before publishing the PR but missed this one. Thanks for spotting it. Maybe something for our AGENTS.md?
In `docs/api/summary.rst`:

```rst
Metrics
=======

.. currentmodule:: diffly.metrics

The ``metrics`` argument of :meth:`~diffly.comparison.DataFrameComparison.summary`
accepts a mapping from display label to a :data:`Metric` callable. :mod:`diffly.metrics`
ships a set of presets.

.. autodata:: Metric
   :no-value:

.. autosummary::
   :toctree: _gen/

   mean
   median
   min
   max
   std
   mean_absolute_deviation
   mean_relative_deviation
   quantile
```
I'd keep this empty; this is only the main page of the summary. We also do not go into more depth on other arguments. (Suggested change: delete the whole section above.)
There was a problem hiding this comment.
For the other arguments, the docstring of Summary already tells the user everything they need to know. This is not true for metrics, so I think we must document the list of pre-defined metrics somewhere, otherwise the user cannot discover them. We can either document it under Summary because metrics are only used in the summary, or we can create a new card/section for metrics entirely. I have a slight preference for the second option. Which option do you prefer? Do you see a better alternative?
On a large generated file (`@@ -1,613 +1,642 @@`):

Do you have a suggestion for how to review this? Many more changed lines than "actual changes"...
Yes, you can directly view the docs in the PR: the only change is a new "Numerical metrics" section that appears second-to-last.
| "mean": mean, | ||
| "median": median, | ||
| "min": min, | ||
| "max": max, | ||
| "std": std, | ||
| "mean_absolute_deviation": mean_absolute_deviation, | ||
| "mean_relative_deviation": mean_relative_deviation, |
There was a problem hiding this comment.
I think it would be nicer to capitalizes, so that all column headers are capitalized in the Columns section of the summary.
| "mean": mean, | |
| "median": median, | |
| "min": min, | |
| "max": max, | |
| "std": std, | |
| "mean_absolute_deviation": mean_absolute_deviation, | |
| "mean_relative_deviation": mean_relative_deviation, | |
| "Mean": mean, | |
| "Median": median, | |
| "Min": min, | |
| "Max": max, | |
| "Std": std, | |
| "Mean_absolute_deviation": mean_absolute_deviation, | |
| "Mean_relative_deviation": mean_relative_deviation, |
Yes, I like that. To take it a step further, I would also like to remove the underscores. I implemented the corresponding logic change, this does mean that the metric names need to be wrapped with quotes when using the CLI, e.g. --metric "Mean absolute deviation". I'm personally fine with this, wdyt? If we don't want the quotes, we can also implement logic that capitalizes and replaces the underscores with space, but I think the current state is potentially easier to understand for the user.
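For illustration, the registry shape after this change might look as follows — labels assumed from the discussion, and the callables are plain-list stubs rather than diffly's polars-based implementations:

```python
# Hypothetical preset registry keyed by display label. Labels contain
# spaces, so CLI usage needs quoting: --metric "Mean absolute deviation"
PRESETS = {
    "Mean": lambda left, right: sum(r - l for l, r in zip(left, right)) / len(left),
    "Mean absolute deviation": lambda left, right: sum(abs(r - l) for l, r in zip(left, right)) / len(left),
}

metric = PRESETS["Mean absolute deviation"]
print(metric([1.0, 2.0], [1.5, 1.0]))  # 0.75
```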
Co-authored-by: Marius Merkle <122545105+MariusMerkleQC@users.noreply.github.com>

Motivation
Closes #15. Users comparing data frames with numerical columns have asked for aggregate statistics (mean, quantiles, deviations) alongside the existing "Match Rate" and "Top Changes" per column. Until now the summary only showed where columns differ; this adds by how much.
The feature is optional: passing no `metrics` argument preserves today's output.

Changes

- New `diffly.metrics` module: preset callables `mean`, `median`, `min`, `max`, `std`, `mean_absolute_deviation`, `mean_relative_deviation`, and a `quantile(q)` factory. Each computes an aggregation over `left`, `right` across all joined rows.
- New `metrics` argument on `summary()`: accepts `Mapping[str, Callable[[pl.Expr, pl.Expr], pl.Expr]]`. Dict keys become column headers; values can be presets, `quantile(q)`, or user lambdas.
- Threaded through `assert_frame_equal`, `assert_collection_equal`, and exposed in the CLI via a repeatable `--metric <preset-name>`.
- Metric values use `.4g` formatting; non-numerical columns get blank cells.
- Docs: a new Metrics section auto-documenting `diffly.metrics`, and a Key Features bullet on the landing page.
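As a rough illustration of the preset shapes described above — plain-list stand-ins only; the real callables take and return polars expressions (`pl.Expr`), and the exact aggregation formulas here are assumptions:

```python
from statistics import fmean

def mean(left, right):
    # Assumed semantics: mean of per-row differences.
    return fmean(r - l for l, r in zip(left, right))

def mean_absolute_deviation(left, right):
    return fmean(abs(r - l) for l, r in zip(left, right))

def mean_relative_deviation(left, right):
    # Assumed definition: |r - l| relative to |l|.
    return fmean(abs(r - l) / abs(l) for l, r in zip(left, right))

def quantile(q):
    # Factory: returns a metric reporting the q-quantile of the differences.
    def _metric(left, right):
        diffs = sorted(r - l for l, r in zip(left, right))
        idx = min(int(q * len(diffs)), len(diffs) - 1)
        return diffs[idx]
    return _metric
```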
Specifying metrics:
Output (Columns section):
Custom metrics are equally easy:
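A custom metric is just a two-argument callable; here is a hedged stand-in over plain lists (in diffly the arguments would be polars expressions, and `Max abs diff` is an invented label):

```python
# Any callable taking the left and right column values and returning a
# scalar works as a metric; presets and lambdas can be mixed freely.
max_abs_diff = lambda left, right: max(abs(r - l) for l, r in zip(left, right))

# Hypothetical call shape, mirroring the PR description:
#   comparison.summary(metrics={"Mean": mean, "Max abs diff": max_abs_diff})
print(max_abs_diff([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # 1.0
```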