Zone Map Pruning for Metrics by alexanderbianchi · Pull Request #6363 · quickwit-oss/quickwit

alexanderbianchi · 2026-04-29T22:41:50Z

Summary

Adds conservative scan-time metadata pruning for metrics splits after metastore split discovery.
Uses exact split metadata first (metric_name, low-cardinality tags), then falls back to per-column zonemap_regexes when available.
Supports string equality, IN, and simple prefix LIKE patterns such as host LIKE 'ID-07%'.
Evaluates prefix LIKE with DFA prefix-language matching against the zonemap regex, so it checks whether any value matching the prefix could exist in the split.
Caches compiled exact-match regexes and prefix DFAs with mini-moka.
Keeps splits when metadata is missing, regexes are invalid, DFA compilation exceeds the per-regex limit, or an expression is outside the supported pruning subset.
Allows metadata pruning for any declared table column that has split metadata, not only the default metrics tag columns.
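
A minimal sketch of how a simple-prefix LIKE pattern might be recognized before pruning is attempted. The function names, and the rule that `_`, an inner `%`, or a backslash make a pattern unsupported, are assumptions for illustration, not the PR's code. The sketch checks the prefix against a list of exact values only; the DFA check against a zonemap regex is not reproduced here.

```rust
/// Returns the literal prefix of a LIKE pattern of the form `literal%`.
/// Returns None for every other shape (hypothetical helper, not from the PR).
fn simple_like_prefix(pattern: &str) -> Option<&str> {
    // The pattern must end with exactly one trailing `%` wildcard.
    let prefix = pattern.strip_suffix('%')?;
    // An empty prefix matches everything, and any remaining wildcard or
    // escape character puts the pattern outside the supported subset.
    if prefix.is_empty() || prefix.contains(|c: char| matches!(c, '%' | '_' | '\\')) {
        return None;
    }
    Some(prefix)
}

/// Conservative check against exact per-split values: returns false only
/// when the pattern is a simple prefix and no value starts with it.
fn exact_values_may_match_like(values: &[&str], pattern: &str) -> bool {
    match simple_like_prefix(pattern) {
        Some(prefix) => values.iter().any(|v| v.starts_with(prefix)),
        // Unsupported pattern: keep the split.
        None => true,
    }
}
```

With these definitions, `host LIKE 'ID-07%'` yields the prefix `ID-07`, while a pattern such as `'%web'` is treated as unsupported and never prunes a split.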

Notes

Zonemap metadata is a column_name -> superset_regex map generated by the parquet writer for string-valued sort-schema columns. This PR only prunes when the fetched split metadata has usable information for the queried column; DataFusion still applies the row-level filter afterward.
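
The conservative rule described above (prune only when metadata proves that no row can match, otherwise keep the split) can be sketched as follows. `Pred`, `SplitMeta`, and `may_match` are hypothetical names, and the sketch models only exact value sets, not zonemap regexes. It is an illustration under those assumptions, not the PR's implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical predicate subset; not the PR's actual types.
enum Pred {
    Eq(String, String),
    In(String, Vec<String>),
    And(Vec<Pred>),
    Or(Vec<Pred>),
    /// Any expression outside the supported pruning subset.
    Unsupported,
}

/// Exact per-column value sets for one split (assumed shape).
type SplitMeta = HashMap<String, HashSet<String>>;

/// Returns false only when the metadata proves that no row can match.
fn may_match(pred: &Pred, meta: &SplitMeta) -> bool {
    match pred {
        // Missing column metadata means "unknown", so the split is kept.
        Pred::Eq(col, val) => meta.get(col).map_or(true, |vals| vals.contains(val)),
        Pred::In(col, list) => meta
            .get(col)
            .map_or(true, |vals| list.iter().any(|v| vals.contains(v))),
        // AND can prune as soon as one branch is provably empty.
        Pred::And(children) => children.iter().all(|c| may_match(c, meta)),
        // OR can prune only when every branch is provably empty.
        Pred::Or(children) => children.iter().any(|c| may_match(c, meta)),
        Pred::Unsupported => true,
    }
}
```

Because the function only ever answers "may match", a kept split costs extra scanning but never changes query results; the row-level filter still runs afterward.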

The Java/logs evaluator tests that apply to metrics string zonemaps were ported. Logs-specific integer min/max, hash rowset, ReaderQL function, and case-insensitive ReaderQL cases are out of scope for this metrics/DataFusion path.

Testing

cargo fmt -p quickwit-datafusion
cargo test -p quickwit-datafusion

alexanderbianchi · 2026-04-30T00:27:31Z

@codex review

chatgpt-codex-connector · 2026-04-30T00:34:30Z

Codex Review: Didn't find any major issues. 🎉


g-talbot · 2026-04-30T13:04:14Z

    "timeseries_id",
    "timestamp_secs",
 ];
+const ZONEMAP_DFA_SIZE_LIMIT_BYTES: usize = 1_000_000;


What's this number?

Added a code comment here. The 1 MB limit caps DFA determinization/memory for pathological zonemap regexes; if compilation exceeds it, we fall back to conservative keep-split behavior.

g-talbot

Two concerns. First, the compiled DFA for the regex should be cached, since creating it can be very expensive. Second, let's make sure that all of the tests the Java code ran against this matching are ported here. Other than that, this looks good, and I'll approve once these are addressed.

g-talbot · 2026-04-30T13:06:13Z

+}
+
+fn zonemap_may_match_any_prefix(superset_regex: &str, prefixes: &[String]) -> bool {
+    let dfa_config = dense::Config::new()


It will be very important for performance to cache the config and DFA. It can be very expensive to build these.

Addressed. The scan path now caches compiled regex::Regex values and compiled prefix DFAs by zonemap regex string, uses shared DFA/syntax config, and keeps the DFA cache bounded separately because each DFA can be much larger than a regex::Regex.
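
As an illustration of caching by zonemap regex string with a bounded capacity, here is a std-only sketch. `BoundedCache` and its FIFO eviction are stand-ins chosen for brevity; they are not mini-moka's API or eviction policy, and the cached value type is left generic rather than tied to a real regex or DFA type.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Arc;

/// Minimal bounded cache keyed by regex string, with FIFO eviction.
/// A hypothetical stand-in for illustration only.
struct BoundedCache<V> {
    capacity: usize,
    entries: HashMap<String, Arc<V>>,
    insertion_order: VecDeque<String>,
}

impl<V> BoundedCache<V> {
    fn new(capacity: usize) -> Self {
        BoundedCache {
            capacity,
            entries: HashMap::new(),
            insertion_order: VecDeque::new(),
        }
    }

    /// Returns the cached value for `key`, calling `build` only on a miss.
    fn get_or_build(&mut self, key: &str, build: impl FnOnce(&str) -> V) -> Arc<V> {
        if let Some(hit) = self.entries.get(key) {
            return Arc::clone(hit);
        }
        let value = Arc::new(build(key));
        if self.capacity == 0 {
            // A zero-capacity cache never stores anything.
            return value;
        }
        if self.entries.len() >= self.capacity {
            // Evict the oldest inserted entry to stay within the bound.
            if let Some(oldest) = self.insertion_order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key.to_string(), Arc::clone(&value));
        self.insertion_order.push_back(key.to_string());
        value
    }
}
```

Keeping two such caches with different capacities, one for compiled regexes and a smaller one for DFAs, would mirror the separation described in the reply, since a single DFA can occupy far more memory than a compiled regex.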

g-talbot · 2026-04-30T13:06:41Z

        );
    }
+
+    #[test]


Are we doing all the same tests that the Java code did?

Addressed for the matching semantics that apply to metrics metadata. I added tests mirroring the Java evaluator string-zonemap cases: missing metadata, exact regex, superset regex, case-sensitive matching, multi-column AND, conservative OR, IN lists, invalid regex, newline/DOTALL behavior, and unsupported string forms staying conservative. The Java integer min/max, hash rowset, ReaderQL function, and case-insensitive ReaderQL equality cases are logs/Event Store-specific, so I called those out in the PR description instead of porting them as metrics tests.

alexanderbianchi force-pushed the bianchi/zonemap branch from cdd27b7 to 910756a on April 29, 2026 23:45

alexanderbianchi force-pushed the bianchi/zonemap branch 2 times, most recently from f2b6096 to 93bcde4 on April 30, 2026 12:14

g-talbot reviewed Apr 30, 2026

g-talbot requested changes Apr 30, 2026

alexanderbianchi force-pushed the bianchi/zonemap branch 6 times, most recently from 79803a9 to 85cf63c on May 1, 2026 20:11

Zone Map Pruning for Metrics

b28a172

alexanderbianchi force-pushed the bianchi/zonemap branch from 85cf63c to b28a172 on May 1, 2026 20:51

alexanderbianchi wants to merge 1 commit into main from bianchi/zonemap
