Antalya 26.3 port - improvements for cluster requests#1687

Open
zvonand wants to merge 6 commits into antalya-26.3 from feature/antalya-26.3/pr-1414-1

@zvonand
Collaborator

@zvonand zvonand commented Apr 23, 2026

Cherry-picked from #1414, also has changes from #1597.

Changelog category (leave one):

  • Not for changelog

Frontports for Antalya 26.1

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

…ous_hashing

26.1 Antalya port - improvements for cluster requests
@zvonand added the releasy (Created/managed by RelEasy) and ai-resolved (Port conflict auto-resolved by Claude) labels Apr 23, 2026
@github-actions

github-actions Bot commented Apr 23, 2026

Workflow [PR], commit [5959fbd]

@zvonand changed the title from "Antalya 26.3: 26.1 Antalya port - improvements for cluster requests" to "Antalya 26.3 port - improvements for cluster requests" Apr 24, 2026
std::optional<Int64> rows_count;
std::optional<Int64> bytes_size;
std::optional<Int64> nulls_count;
std::optional<DB::Range> hyperrectangle;

This field was removed between 26.1 and 26.3 (ClickHouse#98231). The code needs refactoring to use the new location of the min/max column values.

zvonand and others added 3 commits April 24, 2026 16:36
Removes the `hyperrectangle` field from `DB::Iceberg::ColumnInfo` that
was re-added during the frontport. The field was removed upstream in
PR ClickHouse#98231, which relocated
raw min/max bounds to `ParsedManifestFileEntry::value_bounds`. The
`DataFileMetaInfo` Iceberg constructor now deserializes those bounds via
the shared `deserializeFieldFromBinaryRepr` helper (moved from
`ManifestFileIterator.cpp` to `IcebergFieldParseHelpers`).

Addresses @ianton-ru's comment at #1687 (comment).
…bled

The Iceberg read optimization (`allow_experimental_iceberg_read_optimization`)
identifies constant columns from Iceberg metadata and removes them from the
read request. When all requested columns become constant, it sets
`need_only_count = true`, which tells the Parquet reader to skip all
initialization — including `preparePrewhere` — and just return the raw row
count from file metadata.

This completely bypasses `row_level_filter` (row policies) and `prewhere_info`,
returning unfiltered row counts. The InterpreterSelectQuery relies on the
storage to apply these filters when `supportsPrewhere` is true and does not
add a fallback FilterStep to the query plan, so the filter is silently lost.

The fix prevents `need_only_count` from being set when an active
`row_level_filter` or `prewhere_info` exists in the format filter info.

Fixes #1595

(cherry picked from commit f204850)
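
The guard described in this commit message can be sketched as below. This is only an illustration under stated assumptions: `FilterInfoSketch` and `canUseOnlyCount` are hypothetical names invented for the example, not the actual ClickHouse types or functions, and the filters are modelled as optional strings rather than real expression objects.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the format filter info; not the real ClickHouse type.
struct FilterInfoSketch
{
    std::optional<std::string> row_level_filter; // row policy expression, if any
    std::optional<std::string> prewhere_info;    // PREWHERE expression, if any
};

// Returns true only when the reader may answer from the metadata row count alone.
inline bool canUseOnlyCount(bool all_requested_columns_constant, const FilterInfoSketch & filters)
{
    // If some column still has to be read, the count shortcut does not apply.
    if (!all_requested_columns_constant)
        return false;

    // An active filter must be evaluated against real rows, so the raw
    // metadata count would be wrong whenever either filter is present.
    return !filters.row_level_filter.has_value() && !filters.prewhere_info.has_value();
}
```

With this shape, a query whose columns are all constant but which carries a row policy falls back to the normal read path, so the filter is applied instead of being silently lost.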
…t NULLs

The Altinity-specific constant column optimization
(`allow_experimental_iceberg_read_optimization`) scans `requested_columns`
for nullable columns absent from the Iceberg file metadata and replaces
them with constant NULLs. However, `requested_columns` can also contain
columns produced by `prewhere_info` or `row_level_filter` expressions
(e.g. `equals(boolean_col, false)`). These computed columns are not in
the file metadata, and their result type is often `Nullable(UInt8)`, so
the optimization incorrectly treats them as missing file columns and
replaces them with NULLs.

This corrupts the prewhere pipeline: the Parquet reader evaluates the
filter expression correctly, but the constant column optimization then
overwrites the result with NULLs. With `need_filter = false` (old planner,
PREWHERE + WHERE), all rows appear to fail the filter, producing empty
output. With `need_filter = true`, the filter column is NULL so all rows
are filtered out.

The fix skips columns that match the `prewhere_info` or `row_level_filter`
column names, since these are computed at read time and never stored in
the file.

(cherry picked from commit b7696a3)
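
A minimal sketch of the skip logic described above, assuming simplified stand-in types: `RequestedColumnSketch` and `columnsToReplaceWithNull` are hypothetical names for illustration, and columns are identified by plain strings rather than the real ClickHouse column descriptions.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified view of a requested column.
struct RequestedColumnSketch
{
    std::string name;
    bool is_nullable;
};

// Returns the names of requested columns that may be replaced with constant NULLs:
// nullable columns absent from the file metadata, excluding columns that are
// computed at read time by the prewhere or row-level filter expressions.
inline std::vector<std::string> columnsToReplaceWithNull(
    const std::vector<RequestedColumnSketch> & requested_columns,
    const std::set<std::string> & columns_in_file_metadata,
    const std::optional<std::string> & prewhere_column_name,
    const std::optional<std::string> & row_level_filter_column_name)
{
    std::vector<std::string> result;
    for (const auto & column : requested_columns)
    {
        // Filter result columns are never stored in the file; leave them alone.
        if (prewhere_column_name && column.name == *prewhere_column_name)
            continue;
        if (row_level_filter_column_name && column.name == *row_level_filter_column_name)
            continue;

        if (column.is_nullable && columns_in_file_metadata.count(column.name) == 0)
            result.push_back(column.name);
    }
    return result;
}
```

In this sketch a column named `equals(boolean_col, false)` with a nullable result type is left untouched when it is the prewhere column, while a genuinely missing nullable file column is still reported for replacement.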
@zvonand added the port-antalya (PRs to be ported to all new Antalya releases) label Apr 27, 2026
`DataFileMetaInfo::DataFileMetaInfo` (Iceberg constructor introduced in
3be7196) deserialized `value_bounds` using the table's current schema.
After schema evolution (e.g. `int` -> `long`) the bytes were still encoded
with the file's old type — a 4-byte int — but were read as 8 bytes for
`Int64`. `ColumnVector::insertData` ignores the length argument and always
reads `sizeof(T)` bytes via `unalignedLoad`, so the extra 4 bytes came from
adjacent memory and produced a garbage hyperrectangle.

The garbage range often satisfied `Range::isPoint`, which made the Iceberg
read optimization replace the column with a constant value taken from the
garbage bound, corrupting query results.

Pass the file's `resolved_schema_id` separately so types are looked up
against the schema the data file was written with, while column names
keep coming from the current table schema (so the resulting `columns_info`
map is keyed by names callers know about).

Reproducer: `test_storage_iceberg_schema_evolution/test_evolved_schema_simple.py::test_evolved_schema_simple`.
All 12 parametrizations failed at the assertion after `ALTER COLUMN a TYPE BIGINT`.
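
The width mismatch described above can be illustrated with a small standalone decoder. This is a sketch, not the code from this PR: `FileIntType` and `decodeIntBound` are hypothetical names, and it assumes the bound bytes are little-endian with 4 bytes for `int` and 8 bytes for `long`, as in Iceberg's single-value binary serialization. The key point is that the width comes from the type the file was written with, and the value is widened only after decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical tag for the integer type recorded in the data file's own schema.
enum class FileIntType { Int32, Int64 };

// Decodes a little-endian signed integer bound using the width of the file's type,
// then widens it to int64_t. Returns nullopt if the byte count does not match,
// instead of reading past the end of the buffer.
inline std::optional<int64_t> decodeIntBound(const unsigned char * data, size_t size, FileIntType file_type)
{
    const size_t expected_size = (file_type == FileIntType::Int32) ? 4 : 8;
    if (size != expected_size)
        return std::nullopt;

    // Assemble the bytes explicitly so the result does not depend on host endianness.
    uint64_t raw = 0;
    for (size_t i = 0; i < size; ++i)
        raw |= static_cast<uint64_t>(data[i]) << (8 * i);

    if (file_type == FileIntType::Int32)
    {
        const uint32_t narrow = static_cast<uint32_t>(raw);
        int32_t value = 0;
        std::memcpy(&value, &narrow, sizeof(value)); // reinterpret bits as signed
        return static_cast<int64_t>(value);          // sign-extending widen
    }

    int64_t value = 0;
    std::memcpy(&value, &raw, sizeof(value));
    return value;
}
```

Decoding a 4-byte bound as `FileIntType::Int64` is rejected here rather than silently consuming four bytes of adjacent memory, which is the failure mode the commit message describes.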

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Labels

  • ai-resolved: Port conflict auto-resolved by Claude
  • antalya-26.3
  • port-antalya: PRs to be ported to all new Antalya releases
  • releasy: Created/managed by RelEasy

3 participants