Antalya 26.3 port - improvements for cluster requests #1687
Open
zvonand wants to merge 6 commits into antalya-26.3 from
Conversation
…ous_hashing 26.1 Antalya port - improvements for cluster requests
ianton-ru requested changes on Apr 24, 2026
```cpp
std::optional<Int64> rows_count;
std::optional<Int64> bytes_size;
std::optional<Int64> nulls_count;
std::optional<DB::Range> hyperrectangle;
```
This field was removed between 26.1 and 26.3 (ClickHouse#98231); the code needs refactoring to use the new location of the min/max column values.
Removes the `hyperrectangle` field from `DB::Iceberg::ColumnInfo` that was re-added during the frontport. The field was removed upstream in PR ClickHouse#98231, which relocated raw min/max bounds to `ParsedManifestFileEntry::value_bounds`.

The `DataFileMetaInfo` Iceberg constructor now deserializes those bounds via the shared `deserializeFieldFromBinaryRepr` helper (moved from `ManifestFileIterator.cpp` to `IcebergFieldParseHelpers`).

Addresses @ianton-ru's review comment on #1687.
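A rough sketch of the relocation described above (type and field names are simplified illustrations, not ClickHouse's actual declarations): the raw bounds now travel on the manifest entry rather than on the per-column info.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified shapes -- not the real ClickHouse declarations.
// Raw min/max bytes live on the manifest entry, keyed by Iceberg field id.
struct ValueBounds
{
    std::string lower;  // binary-encoded lower bound
    std::string upper;  // binary-encoded upper bound
};

struct ParsedManifestFileEntry
{
    std::map<int32_t, ValueBounds> value_bounds;
};

// ColumnInfo keeps only the counters; the hyperrectangle field is gone,
// and ranges are built on demand by deserializing value_bounds.
struct ColumnInfo
{
    std::optional<int64_t> rows_count;
    std::optional<int64_t> bytes_size;
    std::optional<int64_t> nulls_count;
};
```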
…bled

The Iceberg read optimization (`allow_experimental_iceberg_read_optimization`) identifies constant columns from Iceberg metadata and removes them from the read request. When all requested columns become constant, it sets `need_only_count = true`, which tells the Parquet reader to skip all initialization, including `preparePrewhere`, and just return the raw row count from file metadata.

This completely bypasses `row_level_filter` (row policies) and `prewhere_info`, returning unfiltered row counts. The InterpreterSelectQuery relies on the storage to apply these filters when `supportsPrewhere` is true and does not add a fallback FilterStep to the query plan, so the filter is silently lost.

The fix prevents `need_only_count` from being set when an active `row_level_filter` or `prewhere_info` exists in the format filter info.

Fixes #1595

(cherry picked from commit f204850)
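A minimal sketch of the guard this fix adds (the `canUseOnlyCount` function and the boolean fields are illustrative names, not the actual ClickHouse API): skipping reader initialization is only safe when nothing needs to run per row.

```cpp
// Illustrative sketch only; struct and function names are made up.
struct FormatFilterInfo
{
    bool has_row_level_filter = false;  // active row policy filter
    bool has_prewhere_info = false;     // active PREWHERE expression
};

// need_only_count skips reader initialization entirely (including prewhere
// setup), so it may only be set when no per-row filtering exists.
bool canUseOnlyCount(bool all_requested_columns_constant, const FormatFilterInfo & filter)
{
    return all_requested_columns_constant
        && !filter.has_row_level_filter
        && !filter.has_prewhere_info;
}
```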
…t NULLs

The Altinity-specific constant column optimization (`allow_experimental_iceberg_read_optimization`) scans `requested_columns` for nullable columns absent from the Iceberg file metadata and replaces them with constant NULLs. However, `requested_columns` can also contain columns produced by `prewhere_info` or `row_level_filter` expressions (e.g. `equals(boolean_col, false)`). These computed columns are not in the file metadata, and their result type is often `Nullable(UInt8)`, so the optimization incorrectly treats them as missing file columns and replaces them with NULLs.

This corrupts the prewhere pipeline: the Parquet reader evaluates the filter expression correctly, but the constant column optimization then overwrites the result with NULLs. With `need_filter = false` (old planner, PREWHERE + WHERE), all rows appear to fail the filter, producing empty output. With `need_filter = true`, the filter column is NULL, so all rows are filtered out.

The fix skips columns that match the `prewhere_info` or `row_level_filter` column names, since these are computed at read time and never stored in the file.

(cherry picked from commit b7696a3)
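The skip logic can be sketched standalone (function and parameter names here are hypothetical, not the real implementation): a column is only a candidate for constant-NULL replacement if it is missing from the file *and* is not a filter result column computed at read time.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: select which requested columns may be replaced with
// constant NULLs. Columns produced by prewhere / row-level-filter
// expressions are never stored in the file, so they must be skipped even
// though they are absent from the file metadata.
std::vector<std::string> columnsToReplaceWithNulls(
    const std::vector<std::string> & requested_columns,
    const std::set<std::string> & file_columns,
    const std::set<std::string> & filter_result_columns)
{
    std::vector<std::string> result;
    for (const auto & name : requested_columns)
    {
        if (file_columns.count(name))
            continue;  // actually present in the data file
        if (filter_result_columns.count(name))
            continue;  // computed at read time, e.g. equals(boolean_col, false)
        result.push_back(name);
    }
    return result;
}
```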
`DataFileMetaInfo::DataFileMetaInfo` (the Iceberg constructor introduced in 3be7196) deserialized `value_bounds` using the table's current schema. After schema evolution (e.g. `int` -> `long`) the bytes were still encoded with the file's old type, a 4-byte int, but were read as 8 bytes for `Int64`. `ColumnVector::insertData` ignores the length argument and always reads `sizeof(T)` bytes via `unalignedLoad`, so the extra 4 bytes came from adjacent memory and produced a garbage hyperrectangle.

The garbage range often satisfied `Range::isPoint`, which made the Iceberg read optimization replace the column with a constant value taken from the garbage bound, corrupting query results.

Pass the file's `resolved_schema_id` separately so types are looked up against the schema the data file was written with, while column names keep coming from the current table schema (so the resulting `columns_info` map is keyed by names callers know about).

Reproducer: `test_storage_iceberg_schema_evolution/test_evolved_schema_simple.py::test_evolved_schema_simple`: all 12 parametrizations failed at the assertion after `ALTER COLUMN a TYPE BIGINT`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
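The width mismatch can be demonstrated in isolation (a standalone sketch; `readAsInt64` merely mimics what an unconditional 8-byte `unalignedLoad` does and is not the ClickHouse helper itself):

```cpp
#include <cstdint>
#include <cstring>

// Reads 8 bytes unconditionally, like unalignedLoad<Int64>: if the value was
// encoded with the file's old 4-byte int type, the high 4 bytes are whatever
// happens to sit next to it in memory.
int64_t readAsInt64(const char * data)
{
    int64_t v;
    std::memcpy(&v, data, sizeof(v));
    return v;
}

// Correct handling after the fix's schema lookup: read exactly the 4 bytes
// the writer produced (the file's original type), then widen to Int64.
int64_t readAsInt32Widened(const char * data)
{
    int32_t v;
    std::memcpy(&v, data, sizeof(v));
    return static_cast<int64_t>(v);
}
```

Given a buffer holding the 4-byte encoding of `7` followed by unrelated bytes, `readAsInt64` mixes those unrelated bytes into the result, while widening from the file's original 4-byte type recovers `7`.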
Cherry-picked from #1414, also has changes from #1597.
Changelog category (leave one):
Frontports for Antalya 26.1
CI/CD Options
Exclude tests:
Regression jobs to run: