Add design for ALTER TABLE EXPORT + design review skill #1673

Open

hodgesrm wants to merge 5 commits into Altinity:antalya-26.1 from hodgesrm:01-alter-table-export-design

Conversation

Member

@hodgesrm hodgesrm commented Apr 22, 2026

Changelog category (leave one):

  • Documentation (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added a skill for Antalya design creation and review. Also added a first-cut design spec for ALTER TABLE EXPORT based on current documentation and tests.

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@hodgesrm hodgesrm requested a review from arthurpassos April 22, 2026 03:26
@hodgesrm hodgesrm force-pushed the 01-alter-table-export-design branch from 06d5b84 to 227f7b1 on April 22, 2026 03:42
Collaborator

@arthurpassos arthurpassos left a comment

I see some mistakes made by Claude here with regard to plain object storage vs Iceberg exports. I assume it is because it consumed the antalya-26.1 branch docs, which do not mention Iceberg at all. Those docs are in my 1618 PR.

Halfway through, still more to come.

-- Split large part across multiple Parquet files
ALTER TABLE big EXPORT PART '2025_0_32_3' TO TABLE big_dest
SETTINGS allow_experimental_export_merge_tree_part = 1,
export_merge_tree_part_max_bytes_per_file = 10000000,
Collaborator

export_merge_tree_part_max_bytes_per_file

We have a funny situation with this. When I added this setting, ClickHouse did not have the ability to split Parquet files. Plus, we did not support exporting to Iceberg.

The Iceberg engine now has its own settings. Which one should we respect?

Member Author

@hodgesrm hodgesrm Apr 23, 2026

Good point. I vote we go with the Iceberg engine for consistency and to simplify EXPORT commands. Questions:

  1. iceberg_insert_max_bytes_in_data_file should supersede export_merge_tree_part_max_bytes_per_file, correct?
  2. Export should also obey iceberg_insert_max_rows_in_data_file, as you showed in your example?
  3. What about settings for row groups? (e.g., output_format_parquet_row_group_size and output_format_parquet_row_group_size_bytes). I assume we observe all of these. It also seems like you would want to set them at the table level for sanity, since it's a lot to remember if you export more than once. See the sketch after this list.
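
A minimal sketch of how these settings might combine in a single command, assuming the Iceberg settings take precedence as proposed above (values are illustrative and the precedence is not confirmed behavior):

ALTER TABLE big EXPORT PART '2025_0_32_3' TO TABLE big_dest
SETTINGS allow_experimental_export_merge_tree_part = 1,
    -- assumed to supersede export_merge_tree_part_max_bytes_per_file (question 1)
    iceberg_insert_max_bytes_in_data_file = 10000000,
    iceberg_insert_max_rows_in_data_file = 1000000,
    -- Parquet row-group sizing (question 3)
    output_format_parquet_row_group_size = 100000,
    output_format_parquet_row_group_size_bytes = 16000000;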

Collaborator

TLDR:

You are correct on your expectations, but that's not quite how it works nowadays.

Full

You touched on an important topic. There is a subtle difference in how settings are handled between PART and PARTITION export.

For PART export, the settings used at the moment of the query are all preserved and restored in the background export task.

For PARTITION export, it is slightly harder to preserve the settings because we need to serialize them in ZooKeeper. I have done that only for a few very important settings, including export_merge_tree_part_max_bytes_per_file.

Some settings are not preserved simply because I did not have the time, and I was trying to find a better way to handle this case, which I have not found yet.

Collaborator

@arthurpassos arthurpassos Apr 23, 2026

Just pushed a change to 1618 that preserves Iceberg max rows and max bytes

Member Author

What do you think about renaming export_merge_tree_max_bytes_per_file to output_format_parquet_file_size_bytes?

Collaborator

Hmm, I don't like this idea very much. It implies the setting would have to be implemented for regular inserts as well, which is extra work and introduces conflicts with upstream.

Additionally, by putting parquet in the setting name you are scoping it to Parquet files. The code that splits the files is agnostic to the format. It would also conflict with the Iceberg max size setting, etc.

Member Author

OK. export_merge_tree_max_bytes_per_file overrides more specific values (in the design).

@hodgesrm hodgesrm force-pushed the 01-alter-table-export-design branch from 227f7b1 to bcb2b8c on April 23, 2026 23:56
@hodgesrm hodgesrm changed the title from "Add directory for antalya docs, skills, and designs" to "Add design for ALTER TABLE EXPORT + design review skill" on Apr 24, 2026
Collaborator

@arthurpassos arthurpassos left a comment

I'll stop my review here (https://github.com/Altinity/ClickHouse/pull/1673/changes#diff-59e17f10a30e7277ca6b00d367036de94c72a8e1ddcdb063ad9255e3aa707a81R516) as we'll discuss these things in a meeting shortly. It is a long doc lol.

@hodgesrm hodgesrm force-pushed the 01-alter-table-export-design branch from ebbd2e2 to 915df13 on April 26, 2026 04:25
Collaborator

@arthurpassos arthurpassos left a comment

LGTM - I did not review the testing section, will leave that for QA

- EXPORT PARTITION for MergeTree tables. Must work without Keeper installation.
- Export history. Provide a system table to track the history of part exports.
- Flexible casting that addresses issues like the following.
- Handling potentially lossy casts like INSERT SELECT: int64 -> int32.
Collaborator

By handling, do you mean throw, accept, or a setting that allows the user to decide?
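
For context, a minimal illustration of the narrowing in question, using plain INSERT SELECT semantics rather than EXPORT (table names are hypothetical; as far as I know, the default CAST wraps on integer narrowing rather than throwing):

CREATE TABLE src (v Int64) ENGINE = MergeTree ORDER BY v;
CREATE TABLE dst (v Int32) ENGINE = MergeTree ORDER BY v;
INSERT INTO src VALUES (5000000000);  -- out of Int32 range
INSERT INTO dst SELECT v FROM src;    -- implicit cast wraps the value instead of throwing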

swap in `DatabaseIceberg` or an `iceberg(...)` catalog-backed table
function to route through REST / Glue / Unity.

**Open Issue** We need to specify how to commit via a catalog.
Collaborator

Wdym by open issue? Exporting to Iceberg tables backed by a catalog is already supported.
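
A sketch of that catalog-backed path, following the `DatabaseIceberg` route in the quoted passage (catalog URL, warehouse, and table names are placeholders; the engine arguments and the PARTITION setting name are assumptions, not confirmed syntax):

-- Hypothetical: attach a REST-catalog-backed Iceberg database, then export into it.
CREATE DATABASE ice ENGINE = DatabaseIceberg('http://rest-catalog:8181/v1')
SETTINGS warehouse = 's3://bucket/warehouse';

ALTER TABLE big EXPORT PARTITION '2025' TO TABLE ice.big_dest
SETTINGS allow_experimental_export_merge_tree_partition = 1;  -- setting name assumed by analogy with the PART variant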


The following notes expand on expected behavior of commands.

1. When writing to object storage using `partition_strategy = 'wildcard'`, either wildcard
Collaborator

The writing here is weird:

"when writing with partition_strategy = 'wildcard', either wildcard and hive can be used"

I think you meant "when writing to plain object storage, either wildcard or hive can be used" - or something of the sort.
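
For reference, a sketch of a plain-object-storage destination using the wildcard strategy (bucket, path, and columns are placeholders; the `{_partition_id}` substitution in the path is what the wildcard strategy expands):

CREATE TABLE big_dest (id UInt64, p String)
ENGINE = S3('https://bucket.s3.amazonaws.com/export/{_partition_id}/data.parquet', 'Parquet')
PARTITION BY p
SETTINGS partition_strategy = 'wildcard';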

- For best results `EXPORT PARTITION` requires a ZooKeeper / `clickhouse-keeper` ensemble with
the `multi_read` feature flag. This reduces API calls and ensures transactional consistency
when reading multiple fields.
governed by the destination table's Iceberg partition spec instead.
Collaborator

Minor formatting issue.

- **Plain object storage.** One Parquet data file per part (or per chunk when split by
size/rows) plus one commit file per transaction — `<dest>/<partition>/<part>_<checksum>.<N>.parquet`
and `<dest>/commit_<...>`. Readers that want atomicity filter by commit. This is the default path,
but it can be customized to handle sharding, which is not covered by the default case. See the
Collaborator

"See the ..."?

export artifacts; the Iceberg sidecars (`*_clickhouse_export_part_sidecar.avro`)
are safe to delete once the commit has landed.

9. **Settings in `PART` vs `PARTIION` export.** There is a subtle difference in
Collaborator

Typo


Both commands accept two destination families:

1. **Plain object storage** (`S3`, `AzureBlobStorage`, equivalents).
Collaborator

Is AzureBlobStorage something we want to explicitly test? Most of the regression tests are S3-centric.
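
If Azure coverage is wanted, a sketch of a destination table for it (connection string, container, and path are placeholders; whether EXPORT exercises the same path partitioning as S3 here is an assumption):

CREATE TABLE big_dest_az (id UInt64, p String)
ENGINE = AzureBlobStorage('DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...',
    'exports', 'big/{_partition_id}/data.parquet', 'Parquet')
PARTITION BY p;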

This design only covers user-visible behavior. It does not cover internal implementation
details. The implementation section is omitted.

## 4. Test plan
Collaborator

@DimensionWieldr DimensionWieldr Apr 27, 2026

Is this section targeted at tests written by devs, or a guide for regression test development, or both?

I've actually been wondering for a while what the difference is between the regression repo and the dev-written tests included in the CH repo under tests/. I'm aware that regression has deeper coverage and also some overlap with the dev tests, but beyond that, why do we have both, and what is the purpose of each?

Collaborator

@arthurpassos arthurpassos Apr 27, 2026

why do we have both, what is the purpose of each

I summon @vzakaznikov in attack position
