feat(clickhouse): support LowCardinality, FixedString, CODEC, and SAMPLE BY #8
Greptile SummaryThis PR extends the ClickHouse schema builder with four production-oriented DDL features —
Confidence Score: 5/5
All four new schema features are correctly scoped to ClickHouse-specific builder types, validated at configure time, and compiled in the right DDL order; safe to merge.
The new LowCardinality, FixedString, CODEC, and SAMPLE BY features are self-contained in the ClickHouse builder layer. Wrapping orders, DDL clause ordering, and validation guards are all correct. Methods are only reachable through `Column\ClickHouse` / `Table\ClickHouse`, so cross-dialect misuse is prevented at the type level rather than by runtime branches. The test suite covers both happy-path DDL output and validation-error cases for each feature.
No files require special attention.
Reviews (4): Last reviewed commit: "refactor(clickhouse): drop empty Feature..."
…ia Feature\OLAP
Adds four OLAP-shaped column and table modifiers to the ClickHouse dialect, exposed through a new `Feature\OLAP` marker interface so the methods are reachable only from the dialect's typed Column/Table subclasses, not from `MySQL`, `PostgreSQL`, `SQLite`, or `MongoDB` builders.
- `Column::lowCardinality()` wraps the column type in `LowCardinality(...)`. `Nullable` is applied outside to keep ClickHouse's required wrapping order.
- `Table::fixedString($name, $length)` (with a Column-chain forwarder) adds a `FixedString(N)` column for fixed-length values like ISO codes and hash digests.
- `Column::codec($spec)` accumulates one or more `CODEC(...)` entries on the column. Multiple calls produce `CODEC(c1, c2, ...)`.
- `Table::sampleBy($expression)` (with a Column-chain forwarder) registers a `SAMPLE BY` clause emitted between `ORDER BY` and `TTL` / `SETTINGS`. Rejected on engines that don't take an `ORDER BY` clause.
State for `isLowCardinality`, `codecs`, and `sampleBy` lives on `Column\ClickHouse` / `Table\ClickHouse`, so non-OLAP dialects don't expose the methods at all and don't carry the state. The `FixedString` `ColumnType` case is only produced via `Table\ClickHouse::fixedString()`; other dialects' `compileColumnType()` declare a defensive `UnsupportedException` branch to satisfy match exhaustiveness even though the case is unreachable from their builders.
…AMPLE BY
Adds four sections to the ClickHouse Schema chapter covering the new `Feature\OLAP` modifiers. The narrative makes clear that the methods are dialect-scoped at the type level: calling them on `MySQL`, `PostgreSQL`, `SQLite`, or `MongoDB` builders is a compile-time error, not a runtime throw.
Also extends the ClickHouse "Supports the ... interfaces" line to list `Views`, `Databases`, and `OLAP` alongside the existing entries.
a4c51b3 to
825c507
Thanks for the Feature-interface direction — refactored to use
Tests + lint + PHPStan green. Ready for re-review.
…rom global ColumnType
Removes `ColumnType::FixedString` from the cross-dialect enum. FixedString state now lives on `Column\ClickHouse` (via `asFixedString()` / `isFixedString()` / `$fixedStringLength`), and `Schema\ClickHouse::compileColumnType()` reads that state to emit `FixedString(N)` DDL. `Table\ClickHouse::fixedString()` now registers a `ColumnType::String` column and tags it with the FixedString state, so the global enum carries no ClickHouse-only cases and the other dialects (`MySQL`, `PostgreSQL`, `SQLite`, `MongoDB`) no longer need `UnsupportedException` match branches; their `compileColumnType()` methods are byte-identical to `main`.
`Feature\OLAP` remains a marker interface matching the dialect-shape pattern (OLAP modifiers live on the column/table builder, not on `Schema`, so they cannot be expressed as a Schema-level method contract); docblock updated to explain why and to confirm the non-OLAP dialects are unchanged by construction.
Compiled DDL bytes for ClickHouse are unchanged; all 5175 tests pass; lint and PHPStan max are clean.
Sorry, the earlier round wasn't a full fix —
Other three features ( |
The interface declared no methods, was inspected nowhere, and pulled no weight at runtime or in the type system. Every sibling in `Feature/*` declares Statement-returning method signatures, but OLAP modifiers are intrinsic to the column/table builder shape and can't be expressed at the Schema level. Dialect-scoping is fully preserved by `Column\ClickHouse` / `Table\ClickHouse` / `Forwarder\ClickHouse` carrying the modifier methods natively — calling them on a non-ClickHouse builder is a clean type-system error, not a runtime exception.
Dropped the empty Dialect-scoping is fully preserved by |
Summary
Adds support for four ClickHouse schema features that are common in
production OLAP workloads but currently can't be expressed via the
schema builder, forcing users to drop down to raw DDL — exactly what
a typed schema builder is meant to prevent.
Each addition lives in `src/Query/Schema/` alongside the existing ClickHouse modifiers (`ttl`, `engine`, `orderBy`, `settings`, skip-index algorithms). Other dialects throw `UnsupportedException` at compile time so misuse is caught early.
What's new
`LowCardinality(T)` column modifier
`LowCardinality` is a standard ClickHouse storage modifier for string columns with a bounded number of distinct values: status enums, type discriminators, country/category codes. Dictionary encoding cuts storage and accelerates reads, and production OLAP schemas without it are an anti-pattern.
`Nullable` is applied outside `LowCardinality` to match ClickHouse's required wrapping order.
`FixedString(N)` column type
Fixed-length strings are strictly more efficient than `String` when the byte length is known and constant: ISO codes, hash digests, fixed-width identifiers. New `Table::fixedString($name, $length)` plus a matching forwarder on `Column`. Length must be at least 1.
Column-level `CODEC(...)` clauses
Multiple `codec()` calls accumulate and emit `CODEC(c1, c2, ...)`. Each codec string is emitted verbatim, so arguments live inline (`'Delta(4)'`, `'ZSTD(3)'`) and the modifier stays a thin wrapper around the underlying DDL. Empty strings and semicolons are rejected at configure time. Column-level codecs are a core ClickHouse feature for tuning storage size and read throughput; the schema builder couldn't express them before this PR.
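For readers who want to see the two column-level rules in isolation, here is a minimal, self-contained sketch of FixedString length validation and codec accumulation. `SketchColumn` and its methods are hypothetical stand-ins written for this note, not the library's `Column\ClickHouse`; the exception type, the messages, and the unquoted column name are assumptions, and PHP 8.1+ is assumed for `readonly` properties.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a column builder; NOT the library's Column\ClickHouse.
final class SketchColumn
{
    /** @var list<string> Codec specs in the order codec() was called. */
    private array $codecs = [];

    public function __construct(
        private readonly string $name,
        private readonly string $type,
    ) {
    }

    // Mirrors the rule stated above: FixedString length must be at least 1.
    public static function fixedString(string $name, int $length): self
    {
        if ($length < 1) {
            throw new InvalidArgumentException('FixedString length must be at least 1.');
        }

        return new self($name, "FixedString({$length})");
    }

    // Each call appends one spec verbatim; empty strings and semicolons
    // are rejected when the column is configured, before any DDL is built.
    public function codec(string $spec): self
    {
        if ($spec === '' || str_contains($spec, ';')) {
            throw new InvalidArgumentException('Codec must be non-empty and must not contain a semicolon.');
        }
        $this->codecs[] = $spec;

        return $this;
    }

    // Emits "name type" followed by a single CODEC(...) list when codecs exist.
    public function compile(): string
    {
        $ddl = "{$this->name} {$this->type}";
        if ($this->codecs !== []) {
            $ddl .= ' CODEC(' . implode(', ', $this->codecs) . ')';
        }

        return $ddl;
    }
}

// Usage: two codec() calls collapse into one CODEC(...) clause.
echo (new SketchColumn('ts', 'DateTime'))->codec('Delta(4)')->codec('ZSTD(3)')->compile(), PHP_EOL;
echo SketchColumn::fixedString('country', 2)->compile(), PHP_EOL;
```

Validating in the setter rather than in `compile()` is one way to get the "rejected at configure time" behaviour: a bad spec fails at the call site that introduced it.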
`SAMPLE BY` table option
`SAMPLE BY` enables approximate-query support (`SELECT ... SAMPLE k`) and must be declared at table creation time. Emitted after `ORDER BY` and before `TTL` / `SETTINGS`. Rejected on engines that don't take an `ORDER BY` clause (`Memory`, `Log`, `TinyLog`, `StripeLog`).
Why these specifically
The schema builder can already model the standard MergeTree shape, but
production ClickHouse schemas almost always reach for one or more of
these modifiers. Without them, users have to fall back to raw DDL,
which defeats the purpose of a typed builder.
The patches follow the same dialect pattern as the existing `ttl`, `engine`, `orderBy`, `settings`, and skip-index features added in #6: state lives on `Column` / `Table`, ClickHouse compiles it, and base `Schema` / `PostgreSQL` / `SQLite` overrides throw `UnsupportedException` so misuse on the wrong dialect is caught at compile time.
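To make the "state lives on the table, the compiler emits it" pattern concrete for `SAMPLE BY`, here is a rough sketch of the clause ordering and the engine guard described in this PR. `SketchTable` is a hypothetical class written for this note, not `Table\ClickHouse`; the constructor shape, the exception type, and checking the engine inside `sampleBy()` are assumptions (PHP 8.1+ assumed).

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a table builder; NOT the library's Table\ClickHouse.
final class SketchTable
{
    // Engines named in the PR description as not taking an ORDER BY clause.
    private const ENGINES_WITHOUT_ORDER_BY = ['Memory', 'Log', 'TinyLog', 'StripeLog'];

    private ?string $sampleBy = null;

    public function __construct(
        private readonly string $engine,
        private readonly ?string $orderBy = null,
        private readonly ?string $ttl = null,
    ) {
    }

    // Stores the expression as table state after validating it.
    public function sampleBy(string $expression): self
    {
        if ($expression === '' || str_contains($expression, ';')) {
            throw new InvalidArgumentException('SAMPLE BY expression must be non-empty and must not contain a semicolon.');
        }
        if (in_array($this->engine, self::ENGINES_WITHOUT_ORDER_BY, true)) {
            throw new InvalidArgumentException("SAMPLE BY is not supported on the {$this->engine} engine.");
        }
        $this->sampleBy = $expression;

        return $this;
    }

    // Emits the clauses in a fixed order: ENGINE, ORDER BY, SAMPLE BY, TTL.
    public function compileOptions(): string
    {
        $parts = ["ENGINE = {$this->engine}"];
        if ($this->orderBy !== null) {
            $parts[] = "ORDER BY {$this->orderBy}";
        }
        if ($this->sampleBy !== null) {
            $parts[] = "SAMPLE BY {$this->sampleBy}";
        }
        if ($this->ttl !== null) {
            $parts[] = "TTL {$this->ttl}";
        }

        return implode(' ', $parts);
    }
}

// Usage: the sampling expression also appears in ORDER BY, as ClickHouse requires.
$table = new SketchTable('MergeTree', '(event_date, intHash32(user_id))', 'event_date + INTERVAL 90 DAY');
echo $table->sampleBy('intHash32(user_id)')->compileOptions(), PHP_EOL;
```

Because the order is fixed inside `compileOptions()`, callers can set `sampleBy()` at any point in the chain without affecting where the clause lands in the DDL.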
Out of scope (planned follow-ups)
- `uniqExact`, `uniq`, `uniqCombined`, `uniqHLL12` on `Builder`: would let users express ClickHouse-native exact and approximate distinct-count aggregates without dropping to raw expressions.
- `toStartOfHour`, `toStartOfDay`, `toStartOfWeek`, `toStartOfMonth`, `toStartOfMinute` on `Builder`: for time-series rollups in `SELECT` and `GROUP BY`.
These are query-builder features rather than schema features, so a separate PR keeps this one focused.
Tests
- `tests/Query/Schema/ClickHouseTest.php` asserting exact DDL output for each feature, plus validation-error coverage (zero length, empty/semicolon codec, empty/semicolon SAMPLE BY, SAMPLE BY on a non-`ORDER BY` engine).
- `tests/Query/Schema/FluentBuilderTest.php` covering `LowCardinality`, `FixedString`, column CODEC, and `SAMPLE BY` on MySQL / PostgreSQL / SQLite.
Test plan
- `composer test` (5197 tests pass)
- `composer lint` (Pint passes)
- `composer check` (PHPStan max passes)