feat(clickhouse): add UUID, Decimal, Array/Tuple, UInt8/Int8, raw ORDER BY, rawColumn passthrough #10

Open
lohanidamodar wants to merge 2 commits into main from feat/clickhouse-schema-extras-2

@lohanidamodar
Contributor

Summary

Follow-up to #8 — adds the remaining ClickHouse schema features commonly needed in production OLAP workloads, plus a small compiler fix. Base-level features (uuid(), decimal(), tinyInteger(), smallInteger(), defaultRaw()) also map cleanly across MySQL, PostgreSQL, SQLite, and MongoDB.

What's new

UInt8 / Int8 via tinyInteger() and UInt16 / Int16 via smallInteger()

Small integer columns are a natural fit for bounded enumerations, percentage values, and other fields whose value range fits well below 32 bits. Storing them as UInt8 saves 75% of the disk and memory footprint compared to the default UInt32 produced by integer()->unsigned(). ClickHouse emits UInt8/Int8 and UInt16/Int16; MySQL maps to TINYINT/SMALLINT; PostgreSQL to SMALLINT (no TINYINT); SQLite to INTEGER.

$schema->table('events')
    ->bigInteger('id')->primary()
    ->tinyInteger('scroll_depth')->unsigned()
    ->smallInteger('year_offset')
    ->create();
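
The size claim above can be checked with simple width arithmetic. The following is a minimal sketch, not part of the library; the `INT_WIDTH_BYTES` table and `savingsPercent()` helper are hypothetical names introduced for illustration, and the figures describe uncompressed per-value width only (on-disk savings after compression will differ).

```php
<?php
declare(strict_types=1);

// Uncompressed bytes per value for ClickHouse fixed-width integer types.
const INT_WIDTH_BYTES = [
    'UInt8'  => 1, 'Int8'  => 1,
    'UInt16' => 2, 'Int16' => 2,
    'UInt32' => 4, 'Int32' => 4,
];

/** Percentage saved per value when storing $narrow instead of $wide. */
function savingsPercent(string $narrow, string $wide): float
{
    $n = INT_WIDTH_BYTES[$narrow];
    $w = INT_WIDTH_BYTES[$wide];

    return 100.0 * ($w - $n) / $w;
}

echo savingsPercent('UInt8', 'UInt32'), "\n";  // 75
echo savingsPercent('UInt16', 'UInt32'), "\n"; // 50
```

Note that the 75% figure applies to the 8-bit types; the 16-bit types produced by smallInteger() halve the width instead.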

Array(T) and Tuple(...) column types

Array(T) is the canonical ClickHouse type for multi-valued attributes — tags, labels, key/value pairs flattened into parallel arrays — and is the standard way to model nested records in the MergeTree family. Tuple(...) covers fixed-arity composites like geo points and key/value pairs.

use Utopia\Query\Schema\ColumnType;

$schema->table('events')
    ->bigInteger('id')->primary()
    ->array('meta.key', ColumnType::String)
    ->array('meta.value', ColumnType::String)
    ->array('user_ids', ColumnType::BigInteger)->unsigned()
    ->tuple('coords', [ColumnType::Float, ColumnType::Float])
    ->create();

Element types run back through the standard column-type compiler so the parent column's unsigned() and precision flags carry through to the inner type. Nullable(...) wraps the whole Array/Tuple; LowCardinality(...) is rejected on these columns because ClickHouse only permits it on scalar types. ClickHouse-only — calling ->array() or ->tuple() on a different dialect's builder fails at the type level.
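
To make the flag propagation concrete, here is a minimal standalone sketch of how a parent column's unsigned flag could reach the inner type and how LowCardinality could be rejected. `ElementType`, `compileElement()`, and `compileArray()` are hypothetical stand-ins, not the library's `ColumnType` enum or compiler; only the `String` and `UInt64`/`Int64` mappings are modelled.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's ColumnType enum.
enum ElementType
{
    case String;
    case BigInteger;
}

/** Compile an inner element type, honouring the parent's unsigned flag. */
function compileElement(ElementType $type, bool $unsigned): string
{
    return match ($type) {
        ElementType::String     => 'String',
        ElementType::BigInteger => $unsigned ? 'UInt64' : 'Int64',
    };
}

/** Wrap the compiled element in Array(...), rejecting LowCardinality. */
function compileArray(ElementType $element, bool $unsigned = false, bool $lowCardinality = false): string
{
    if ($lowCardinality) {
        throw new LogicException('LowCardinality is not supported on Array columns.');
    }

    return 'Array(' . compileElement($element, $unsigned) . ')';
}

echo compileArray(ElementType::String), "\n";                     // Array(String)
echo compileArray(ElementType::BigInteger, unsigned: true), "\n"; // Array(UInt64)
```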

decimal(precision, scale)

Fixed-point numeric column type for monetary or precision-sensitive values where binary-floating-point error is unacceptable. ClickHouse emits Decimal(P, S); MySQL/PostgreSQL emit DECIMAL(P, S); SQLite emits NUMERIC(P, S); MongoDB maps to the decimal BSON type. Combines with nullable() exactly as scalar columns do.

$schema->table('orders')
    ->bigInteger('id')->primary()
    ->decimal('amount', precision: 18, scale: 3)
    ->decimal('rate', precision: 5, scale: 4)->nullable()
    ->create();
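
The precision and scale guards can be sketched in isolation. This is an illustrative helper under stated assumptions, not the library's implementation: `decimalSql()` is a hypothetical name, and it applies the three checks the PR describes (precision at least 1, scale non-negative, scale no greater than precision).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API) applying the guards described
 * for decimal(): precision >= 1 and 0 <= scale <= precision.
 */
function decimalSql(int $precision, int $scale): string
{
    if ($precision < 1) {
        throw new InvalidArgumentException('Precision must be at least 1.');
    }
    if ($scale < 0 || $scale > $precision) {
        throw new InvalidArgumentException('Scale must be between 0 and the precision.');
    }

    return sprintf('DECIMAL(%d, %d)', $precision, $scale);
}

echo decimalSql(18, 3), "\n"; // DECIMAL(18, 3)
echo decimalSql(5, 4), "\n";  // DECIMAL(5, 4)
```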

UUID column type with defaultRaw()

UUIDs are first-class fixed-width identifier types in ClickHouse and PostgreSQL and a 36-character string elsewhere; production schemas commonly use them as primary identifiers with server-generated defaults. Column::defaultRaw(string) emits the expression verbatim after DEFAULT — distinct from default(), which quotes string literals — so callers can attach generateUUIDv4(), gen_random_uuid(), UUID(), now(), CURRENT_TIMESTAMP, and similar dialect-specific server-generated defaults.

$schema->table('events')
    ->uuid('event_id')->defaultRaw('generateUUIDv4()')->primary()
    ->datetime('ts', 3)
    ->create();

uuid() compiles to UUID on ClickHouse and PostgreSQL, CHAR(36) on MySQL, TEXT on SQLite, and the string BSON type on MongoDB. defaultRaw() is on the base Column, so it works on every dialect; it takes precedence over default() when both are set, and rejects empty strings and semicolons.
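
The precedence and validation rules can be illustrated with a small standalone model. `ColumnSketch` below is a hypothetical class written for this example, not the library's `Column`; the single-quote escaping used for default() is an assumption about how string literals might be quoted.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for a column definition; not the library's Column. */
final class ColumnSketch
{
    private ?string $default = null;
    private ?string $defaultRaw = null;

    public function __construct(private string $name, private string $type)
    {
    }

    public function default(string $value): static
    {
        $this->default = $value;

        return $this;
    }

    public function defaultRaw(string $expression): static
    {
        // Reject empty expressions and anything that could end the statement.
        if ($expression === '' || str_contains($expression, ';')) {
            throw new InvalidArgumentException('Raw default must be non-empty and contain no semicolons.');
        }
        $this->defaultRaw = $expression;

        return $this;
    }

    public function compile(): string
    {
        $sql = "`{$this->name}` {$this->type}";
        if ($this->defaultRaw !== null) {
            // Raw expression wins and is emitted verbatim.
            $sql .= ' DEFAULT ' . $this->defaultRaw;
        } elseif ($this->default !== null) {
            // Assumed quoting style: single quotes, doubled to escape.
            $sql .= " DEFAULT '" . str_replace("'", "''", $this->default) . "'";
        }

        return $sql;
    }
}

echo (new ColumnSketch('event_id', 'UUID'))
    ->default('ignored')
    ->defaultRaw('generateUUIDv4()')
    ->compile(), "\n";
// `event_id` UUID DEFAULT generateUUIDv4()
```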

Raw expressions in ORDER BY

MergeTree ORDER BY clauses routinely include scalar function calls — toDate(ts), cityHash64(...), intHash32(user_id) — to control sparse-index cardinality. orderBy(array) restricts each entry to a plain identifier; orderByRaw(string) accepts the full parenthesised tuple verbatim, mirroring the existing partitionBy(string) convention.

$schema->table('events')
    ->string('tenant')
    ->bigInteger('id')
    ->datetime('ts')
    ->orderByRaw('(`tenant`, toDate(`ts`), `id`)')
    ->create();

Takes precedence over orderBy() when both are set; rejects empty strings and semicolons. ClickHouse-only.
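
A rough sketch of the precedence rule follows. `compileOrderBy()` is a hypothetical free function, not the library's compiler, and it validates at compile time for brevity, whereas the builder may well validate when orderByRaw() is called.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical ORDER BY compiler: a raw expression, when present, is emitted
 * verbatim and overrides the identifier list.
 *
 * @param list<string> $columns
 */
function compileOrderBy(array $columns, ?string $raw = null): string
{
    if ($raw !== null) {
        if ($raw === '' || str_contains($raw, ';')) {
            throw new InvalidArgumentException('Raw ORDER BY must be non-empty and contain no semicolons.');
        }

        return 'ORDER BY ' . $raw;
    }

    // Identifier-only path: quote each column name with backticks.
    $quoted = array_map(
        static fn (string $c): string => '`' . str_replace('`', '``', $c) . '`',
        $columns
    );

    return 'ORDER BY (' . implode(', ', $quoted) . ')';
}

echo compileOrderBy(['tenant', 'id']), "\n";
// ORDER BY (`tenant`, `id`)
echo compileOrderBy(['tenant', 'id'], '(`tenant`, toDate(`ts`), `id`)'), "\n";
// ORDER BY (`tenant`, toDate(`ts`), `id`)
```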

rawColumn() passthrough fix on ClickHouse

Table::rawColumn(string $definition) is the documented escape hatch for column types the typed builder does not yet model. The base Schema::compileCreate() already iterates $table->rawColumnDefs, but the Schema\ClickHouse::compileCreate() override loop did not — so raw fragments registered through the same fluent builder silently disappeared from the generated DDL on ClickHouse only. The fix mirrors the loop in the ClickHouse override (one for-loop).
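
The shape of the bug can be shown with a stripped-down model. `compileCreateSketch()` is a hypothetical function, not the library's `compileCreate()`; it omits the ENGINE, ORDER BY, and other clauses and only demonstrates that raw fragments must be appended alongside the typed column definitions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified CREATE TABLE compiler.
 *
 * @param list<string> $columnDefs    already-compiled typed column definitions
 * @param list<string> $rawColumnDefs fragments registered through rawColumn()
 */
function compileCreateSketch(string $table, array $columnDefs, array $rawColumnDefs): string
{
    $parts = [];
    foreach ($columnDefs as $def) {
        $parts[] = $def;
    }
    // The loop the ClickHouse override was missing: without it, raw
    // fragments never reach the generated DDL.
    foreach ($rawColumnDefs as $def) {
        $parts[] = $def;
    }

    return 'CREATE TABLE `' . $table . '` (' . implode(', ', $parts) . ')';
}

echo compileCreateSketch('events', ['`id` Int64'], ['`ip` IPv4']), "\n";
// CREATE TABLE `events` (`id` Int64, `ip` IPv4)
```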

Out of scope (planned follow-up)

  • Bulk insert formats on Builder\ClickHouse (FORMAT JSONEachRow, RowBinary, TabSeparated, Parquet) — broader surface that touches the builder rather than the schema compiler; deserves its own PR.

Tests

38 new assertions across:

  • ClickHouseTest: uuid() with and without defaultRaw(), nullable wrapping, defaultRaw() precedence and validation, tinyInteger()/smallInteger() (signed and unsigned), decimal() with nullable(), array(T) with String/UInt64/nullable wrapping, LowCardinality rejection on Array, tuple() with empty-list validation, orderByRaw() with mixed function calls, orderByRaw() precedence and validation, rawColumn() passthrough through compileCreate().
  • MySQLTest, PostgreSQLTest, SQLiteTest: tinyInteger/smallInteger/decimal/uuid cross-dialect mappings; defaultRaw() rendered correctly alongside NOT NULL/PRIMARY KEY; decimal() precision/scale validation.
  • MongoDBTest: decimal/tinyInteger/uuid BSON type mappings.

All gates green: composer test, composer lint, composer check (PHPStan level max).

`rawColumn()` is the documented escape hatch for emitting dialect-specific
column types the typed builder does not yet model. The base
`Schema::compileCreate()` already iterates `$table->rawColumnDefs`, but the
ClickHouse override loop did not — so raw fragments registered through the
same fluent builder silently disappeared from the generated DDL on
ClickHouse only. Mirror the loop in `Schema\ClickHouse::compileCreate()`.
…w(), plus ClickHouse Array/Tuple and raw ORDER BY

Adds the remaining production-OLAP-shaped schema features that callers
had to drop to `rawColumn()` for after the 0.3.x bump:

- `Table::uuid()` — UUID column type, native on ClickHouse (`UUID`) and
  PostgreSQL (`UUID`); `CHAR(36)` on MySQL; `TEXT` on SQLite; `string`
  BSON type on MongoDB. Server-generated UUIDs are common as primary
  identifiers and need a dialect-specific default expression rather
  than an application-supplied value.

- `Column::defaultRaw(string)` — raw default expression emitted
  verbatim after `DEFAULT`. Lets callers attach `generateUUIDv4()`,
  `gen_random_uuid()`, `UUID()`, `now()`, `CURRENT_TIMESTAMP`, etc.
  without the quoting `default()` applies to scalar values. Takes
  precedence over `default()` when both are set; rejects empty strings
  and semicolons.

- `Table::tinyInteger()` and `Table::smallInteger()` — small integer
  column types. On ClickHouse they map to `UInt8`/`Int8` and
  `UInt16`/`Int16` (75% and 50% smaller per value, respectively, than
  the default `UInt32` produced by `integer()->unsigned()`), to native
  `TINYINT`/`SMALLINT` on MySQL,
  to `SMALLINT` on PostgreSQL (which has no `TINYINT`), and to
  `INTEGER` on SQLite. Useful for bounded enumerations, percentage
  values, and other fields that fit well under 32 bits.

- `Table::decimal(name, precision, scale)` — fixed-point numeric
  column for monetary and precision-sensitive values where
  binary-floating-point error is unacceptable. ClickHouse emits
  `Decimal(P, S)`; MySQL/PostgreSQL emit `DECIMAL(P, S)`; SQLite
  emits `NUMERIC(P, S)`; MongoDB maps to the `decimal` BSON type.
  Rejects negative scale and scale greater than precision.

- `Table\ClickHouse::array(name, ColumnType $element)` and
  `Table\ClickHouse::tuple(name, list<ColumnType>)` — `Array(T)` and
  `Tuple(...)` nested column types. Core ClickHouse types for
  multi-valued attributes (tags, labels, parallel-array nested
  records) and fixed-arity composites (geo points, key/value pairs).
  Element types run back through the standard column-type compiler so
  `unsigned()` and `precision`/`scale` flags carry into the inner
  type. `Nullable(...)` wraps the whole `Array`/`Tuple`;
  `LowCardinality(...)` is rejected on these columns to match
  ClickHouse's documented constraints.

- `Table\ClickHouse::orderByRaw(string)` — raw `ORDER BY` expression
  emitted verbatim. MergeTree `ORDER BY` clauses routinely include
  scalar function calls (`toDate(ts)`, `cityHash64(...)`,
  `intHash32(user_id)`) to control sparse-index cardinality; the
  existing identifier-only `orderBy(array)` blocks this common shape.
  Mirrors the `partitionBy(string)` convention. Takes precedence over
  `orderBy()` when both are set; rejects empty strings and semicolons.

README updated under "Creating Tables" (new types and modifiers) and
"ClickHouse Schema" (per-feature subsections with generated DDL).

`Column::$scale` is added alongside the existing `$precision`/`$length`
constructor args, and dialect `Table::newColumn()` overrides forward
it through.
@github-actions

📊 Coverage

| Metric | PR | Baseline | Δ |
| --- | --- | --- | --- |
| Lines | 91.70% (7251/7907) | 91.85% | -0.15% |
| Methods | 84.41% (1083/1283) | 84.56% | -0.15% |
| Classes | 65.35% (132/202) | 65.84% | -0.50% |

Full per-file breakdown in the job summary.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR extends the schema builder with uuid(), decimal(), tinyInteger(), smallInteger(), defaultRaw(), Array(T)/Tuple(...) column types, and raw ORDER BY expressions, and fixes the missing rawColumn() loop in ClickHouse::compileCreate(). Cross-dialect mappings (MySQL, PostgreSQL, SQLite, MongoDB) are consistent and the new defaultRaw() / orderByRaw() paths apply appropriate validation.

  • Nullable(Array(T)) is emitted without error, but ClickHouse rejects this DDL at the server level — the same UnsupportedException guard used for LowCardinality on arrays should apply here.
  • compileNestedElementType uses a single parent column's unsigned, precision, and scale flags for every element inside a Tuple(...), making mixed-signedness or mixed-precision tuples (e.g. Tuple(Decimal(18,3), Decimal(5,4))) inexpressible without falling back to rawColumn().
  • ClickHouse caps Decimal precision at 76; the decimal() builder validates only the lower bound, so values above 76 pass compilation but fail at the database.

Confidence Score: 3/5

The Array nullable path silently generates DDL that ClickHouse will reject at runtime; the remaining new features are correct across all dialects.

The ->array(...)->nullable() combination produces Nullable(Array(T)), which ClickHouse's type system forbids — any production schema using this combination will fail when the DDL is executed. The rawColumn() fix, uuid(), decimal(), tinyInteger()/smallInteger(), defaultRaw(), and orderByRaw() all look correct, and the cross-dialect mappings are consistent. The Tuple element-type sharing limitation and the uncapped Decimal precision are design choices worth revisiting before the API stabilises.

src/Query/Schema/ClickHouse.php — the Array nullable branch and the nested element type compiler; tests/Query/Schema/ClickHouseTest.php — the testCreateTableArrayNullable assertion needs to be updated alongside the fix.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Query/Schema/ClickHouse.php | Adds Array/Tuple/UUID/Decimal/TinyInteger/SmallInteger type compilation and rawColumn fix; generates invalid Nullable(Array(T)) DDL that ClickHouse rejects, and Tuple element types all share the parent column's unsigned/precision flags. |
| src/Query/Schema/Column.php | Adds defaultRaw() method with empty-string and semicolon validation, scale parameter, and fluent forwarding methods for the new column types; logic is sound. |
| src/Query/Schema/Table.php | Adds tinyInteger(), smallInteger(), decimal(), and uuid() to the base table builder; decimal() validates precision/scale but lacks an upper-bound guard for ClickHouse's max precision of 76. |
| src/Query/Schema/Table/ClickHouse.php | Adds array(), tuple(), orderByRaw() methods with proper validation (empty string, semicolon); orderByRaw correctly takes precedence over orderBy(). |
| src/Query/Schema/Column/ClickHouse.php | Adds arrayElementType/tupleElementTypes properties and asArray()/asTuple() builder methods; asTuple correctly validates non-empty element list. |
| src/Query/Schema/ColumnType.php | Adds TinyInteger, SmallInteger, Decimal, Uuid, Array, and Tuple enum cases; straightforward, no issues. |
| src/Query/Schema/MySQL.php | Maps new types to correct MySQL equivalents (TINYINT, SMALLINT, DECIMAL(P,S), CHAR(36)); throws for Array/Tuple. |
| src/Query/Schema/PostgreSQL.php | Maps TinyInteger/SmallInteger to SMALLINT, Decimal to DECIMAL(P,S), Uuid to native UUID; adds defaultRaw support; correct. |
| src/Query/Schema/SQLite.php | Maps new types to SQLite equivalents (INTEGER, NUMERIC(P,S), TEXT); adds defaultRaw support; correct. |
| src/Query/Schema/MongoDB.php | Adds BSON type mappings for Uuid → string, Decimal → decimal, TinyInteger → int, Array/Tuple → array; correct. |
| tests/Query/Schema/ClickHouseTest.php | 38 new assertions covering the new features; testCreateTableArrayNullable asserts Nullable(Array(String)) output, which is invalid DDL on ClickHouse; this test should be updated to expect an UnsupportedException. |

Comments Outside Diff (2)

  1. src/Query/Schema/Table.php, line 739-757 (link)

    P2 Missing upper-bound validation for ClickHouse Decimal precision

    ClickHouse only accepts Decimal(P, S) where P is in [1, 76]; precision values above 76 are silently accepted by the library but cause a server-side error. The current guards (precision < 1, scale < 0, scale > precision) don't catch this. Since the other dialects (MySQL, PostgreSQL, SQLite) have much higher or no practical limits, a ClickHouse-specific check here would require coupling — acceptable alternatives include a general cap at 76 (the tightest dialect) or deferring the guard to ClickHouse::compileColumnType().

  2. src/Query/Schema/ClickHouse.php, line 271-298 (link)

    P2 compileNestedElementType shares the parent column's precision, scale, and isUnsigned across all Tuple elements

    All element types in a Tuple(...) column resolve their signedness and precision from the single parent Column object. A Tuple(Decimal(18,3), Decimal(5,4)) cannot be expressed (both Decimals will use the same $parent->precision/$parent->scale), and a mixed-signedness tuple like Tuple(Int8, UInt64) is equally impossible. Callers who hit this ceiling must fall back to rawColumn(), which is not mentioned in the new docs. The limitation is not surfaced at the API level, so the builder silently produces a different type than the caller intended.
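
One of the alternatives raised in the first comment, deferring the upper-bound check to the ClickHouse dialect, might look roughly like the sketch below. `clickHouseDecimalType()` and the `CLICKHOUSE_MAX_DECIMAL_PRECISION` constant are hypothetical names for illustration; the real guard would presumably live in `ClickHouse::compileColumnType()` and throw the library's own exception type.

```php
<?php
declare(strict_types=1);

// ClickHouse accepts Decimal(P, S) only for P in [1, 76].
const CLICKHOUSE_MAX_DECIMAL_PRECISION = 76;

/** Hypothetical dialect-level guard; other dialects keep their own limits. */
function clickHouseDecimalType(int $precision, int $scale): string
{
    if ($precision < 1 || $precision > CLICKHOUSE_MAX_DECIMAL_PRECISION) {
        throw new InvalidArgumentException(sprintf(
            'ClickHouse Decimal precision must be in [1, %d], got %d.',
            CLICKHOUSE_MAX_DECIMAL_PRECISION,
            $precision
        ));
    }
    if ($scale < 0 || $scale > $precision) {
        throw new InvalidArgumentException('Scale must be between 0 and the precision.');
    }

    return sprintf('Decimal(%d, %d)', $precision, $scale);
}

echo clickHouseDecimalType(76, 10), "\n"; // Decimal(76, 10)
```

Keeping the cap in the dialect compiler avoids coupling the base `Table::decimal()` to one backend's limit, at the cost of surfacing the error at compile time rather than when the column is declared.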

Reviews (1): Last reviewed commit: "feat: add UUID, Decimal, tinyInteger/sma..."

Comment on lines +49 to +62
if ($column instanceof Column\ClickHouse && $column->arrayElementType !== null) {
    if ($column->isLowCardinality) {
        throw new UnsupportedException('LowCardinality is not supported inside Array(...). Wrap the element type instead.');
    }

    $inner = $this->compileNestedElementType($column->arrayElementType, $column);
    $type = 'Array(' . $inner . ')';

    if ($column->isNullable) {
        $type = 'Nullable(' . $type . ')';
    }

    return $type;
}
Contributor

P1 ClickHouse explicitly forbids wrapping Array with Nullable — the DDL Nullable(Array(T)) is rejected at the server level. The pattern mirrors the existing LowCardinality guard two lines above; the fix is to throw UnsupportedException there instead of silently emitting invalid DDL. The test testCreateTableArrayNullable passes today only because it checks the generated string, not whether ClickHouse accepts it.

Suggested change
-if ($column instanceof Column\ClickHouse && $column->arrayElementType !== null) {
-    if ($column->isLowCardinality) {
-        throw new UnsupportedException('LowCardinality is not supported inside Array(...). Wrap the element type instead.');
-    }
-    $inner = $this->compileNestedElementType($column->arrayElementType, $column);
-    $type = 'Array(' . $inner . ')';
-    if ($column->isNullable) {
-        $type = 'Nullable(' . $type . ')';
-    }
-    return $type;
-}
+if ($column instanceof Column\ClickHouse && $column->arrayElementType !== null) {
+    if ($column->isLowCardinality) {
+        throw new UnsupportedException('LowCardinality is not supported inside Array(...). Wrap the element type instead.');
+    }
+    if ($column->isNullable) {
+        throw new UnsupportedException('Nullable(Array(...)) is not supported in ClickHouse. Use an empty array [] to represent a missing value instead.');
+    }
+    $inner = $this->compileNestedElementType($column->arrayElementType, $column);
+    $type = 'Array(' . $inner . ')';
+    return $type;
+}
