Skip to content

test: replace testcontainers with embedded-clickhouse across the suite#92

Open
BorisTyshkevich wants to merge 6 commits intofeature/interserver-authfrom
testing/embedded-clickhouse
Open

test: replace testcontainers with embedded-clickhouse across the suite#92
BorisTyshkevich wants to merge 6 commits intofeature/interserver-authfrom
testing/embedded-clickhouse

Conversation

@BorisTyshkevich
Copy link
Copy Markdown
Collaborator

Summary

Replaces every testcontainers-based ClickHouse fixture in this repo with franchb/embedded-clickhouse, which runs ClickHouse as a host subprocess and skips Docker entirely on the happy path.

The motivation is two-pronged:

  • Local dev (macOS): testcontainers + Docker Desktop / OrbStack + Ryuk on an isolator-sandboxed dev machine has been a recurring source of flakiness. The Ryuk reaper can't bind the docker socket through the proxy, the proxy intermittently rejected build/exec endpoints, large build contexts hit chunked-body limits, and the Antalya container takes 5–30s to boot per test.
  • CI (Linux): every test job re-paid the Docker container startup cost. With ~30 tests touching CH, that adds up.

The migration drops local pkg/server runtime from ~50s to 4.2s, and the GitHub Actions test job from a recent-baseline median of 2:44 to 1:34 (43% faster).

Numbers from this branch

CI wall-clock:

testcontainers (median of last 4 main runs) embedded-clickhouse (run 24966839093)
test job 2:44 1:34
build-and-push-platforms (amd64) ~1:14 1:10
build-and-push-platforms (arm64) ~1:25 1:14
create-multiplatform-manifest ~0:15 0:16

Local macOS:

package testcontainers embedded-clickhouse
pkg/server ~50s 4.2s
pkg/clickhouse unchanged baseline 7.5s
cmd/altinity-mcp unchanged baseline 7.9s

Architecture

internal/testutil/embeddedch (new package) is the single source of truth. Setup(t, ...Option) returns a *config.ClickHouseConfig pointing at a host subprocess; EnsureAntalyaBinary(t) extracts the production Antalya binary from the Docker image once and points BinaryPath() at it.

Two flavors:

  • FlavorStock (default): downloads upstream ClickHouse 26.1 from GitHub releases. embedded-clickhouse caches the binary at ~/.cache/embedded-clickhouse/.
  • FlavorAntalya: pulls altinity/clickhouse-server:26.1.6.20001.altinityantalya, extracts /usr/bin/clickhouse via docker create + docker cp into ~/.cache/altinity-mcp/antalya-bin/, and runs that. t.Skip()'s on non-Linux because Antalya only ships Linux binaries.

XML config drop-ins go through WithConfigDropIn(xml) — written to <DataPath>/config.d/N.xml before Start(). ClickHouse's standard auto-merge picks them up. This replaces what testcontainers' Files: []ContainerFile was doing, including:

  • the cluster-secret + interserver users config (pkg/clickhouse/cluster_secret_test.go)
  • the OAuth token_processors + <user_directories> config (pkg/server/oauth_e2e_test.go)
  • the TLS https_port.xml + cert references (cmd/altinity-mcp/main_test.go's connection_with_tls)

WithUsersXML(xml) writes a stub users.xml beside config.xml for tests whose drop-ins reference <user_directories><users_xml><path>users.xml</path></users_xml> — the Antalya container shipped one; embedded-clickhouse does not.

Tests touched

Direct migrations (testcontainers → embedded-clickhouse):

  • pkg/server/server_test.go — all ~35 setupClickHouseContainer(t) callers swapped to setupEmbeddedClickHouse(t). The helper itself moved to embedded_clickhouse_helpers_test.go as a thin wrapper around embeddedch.Setup plus the default.test seeding.
  • pkg/server/oauth_e2e_test.gosetupAntalyaClickHouseWithOIDC rewritten using embeddedch.Setup(WithFlavor(FlavorAntalya), WithConfigDropIn(...), WithUsersXML(...)). Mock OIDC provider is now 127.0.0.1-only (no host.docker.internal plumbing). New newAntalyaOIDCProvider helper serves the full OIDC discovery doc Antalya's token_processors parser requires.
  • pkg/server/server_test.go TestOAuthOpenAPIFullFlow/complete_oauth_openapi_flow — same Antalya migration.
  • pkg/clickhouse/client_test.gosetupClickHouseContainer (~70 lines) → 3-line wrapper around embeddedch.Setup(WithTCPProtocol()).
  • pkg/clickhouse/cluster_secret_test.go — the shell-entrypoint workaround (used to write XML inside the container, dating back to when PUT /containers/.../archive was blocked) replaced with two WithConfigDropIn calls. ~50 lines deleted.
  • cmd/altinity-mcp/main_test.gosetupClickHouseContainerMain shrunk to ~10 lines. The 6 inline testcontainers blocks in TestTestConnection (basic, TCP, TLS, readonly, max_execution_time, plus TestNewApplicationWithTestContainer) collapsed to embeddedch.Setup calls. The TLS test writes cert + key to t.TempDir() and references those host paths in a config.d drop-in (CH reads them on the host, not from inside a container).
  • startContainerWithTiming helper removed (unused).

New regression test:

  • pkg/server/embedded_xml_config_test.go — pins the config.d auto-merge contract end-to-end (custom user authenticates, profile setting applied, server-level setting visible in system.server_settings).

Cross-cutting fixes that landed during the migration

Three issues surfaced only on the GitHub Actions Linux runner during back-to-back runs of this branch; each commit message has the run ID for the failure that triggered it:

  • Cross-process binary download race. go test ./... runs each package as a separate go test binary; embedded-clickhouse v0.4.0 locks within a process via sync.Once but not across processes. The first cold-cache run produced embedded-clickhouse: write binary: unexpected EOF and rename temp file: no such file or directory. Fix: hold an OS-level flock on ~/.cache/embedded-clickhouse/.altinity-mcp-extract.lock around the first stock-CH Start() per process. Lock is uncontended once the binary file exists.
  • Antalya OIDC discovery doc. Antalya's token_processors parser rejected the abridged discovery doc returned by newTestOAuthProvider. Symptom: Cannot extract userinfo_endpoint or introspection_endpoint from OIDC configuration. Fix: new newAntalyaOIDCProvider returning the full doc (issuer, authorization_endpoint, token_endpoint, jwks_uri, userinfo_endpoint, introspection_endpoint, response_types_supported, subject_types_supported, id_token_signing_alg_values_supported). Bound via explicit net.Listen + http.Server.Serve rather than httptest.NewServer for exact Content-Length / chunked framing — Antalya's HTTP client appears sensitive to chunked-vs-fixed framing on small bodies.
  • Antalya <local_directory> permission denied. The original token_processor.xml hardcoded /var/lib/clickhouse/access/, which the Antalya container created automatically but the GitHub runner user can't write. Fix: substitute a t.TempDir() into the XML at setup time.

go.mod

testcontainers-go and docker/docker removed (along with their transitive closure in go.sum). franchb/embedded-clickhouse v0.4.0 added.

What stays on testcontainers

Nothing in this PR. If anyone adds a fixture later that needs a feature embeddedch doesn't yet expose, the helper has clear extension points (WithConfigDropIn, WithUsersXML, WithStartTimeout).

Test plan

  • go vet ./... — clean
  • go test ./... on macOS — green in ~25s, Antalya tests skip with a clear reason
  • CI run 24966839093 — green, test job 1:34, full pipeline including build-and-push and release ~3 min
  • Verified no testcontainers imports remain in any _test.go

🤖 Generated with Claude Code

BorisTyshkevich and others added 6 commits April 26, 2026 21:21
Spike to evaluate replacing testcontainers with franchb/embedded-clickhouse
for tests that don't depend on Antalya-specific server features.

The gating-mode OAuth path validates the bearer locally (HS256 over the
gating secret) and connects to ClickHouse with static credentials, so the
CH side only needs to accept SELECT — no token_processors, no
jwt_validators, no Antalya fork required.

This adds setupEmbeddedClickHouse + TestOAuthGatingViaOpenAPI_Embedded,
which mirrors TestOpenAPIHandlers/combined_auth_oauth_only_via_openapi
from server_test.go using stock ClickHouse 26.1 booted as a host
subprocess. End-to-end pass: 0.88s vs ~22-30s for the testcontainers
equivalent.

Stays scoped to gating-mode tests. The forward-mode OAuth e2e tests
(TestOAuthE2EWithMockOIDC, TestOAuthOpenAPIFullFlow) need Antalya's
token_processors and remain on testcontainers; Antalya is Linux-only so
embedded-clickhouse can't run it as a darwin/arm64 subprocess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…house

Background: testcontainers + macOS + the agent isolator stack require Ryuk
sidecar bind-mounts, blocked endpoints, and proxy-perf gymnastics that
have repeatedly broken local dev. ClickHouse fits the embedded-postgres
pattern — single self-contained binary, fast subprocess startup — so we
move pkg/server tests off Docker entirely and run CH as a host process
via franchb/embedded-clickhouse.

Helpers (pkg/server/embedded_clickhouse_helpers_test.go):
- setupEmbeddedClickHouse(t, opts...) replaces setupClickHouseContainer.
  Default boots stock CH 26.1, seeds default.test, no Docker. Variadic
  options: withFlavor(flavorAntalya), withConfigDropIn(xml),
  withoutDefaultTable.
- withConfigDropIn writes XML into <DataPath>/config.d/N.xml; ClickHouse's
  standard auto-merge picks it up before startup. Drop-ins replace what
  testcontainers' Files: ContainerFile path was doing — including custom
  users, profiles, server settings, and Antalya's token_processors.
  embedded-clickhouse generates <users> inline in config.xml (no separate
  users.xml), so users.d is ignored — drop-ins always go in config.d.
- ensureAntalyaBinary auto-skips on non-Linux (Antalya only ships Linux
  binaries) and on Linux extracts /usr/bin/clickhouse from
  altinity/clickhouse-server:26.1.6.20001.altinityantalya, caching it
  under $XDG_CACHE_HOME/altinity-mcp/antalya-bin/ for reuse.

Tests rewritten to use the new helpers:
- TestOAuthE2EWithMockOIDC and TestOAuthOpenAPIFullFlow's
  complete_oauth_openapi_flow subtest: now use setupEmbeddedAntalyaWithOIDC
  (Linux-only via ensureAntalyaBinary). Mock OIDC provider switches from
  the docker-reachable 0.0.0.0 helper to plain newTestOAuthProvider —
  CH-as-host-process talks to it on 127.0.0.1, no host.docker.internal
  dance.
- TestOAuthGatingViaOpenAPI_Embedded covers the gating-mode path
  (mirrors combined_auth_oauth_only_via_openapi) — no Antalya needed,
  runs on every platform.
- TestEmbeddedClickHouseXMLDropIn pins the config.d auto-merge contract
  so future regressions surface fast.
- All ~35 setupClickHouseContainer(t) callers in server_test.go switched
  to setupEmbeddedClickHouse(t).

Removed: setupClickHouseContainer, setupAntalyaClickHouseWithOIDC,
newTestOAuthProviderReachableFromDocker, getDockerHostIP — none have
callers left. Dropped testcontainers + docker container imports from
pkg/server tests; pkg/clickhouse and cmd/altinity-mcp keep their
testcontainers helpers (separate scope).

Effect on local dev (macOS):
- pkg/server full suite: ~50s → 4.4s
- Two Antalya subtests skip cleanly via t.Skip with a clear reason
- No Ryuk, no proxy chunked-body limits, no docker.sock binds

Effect on CI (Linux):
- ensureAntalyaBinary pulls + extracts the Antalya binary on first run,
  caches per-tag for the rest of the run
- Both Antalya tests run fully

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finishes the testcontainers-to-embedded-clickhouse migration started in
220e868 (which covered pkg/server). After this commit no test in the repo
spins up a Docker container for ClickHouse on the happy path; everything
runs as a host subprocess via embedded-clickhouse.

Highlights:

- internal/testutil/embeddedch: shared boot helper (Setup, EnsureAntalyaBinary)
  promoted out of pkg/server's _test.go so pkg/clickhouse and
  cmd/altinity-mcp tests can call it directly. Single source of truth for
  flavor (stock vs Antalya), config.d drop-ins, TCP-vs-HTTP protocol, and
  the Antalya extraction-from-Docker-image path.

- pkg/clickhouse:
  - setupClickHouseContainer (testcontainers, ~70 lines) replaced with a
    3-line setupEmbeddedClickHouse calling embeddedch.Setup with TCP.
  - cluster_secret_test.go's setupClusterSecretClickHouse — which previously
    used a shell entrypoint to write XML drop-ins inside the container as a
    workaround for the blocked PUT /containers/.../archive endpoint — now
    just passes the same XML through WithConfigDropIn. ~50 lines deleted.

- cmd/altinity-mcp:
  - setupClickHouseContainerMain dropped to ~10 lines.
  - All 6 inline testcontainers blocks in TestTestConnection
    (successful_connection, connection_with_tls, connection_with_tcp_protocol,
    connection_with_readonly_mode, connection_with_max_execution_time,
    plus TestNewApplicationWithTestContainer) collapsed to embeddedch.Setup
    calls. The TLS subtest writes cert + key to a TempDir and references
    those host paths in a config.d drop-in, since CH now reads them on the
    host rather than from inside a container.
  - startContainerWithTiming helper removed (unused).

- go.mod: testcontainers-go and docker/docker dependencies removed
  (transitively, alongside dozens of testcontainer-only deps in go.sum).

- .gitignore: ignore the locally-built jwe-token-generator binary.

Local test wall-clock on macOS:
- pkg/server  ~50s -> 4.2s (committed in 220e868; same here)
- pkg/clickhouse  ~?  -> 7.5s
- cmd/altinity-mcp  ~?  -> 7.9s
- Total module: ~25s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Antalya's token_processor.xml drop-in declares <user_directories> with
<users_xml><path>users.xml</path></users_xml>. The Antalya Docker image
ships /etc/clickhouse-server/users.xml; our embedded-clickhouse setup
puts everything in a temp dir without a users.xml, so on Linux CI the
server failed startup with CANNOT_LOAD_CONFIG.

Add embeddedch.WithUsersXML(content) to write the file beside
the generated config.xml, and use it from setupEmbeddedAntalyaWithOIDC
with a minimal users.xml that mirrors embedded-clickhouse's inline
<users>/<profiles>/<quotas> defaults. The OIDC users we actually care
about come from the token user_directory at runtime; users.xml just
needs to exist for the path to resolve.

Reproduced on Linux CI run 24966501158
(test job: 2:24, FAIL on TestOAuthE2EWithMockOIDC +
TestOAuthOpenAPIFullFlow/complete_oauth_openapi_flow with
"embedded-clickhouse: server did not become ready: context deadline
exceeded", caused by CANNOT_LOAD_CONFIG looking up users.xml).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…race

Two CI-only failures from run 24966599770 (run 24966501158 had #1 only):

1. Antalya rejected the mock OIDC provider with
   "Cannot extract userinfo_endpoint or introspection_endpoint from OIDC
   configuration". newTestOAuthProvider returns a stripped-down discovery
   doc (only issuer + jwks_uri + userinfo_endpoint) that satisfies forward-
   mode bearer-passthrough tests but not Antalya's token_processors parser,
   which requires authorization_endpoint, token_endpoint, introspection_
   endpoint, response_types_supported, subject_types_supported,
   id_token_signing_alg_values_supported. Add newAntalyaOIDCProvider with
   the full discovery doc; use it from TestOAuthE2EWithMockOIDC.

2. embedded-clickhouse v0.4.0's cache layer locks within a process via
   sync.Once, but `go test ./...` runs each package as a separate binary,
   and multiple processes raced to download + extract the same archive
   into ~/.cache/embedded-clickhouse. Symptoms:

     embedded-clickhouse: write binary: unexpected EOF
     embedded-clickhouse: rename temp file: ... no such file or directory

   Hold an OS-level flock around the first stock-CH extract. The lock is
   only contended on cold cache; once the binary file exists, subsequent
   processes skip the extract entirely.

   Antalya path keeps using docker pull + docker cp which is single-process
   in our setup (only pkg/server runs Antalya tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more Antalya CI failures from run 24966723959 (test 2:41, FAIL):

1. /var/lib/clickhouse/access/: Permission denied. The token_processors
   user_directories config hardcoded the Antalya container's data dir,
   which doesn't exist on the GitHub runner host. Substitute a t.TempDir()
   the runner user can create + write.

2. TestOAuthOpenAPIFullFlow/complete_oauth_openapi_flow was still calling
   newTestOAuthProvider (the abridged one). Switch to newAntalyaOIDCProvider
   which serves the full OIDC discovery document Antalya requires.

Also rewrote newAntalyaOIDCProvider to use net.Listen + http.Server.Serve
(matching the original docker-reachable provider's exact framing) instead
of httptest.NewServer, since Antalya's HTTP client appears sensitive to
chunked-vs-fixed Content-Length on small bodies — symptom was
"Cannot extract userinfo_endpoint or introspection_endpoint" even though
both fields were present in the body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant