Experimental
A terminal-first cloud workstation with a SvelteKit frontend and a Cloudflare Worker + Containers backend.
Cloudshell gives each user one shared workstation filesystem and lets them open isolated runtime sessions on top of it.
The core model is:
- workstation = one shared filesystem per user
- session = one isolated runtime container
- tab = one independent shell inside the active session
- SvelteKit app for auth, routing, and UI
- Better Auth backed by D1
- Cloudflare Worker backend for session orchestration
- Cloudflare Containers for isolated runtime sessions
- Shared per-user workstation files across sessions
- Per-tab shell state inside each session
- Left-side files drawer wired to the same filesystem the terminal sees
- Ports and tools workspace for forwarding, sharing, SSH keys, and backups
- npm-bundled terminal stack with no CDN dependency
```
src/            SvelteKit app
worker/         Cloudflare Worker + container runtime
shared/         Shared helpers used by app and worker
migrations/     Better Auth / D1 migrations
old/            Archived pre-rebuild app for reference
alchemy.run.ts  Dev + deploy orchestration
```
- Create a local env file.

  ```
  cp .env.example .env
  ```

- Fill in the required values: `ALCHEMY_PASSWORD`, `BETTER_AUTH_SECRET`

  Optional: `BETTER_AUTH_URL`, `BETTER_AUTH_TRUSTED_ORIGINS`, `TERMINAL_TICKET_SECRET`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `R2_ACCOUNT_ID`, `CLOUDFLARE_ACCOUNT_ID`
- Start the app.

  ```
  bun install
  bun run dev
  ```

Default local URLs:

- app: http://localhost:5173
- worker: http://localhost:1338
```
bun run dev
bun run build
bun run check
bun run test
bun run deploy
bun run destroy
```

Database helpers:

```
bun run db:generate
bun run db:migrate
bun run db:local
bun run db:remote
```

- Better Auth uses D1 for user and auth records.
- Runtime metadata stays in KV/R2, not D1.
- Terminal access uses app-origin websocket routing in production and a signed direct-worker ticket in local dev.
`@xterm/xterm` and `@xterm/addon-fit` are bundled from npm. There is no CDN-loaded terminal dependency.
The browser does not connect to the container directly.
- `src/routes/api/cloudshell/terminal-connection/+server.ts` issues a short-lived ticket for the active user, session, and tab.
- `src/routes/ws/terminal/+server.ts` proxies the websocket through `src/lib/server/worker.ts`.
- The app-host proxy validates the ticket and forwards the terminal identity headers to the worker hop: `X-User-Id`, `X-User-Email`, `X-Session-Id`, `X-Tab-Id`.
- `worker/effect/services.ts` forwards the same identity context into the container request.
- `worker/container/main.go` uses that user/session/tab identity to attach the correct terminal process.
If any hop drops that identity, the browser typically sees a websocket close like 1006 instead of a usable terminal.
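For concreteness, a ticket carrying this identity can be sketched as an HMAC-signed payload. This is a minimal sketch under assumed names, payload shape, and encoding; the real scheme lives in `shared/terminal-ticket.ts` and may differ.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustrative only: TicketPayload, mintTicket, and verifyTicket are
// assumptions for this sketch, not the shared/terminal-ticket.ts API.
interface TicketPayload {
  userId: string;
  sessionId: string;
  tabId: string;
  exp: number; // unix seconds
}

function sign(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body).digest("base64url");
}

function mintTicket(payload: TicketPayload, secret: string): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body, secret)}`;
}

function verifyTicket(ticket: string, secret: string): TicketPayload | null {
  const [body, sig] = ticket.split(".");
  if (!body || !sig) return null;
  const expected = Buffer.from(sign(body, secret));
  const given = Buffer.from(sig);
  // constant-time compare; a length mismatch means a malformed signature
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString()) as TicketPayload;
  if (payload.exp * 1000 < Date.now()) return null; // expired
  return payload;
}
```

Under this sketch, a tampered or expired ticket verifies to `null`, which on the real path would surface as a rejected upgrade rather than a usable terminal.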
Cloudshell is agnostic to what MCP server you wire up — it is the broker, not the authority. The CLI inside the container never holds an upstream OAuth token; the cloudshell app holds them server-side in a per-user Durable Object and attaches them to outbound calls.
```
browser (your device)          cloudshell.coey.dev              upstream MCP server
│                              │                                │
│ Better Auth session          │                                │
│─── POST /oauth/mcp/start ──▶│                                │
│ (click Connect in UI         │ runAuthStart in UserAgent DO:  │
│  or mcp login in shell)      │ PKCE, Dynamic Client           │
│                              │ Registration, state nonce      │
│                              │─── authorize URL ─────────────▶│
│◀── popup window ─────────────│                                │
│                                                               │
│ ─── user approves in popup, upstream redirects ──────────────▶│
│                                                               │
│◀── GET /api/mcp/oauth/callback?state=&code= ──────────────────│
│                              │ runAuthCallback in DO:         │
│                              │ exchange code, save tokens     │
│◀── popup posts result ───────│                                │
│                                                               │
│ container shell                                               │
│ $ mcp call <server> tools/list                                │
│ │                            │                                │
│ │── POST /api/mcp/bridge ─▶│                                  │
│ │   X-Cloudshell-Ticket      │ verifyBridgeTicket             │
│ │                            │ DO.getAccessTokenFor()         │
│ │                            │─── Authorization: Bearer ─────▶│
│ │                            │◀── tool result (JSON / SSE) ───│
│ │◀───────── result ────────│                                  │
```
Key properties, each tied to source:

- Token never enters the container. `bridgeMcpRequest()` in `worker/mcp-bridge.ts` reads the bearer from the DO by RPC, attaches it to the outbound fetch, and streams the response back. The container only sees the MCP JSON-RPC response body.
- Auth separation. The browser path uses Better Auth session cookies (`src/routes/api/mcp/oauth/*`). The container path uses a short-lived capability ticket scoped `['mcp-bridge']` (`shared/terminal-ticket.ts`, `verifyCapabilityTicket`). Terminal identity tickets (no scope) can't be substituted — every path has an exact-scope check.
- Bridge tickets auto-delivered. When the browser opens a terminal tab, `deliverBridgeTicket()` in `src/lib/components/cloudshell/terminal-pane.svelte` mints a ticket against `/api/cloudshell/ticket/mint-bridge` and sends it down the terminal WebSocket as a `{type: "bridge_ticket"}` control frame. `worker/container/main.go` writes it to `~/.cloudshell/bridge-ticket` on receipt; `worker/container/mcp-cli.mjs` reads it on invocation.
- Per-user isolation. Each user has one `CloudshellUserAgent` Durable Object (`worker/user-agent.ts`) addressed by Better Auth user ID. All OAuth state (client info, code verifiers, state nonces, tokens) lives in that DO's SQLite. Deleting the user wipes everything; two users cannot see each other's connections.
- Based on `cloudflare/agents`. The token store is `DurableObjectOAuthClientProvider` from the agents SDK, driven by `@modelcontextprotocol/sdk/client/auth.js`. Cloudshell wraps a thin per-user DO around it, plus the HTTP and UI plumbing above.
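The exact-scope rule in the "Auth separation" point above can be illustrated in a few lines. This is a sketch only; the real check is `verifyCapabilityTicket` in `shared/terminal-ticket.ts`, and the `Ticket` shape and `hasExactScope` name here are assumptions.

```typescript
// Illustrative sketch of the exact-scope rule: a ticket is accepted for a
// capability only if it explicitly carries that scope.
type Ticket = { scopes?: string[] };

function hasExactScope(ticket: Ticket, required: string): boolean {
  // A terminal identity ticket carries no scopes array at all, so it can
  // never satisfy a scope requirement: absence is not a wildcard.
  return Array.isArray(ticket.scopes) && ticket.scopes.includes(required);
}
```

The design point is that missing scope metadata fails closed: the unscoped terminal ticket is rejected rather than treated as all-powerful.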
Usage from the shell:

```
# one-time: open a terminal in the browser. Bridge ticket lands
# at ~/.cloudshell/bridge-ticket automatically.
$ mcp login https://mcp.apify.com
Open this URL in your browser to authorize https://mcp.apify.com:
https://console.apify.com/authorize/oauth?...
Waiting for authorization... (Ctrl-C to cancel)
✓ Connected to https://mcp.apify.com.

$ mcp list
https://mcp.apify.com    2026-04-19T...

$ mcp call https://mcp.apify.com tools/list
{ "jsonrpc": "2.0", ..., "result": { "tools": [...] } }
```

Usage from the UI: visit the Tools panel, enter an MCP server URL in "Connect MCP server", and approve in the popup. The connection appears in the list and is immediately usable from any terminal.
Design notes and honest trade-offs (see also `~/cloudflare/auth-research/experiments/28-cloudshell-mcp-broker/`):

- Tokens in DO SQLite are plaintext today, not end-to-end encrypted. `cloudflare/workers-oauth-provider` (the OAuth server library) encrypts `props` at rest; we could port that pattern to the client side if a threat model requires it.
- `parseServerIdFromState` in `worker/mcp-oauth.ts` parses the state nonce format `nonce.serverId`. This matches agents SDK 0.11's `DurableObjectOAuthClientProvider.state()` output; if that format changes we break silently. A more robust fix would ask the provider for the serverId directly.
- Container `sleepAfter` is currently 5 min; bridge tickets last 1 hour. Ticket lifetime dominates: losing a container doesn't invalidate the ticket. Tighten if the threat model needs it.
Short version: this was an alchemy bug, fixed upstream on 2026-01-09 in PR #1298. If you're on `alchemy >= 0.78`, skip to the next section. If you're pinned below that, either upgrade or use the manual-delete workaround below.

Root cause: Cloudflare's `POST /accounts/:id/containers/applications` endpoint used to return `errors[].code === 1000` when a Durable Object already had a container app bound to it. That code is now `1608`. Alchemy `0.77.5` and earlier looked for `1000`, so the adopt-on-conflict recovery branch in `createContainerApplication` never fired. Every subsequent deploy threw `Failed to create container application: ... DURABLE_OBJECT_ALREADY_HAS_APPLICATION` and required a manual `DELETE /containers/applications/<id>` before it could proceed. Alchemy `0.78+` matches on `1608` and cleanly calls `updateContainerApplication` on the existing record — deploys adopt the existing app in place, preserving its ID and DO bindings.

How to tell you're hitting it: it happens on every deploy, not just after drift. The error on the deploy step is `DURABLE_OBJECT_ALREADY_HAS_APPLICATION`, and the container app ID stays the same across manual deletes (the name is deterministic, derived from project + resource ID).
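The adopt-on-conflict decision described above reduces to matching the right error code. This is a sketch of the fixed behavior; the `ApiError` shape and function name are illustrative, not alchemy's actual internals.

```typescript
// Cloudflare API error entry (shape assumed for this sketch).
interface ApiError { code: number; message: string }

// Cloudflare moved this conflict from code 1000 to 1608; matching only the
// old code is exactly the bug alchemy 0.77.5 and earlier had.
const DO_ALREADY_HAS_APPLICATION = 1608;

// If the create call fails with this code, adopt (update) the existing
// container application instead of failing the deploy.
function shouldAdoptExistingApp(errors: ApiError[]): boolean {
  return errors.some((e) => e.code === DO_ALREADY_HAS_APPLICATION);
}
```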
If the terminal returns 1006 / "Terminal unavailable" indefinitely after all five
hops look healthy, check the account's container applications inventory before
spending more time in the code. This has bitten this project for a long time.
The symptom:

- Worker log shows `Error checking 8080: The operation was aborted` repeating
- Runtime reports `"Container crashed while checking for ports, did you start the container and setup the entrypoint correctly?"`
- `wrangler tail` shows `Container state running attempt=1`/`attempt=2` but `waitForPort` never resolves
- A minimal parity container deployed alongside (see `worker/parity-container/`) works fine on the exact same Worker → DO path
The cause: When the project has been redeployed across breaking changes — alchemy renames, DO class renames, package upgrades — the Cloudflare Containers control plane can end up with stale container applications mapped to the same Durable Object class as the live one. Symptoms on the inventory:
- Multiple `cloudshell-*` apps for the same logical container
- One or more apps reporting `instances > max_instances` (e.g. `9/5`)
- Names from a prior code shape (e.g. `cloudshell-cloudshellterminal` when the current code uses `Container('sandbox', ...)`, which alchemy names `cloudshell-sandbox-runner`)
When this state exists, Docker pushes on new deploys succeed, but the DO class is
still routing to a wedged old app. Changes to Dockerfile, startup.sh, Worker
code, or the @cloudflare/containers package version do not take effect because
the runtime is serving a stale image behind the scenes.
Check and clean up:

```
# List all container apps on the account
curl -sS -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/containers/applications" \
  | jq '.result[] | select(.name | startswith("cloudshell"))
        | {name, id, instances, max: .max_instances}'
```

If you see duplicates, orphans, or `instances > max`, delete them:

```
curl -sS -X DELETE -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/containers/applications/<ID>"
```

Then redeploy — alchemy recreates the active app cleanly and the new image takes effect immediately.
Do not delete `cloudshell-sandbox-runner` if it is the only healthy app and your terminal works. Delete only the orphans and the stuck entries.
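The jq filter above can also be run offline against a saved API response. This is an illustrative helper, assuming the response has already been parsed; `findSuspectApps` and the `activeName` parameter are not part of alchemy or the Cloudflare API.

```typescript
// Minimal shape of one entry in the /containers/applications response.
interface ContainerApp {
  id: string;
  name: string;
  instances: number;
  max_instances: number;
}

// Flag cloudshell apps worth deleting: anything that is not the live app,
// or anything wedged above its instance cap.
function findSuspectApps(apps: ContainerApp[], activeName: string): ContainerApp[] {
  return apps.filter(
    (app) =>
      app.name.startsWith("cloudshell") &&
      (app.name !== activeName || app.instances > app.max_instances),
  );
}
```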
Having this as a documented pattern saves the next person the four-commit chase we did the first time:
- Reproduced 1006 in prod. WS hung in `readyState: 0 (CONNECTING)` for 10+ seconds and never opened.
- Bisected with existing probe endpoints. `/ws/hello`, `/ws/terminal-probe`, and `/ws/proxy-hello` all opened and echoed cleanly — confirming the Browser → SvelteKit and SvelteKit → Worker hops were fine. Only `/ws/terminal` failed.
- Enabled `wrangler tail`. Revealed the Worker-side failure: `waitForPort(8080)` on the container DO kept aborting; `Container crashed while checking for ports`.
- Shipped three plausible fixes. (a) `containerFetch(request, port)` override missing on the terminal Container subclass (genuinely required post-GA); (b) `set -e` in `startup.sh` can abort before `exec /server`; (c) upgrade `@cloudflare/containers` 0.1.1 → ^0.3.2 plus explicit `enableInternet = true`. All correct changes. None moved the tail output; identical failure pattern after each deploy.
- Enabled the parity container bisect. Set `TERMINAL_PARITY_SECRET` in CI so `CloudShellParityTerminal` (9-line Dockerfile, node + `ws`, no FUSE, no Go) provisioned. Hit `wss://cloudshell-api.coey.dev/terminal?secret=...`. It opened cleanly and responded with `parity-ready` in <400ms — tail showed `Port 8080 is ready`. That conclusively proved the Worker → DO → container path was healthy; the problem was inside the cloudshell image path.
- Disabled FUSE entirely in `startup.sh`. The only meaningful difference between cloudshell and parity was FUSE + bash + Go. Still the same 1006. At this point nothing in the source tree could explain why code changes had no effect on runtime behavior.
- Inspected the account-level container apps inventory. Found three cloudshell-named apps: two with `instances > max_instances` (9/5 and 9/5), and one with a pre-refactor name (`cloudshell-cloudshellterminal`) that no longer matches anything in the current `alchemy.run.ts`. That explained why our committed code changes were being ignored: the DO class was still serving traffic from a wedged historical container app that alchemy's state didn't know about.
- Deleted the stale apps and redeployed. Alchemy recreated the active app clean, the new image took effect, and the terminal came back up.
The tell for next time: if code changes make it through CI and deploy, but the behavior in tail is byte-identical to before the change, suspect container state before suspecting code.
Before shipping changes:

```
bun run check
bun run test
bun run build
```

For worker container tests:

```
cd worker/container && go test ./...
```