feat(sandbox): selective per-domain network allowlist (slirp4netns) - #10 step 2 (#143)
Merged
TEMPORARY. Adds scripts/sandbox-net-poc.sh + a continue-on-error Linux CI step that exercises the full bwrap --unshare-net + slirp4netns + DNS-proxy allowlist flow on a real kernel, printing diagnostics. This nails down the exact info-fd/ready-fd handshake, host-loopback DNS routing, and port-53 bindability before they are encoded in packages/core/src/sandbox/netns.ts. Both the script and the CI step are removed once the TS orchestrator + integration test land.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
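For the PID-handoff half of the handshake, bwrap writes a small JSON object with a `child-pid` field to the descriptor passed via --info-fd. A minimal sketch of parsing that payload is shown here; the helper name `parseBwrapInfo` and its error messages are hypothetical and are not taken from netns.ts.

```typescript
// Hypothetical helper: extract the sandbox child's PID from the JSON that
// bwrap writes to its --info-fd descriptor, e.g. {"child-pid": 4242}.
function parseBwrapInfo(raw: string): number {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("bwrap info payload is not a JSON object");
  }
  const pid = (parsed as Record<string, unknown>)["child-pid"];
  if (typeof pid !== "number" || !Number.isInteger(pid) || pid <= 0) {
    throw new Error("bwrap info payload has no valid child-pid");
  }
  return pid;
}
```

The returned PID is what a caller would hand to slirp4netns so it can attach to the sandbox's network namespace.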
The FIFO-based readiness handshake deadlocked in CI (host open(O_WRONLY) blocks forever when the in-sandbox reader and host writer don't share the inode across the bind mount). Replace it with a 3s sleep window inside the sandbox (slirp configures in <1s) plus a 45s background watchdog that hard-kills the sandbox and a trap that always tears down slirp/proxy/tmp.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bwrap couldn't bind our resolv.conf onto /etc/resolv.conf because on the runner it's a dangling symlink (→ /run/systemd/resolve/stub-resolv.conf, not mounted in the sandbox). Bind our file at the readlink-resolved real path so the preserved symlink leads to it. Add slirp4netns --disable-dns to close the 10.0.2.3 bypass (all resolution must traverse our allowlist).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
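The symlink fix can be sketched as a small argument builder. This is an illustration only: the helper name `resolvConfBindArgs` is hypothetical, the choice of `--ro-bind` (rather than `--bind`) is an assumption, and the `realpath` callback is injected so the sketch stays pure (on a real host it could be `fs.realpathSync`).

```typescript
// Hypothetical helper: build the bwrap arguments that bind our resolv.conf at
// the real file behind /etc/resolv.conf, so the symlink preserved inside the
// sandbox leads to our file instead of dangling.
function resolvConfBindArgs(
  ourResolvConf: string,
  realpath: (path: string) => string,
): string[] {
  // Resolve on the host, where the symlink target still exists.
  const target = realpath("/etc/resolv.conf");
  // Assumption: a read-only bind is enough for the resolver configuration.
  return ["--ro-bind", ourResolvConf, target];
}
```

With a stub resolver that returns `/run/systemd/resolve/stub-resolv.conf`, the result binds our file at that path rather than at `/etc/resolv.conf` itself.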
slirp4netns failed with setns(CLONE_NEWNET): Operation not permitted. bwrap's net namespace is owned by bwrap's child user namespace, which the host-user slirp process has no CAP_SYS_ADMIN over. Pass --userns-path=/proc/<pid>/ns/user so slirp enters that userns (where it is root) before the netns. resolv.conf bind now confirmed working.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tics --userns-path did not resolve setns(CLONE_NEWNET) EPERM. Dump the namespace topology (child user/net ns links + uid_map/gid_map + lsns) to see whether bwrap nests the userns such that the netns is owned by a parent userns slirp can't gain caps over. Also try the slirp4netns README-exact incantation: bwrap --uid 0 --gid 0 (root-mapped userns) + plain `slirp4netns --configure <pid> tap0` (no --userns-path/--disable-dns).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the Linux selective network allowlist (the last gap in §3.9a's
sandbox). When sandbox.network.allowedDomains is a non-empty allowlist,
spawnNetworkSandbox (netns.ts) orchestrates:
1. the allowlisting DNS proxy (dns-proxy.ts) on 127.0.0.1:53 — forwards
allowed lookups upstream, returns NXDOMAIN for everything else;
2. bwrap --unshare-net --uid 0 --gid 0 with our resolv.conf bound (at the
symlink-resolved real path) and --info-fd/--block-fd for PID handoff +
readiness gating;
3. slirp4netns --configure --disable-dns attached to bwrap's netns by PID,
giving rootless userspace NAT (tap0, 10.0.2.100/24, gateway 10.0.2.2 →
host loopback where the proxy listens).
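The allowlisting decision in step 1 can be sketched as a pure function. This is not the code in dns-proxy.ts: the names `decide` and `normalizeName` are hypothetical, and treating an allowlist entry as matching its subdomains is an assumption the description does not confirm. The NXDOMAIN response code (3) is the standard DNS value.

```typescript
// Standard DNS response code for "no such domain".
const RCODE_NXDOMAIN = 3;

type Decision = { action: "forward" } | { action: "refuse"; rcode: number };

// Lower-case the name and drop one trailing root dot ("Example.com." -> "example.com").
function normalizeName(dnsName: string): string {
  const lower = dnsName.toLowerCase();
  return lower.endsWith(".") ? lower.slice(0, -1) : lower;
}

// Assumption: an entry matches itself and any subdomain of it, on a label
// boundary, so "evilnpmjs.org" does not match the entry "npmjs.org".
function decide(queryName: string, allowedDomains: readonly string[]): Decision {
  const query = normalizeName(queryName);
  const allowed = allowedDomains.some((entry) => {
    const domain = normalizeName(entry);
    if (domain.length === 0) return false; // ignore empty entries
    return query === domain || query.endsWith("." + domain);
  });
  return allowed
    ? { action: "forward" }
    : { action: "refuse", rcode: RCODE_NXDOMAIN };
}
```

A proxy built this way would forward the original query upstream for `forward` and synthesize an NXDOMAIN reply for `refuse`.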
The decisive detail: --uid 0 --gid 0 maps the host user to root inside
bwrap's userns, which is what lets slirp (the host user, owner of that
userns) gain CAP_SYS_ADMIN on entry and setns() into the netns — without it
setns(CLONE_NEWNET) is EPERM.
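To make the two command lines concrete, here is a sketch of argument builders that uses only the flags named above. The type and function names are hypothetical, the `--ro-bind` form and file-descriptor numbers are assumptions, and a real invocation would also need the filesystem binds that give the sandbox a root, which are omitted.

```typescript
// Hypothetical description of one sandbox launch.
interface SandboxPlan {
  infoFd: number;       // bwrap writes its child PID (as JSON) to this descriptor
  blockFd: number;      // bwrap blocks on this descriptor until data arrives
  resolvConf: string;   // our generated resolv.conf on the host
  resolvTarget: string; // symlink-resolved real path of /etc/resolv.conf
  command: string[];    // program and arguments to run inside the sandbox
}

// bwrap side: own netns, host user mapped to uid/gid 0 inside the userns.
function bwrapArgs(plan: SandboxPlan): string[] {
  return [
    "--unshare-net",
    "--uid", "0",
    "--gid", "0",
    "--ro-bind", plan.resolvConf, plan.resolvTarget, // assumption: read-only bind
    "--info-fd", String(plan.infoFd),
    "--block-fd", String(plan.blockFd),
    ...plan.command,
  ];
}

// slirp4netns side: attach to the netns of the bwrap child by PID and
// configure tap0, with slirp's built-in DNS forwarder disabled.
function slirpArgs(childPid: number): string[] {
  return ["--configure", "--disable-dns", String(childPid), "tap0"];
}
```

Keeping the builders pure means the exact flag set can be unit-tested without a kernel that supports user namespaces.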
Threat model: DNS-NAME allowlisting (raw-IP dials bypass it) — adequate for
the git/npm/pip-over-https agent workload, and --disable-dns closes the
10.0.2.3 bypass. Requires binding :53 (CAP_NET_BIND_SERVICE or a relaxed
ip_unprivileged_port_start); when unavailable, callers fail CLOSED via
NetworkSandboxUnavailable rather than running unrestricted.
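The fail-closed behaviour can be illustrated with a short sketch. Only the class name `NetworkSandboxUnavailable` comes from the description; its constructor shape, the `requireAllowlistSupport` helper, and the boolean probe result are hypothetical.

```typescript
// Name taken from the description; the shape of the class is an assumption.
class NetworkSandboxUnavailable extends Error {
  constructor(reason: string) {
    super(`network sandbox unavailable: ${reason}`);
    this.name = "NetworkSandboxUnavailable";
  }
}

// Hypothetical guard: `canBindPort53` stands in for a real probe of whether
// the process may bind 127.0.0.1:53. When it cannot, throw instead of
// silently falling back to an unrestricted network.
function requireAllowlistSupport(canBindPort53: boolean): void {
  if (!canBindPort53) {
    throw new NetworkSandboxUnavailable("cannot bind 127.0.0.1:53");
  }
}
```

A caller that receives this error is expected to refuse to run the workload rather than retry without the allowlist.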
Verified on the Linux CI runner by netns-integration.test.ts (gated on
DC_SANDBOX_NET_TEST + bwrap + slirp4netns): an allowlisted domain returns
HTTP 200 while a non-allowlisted domain fails to resolve. The mechanics were
proven first via a throwaway CI PoC (now removed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
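The gating of the integration test can be sketched as a predicate. The function name `shouldRunNetTest` is hypothetical, the binary lookup is injected rather than implemented, and treating an empty value or "0" as disabled is an assumption about how DC_SANDBOX_NET_TEST is interpreted.

```typescript
// Hypothetical gate: run the network integration test only when the
// environment flag is set and both required binaries are available.
function shouldRunNetTest(
  env: Record<string, string | undefined>,
  hasBinary: (binary: string) => boolean,
): boolean {
  const flag = env["DC_SANDBOX_NET_TEST"];
  // Assumption: unset, empty, or "0" means the test is disabled.
  const enabled = flag !== undefined && flag !== "" && flag !== "0";
  return enabled && hasBinary("bwrap") && hasBinary("slirp4netns");
}
```

A test file could evaluate this once and skip the suite when it returns false, so developer machines without bwrap or slirp4netns still pass.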
The integration test's assertions passed but the run failed: SIGTERM-ing slirp4netns / bwrap in close() reset their stdio pipes, emitting `read ECONNRESET` with no 'error' listener → vitest flagged 2 unhandled errors. Attach no-op 'error' handlers to both child processes and all their stdio streams so teardown resets are absorbed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
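A minimal sketch of that teardown fix follows. It uses small structural interfaces instead of Node's typings so it stands alone; Node's `ChildProcess` and its stdio streams should satisfy them. The names `ErrorSource`, `ChildLike`, and `absorbTeardownErrors` are hypothetical.

```typescript
// Anything that can emit 'error' events (a child process or a stdio stream).
interface ErrorSource {
  on(event: "error", listener: (err: Error) => void): unknown;
}

// The parts of a child process this sketch touches; stdio may be null
// when a stream was not piped.
interface ChildLike extends ErrorSource {
  stdin: ErrorSource | null;
  stdout: ErrorSource | null;
  stderr: ErrorSource | null;
}

// Deliberately empty: resets during teardown are expected and ignored.
const ignoreError = (): void => {};

// Attach a no-op 'error' listener to the child and to each piped stream so an
// ECONNRESET during SIGTERM teardown is not reported as an unhandled error.
function absorbTeardownErrors(child: ChildLike): void {
  child.on("error", ignoreError);
  for (const stream of [child.stdin, child.stdout, child.stderr]) {
    if (stream !== null) stream.on("error", ignoreError);
  }
}
```

Calling this for both the slirp4netns and bwrap children before sending SIGTERM covers the two unhandled errors described above.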
Draft / WIP — step 2 of #10 (sandbox network allowlist).
Builds on #142 (real-kernel bwrap CI). Goal: enforce sandbox.network.allowedDomains on Linux by combining:
- bwrap --unshare-net: own network namespace (no connectivity by default)
- slirp4netns: userspace NAT giving the netns rootless outbound connectivity
- dns-proxy.ts: allowlisting resolver (NXDOMAIN for non-allowed domains), reached by the guest via slirp's host-loopback gateway
Current commit is a temporary diagnostic PoC (scripts/sandbox-net-poc.sh + a continue-on-error Linux CI step) to validate the exact info-fd/ready-fd handshake, DNS routing, and port-53 bindability on a real kernel. The TypeScript orchestrator (netns.ts) + a gated integration test replace the PoC before this leaves draft.
🤖 Generated with Claude Code