
WIP: Port the init binary code to Rust #670

Draft
jakecorrenti wants to merge 27 commits into containers:main from jakecorrenti:port-init


@jakecorrenti
Member

This PR ports the init binary code to Rust. The new init is a regular crate, built like the other crates in the project.

To run the examples or use it with Podman, build and install the project as usual with `make BLK=1 NET=1 && sudo make BLK=1 NET=1 install`; nothing else changes.

Fixes: #632

Comment thread src/init-blob/build.rs
Collaborator

Maybe this would also be a good opportunity to move this build.rs out of the devices crate too.

Not sure what it should be called; maybe init-blob? I'm thinking it should literally be a crate that has one public constant (the init binary) and this build.rs.

For now the devices crate can depend on init-blob as usual, but I plan to change that. Depending on how long this takes to merge, I may end up stacking multiple PRs on top of it that need this to be a separate crate [1], so it would really simplify the rebases for me.

Footnotes

  1. I want the VMM crate, rather than the fs device itself, to depend on init-blob (the fs device will just receive a list of virtual files in its constructor). This is in preparation for the 2.0 Rust API.

Contributor

Sounds like #593 ;-)

jakecorrenti force-pushed the port-init branch 17 times, most recently from 158d388 to 1719f2f (May 8, 2026 21:50)
mtjhrc added 10 commits May 11, 2026 17:48
Move the init binary build script and include_bytes!() from the
devices crate into a new init-blob crate. The passthrough modules
reference the binary as init_blob::INIT_BINARY instead of using
include_bytes! directly.

build.rs based on code from containers#593.
Co-authored-by: Geoffrey Goodman <geoff@goodman.dev>

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Replace the private next_inode AtomicU64 inside PassthroughFs with a
shared InodeAllocator that is passed in at construction. This lets
multiple layers (e.g. a future virtual-inode overlay) allocate from
the same counter without implicit coordination via reserved ranges.

PassthroughFs::new() and PassthroughFsRo::new() now take an
Arc<InodeAllocator> parameter. FsWorker::new() creates the allocator
and passes it through.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
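A minimal sketch of what a shared allocator passed around as `Arc<InodeAllocator>` could look like, assuming it wraps a single `AtomicU64`. The method name `allocate()`, the starting value, and the two layer structs are hypothetical illustrations, not the PR's actual code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared inode allocator: one counter for every layer.
pub struct InodeAllocator {
    next: AtomicU64,
}

impl InodeAllocator {
    /// `first` is the first inode number handed out (assumed; FUSE uses 1 for the root).
    pub fn new(first: u64) -> Self {
        InodeAllocator { next: AtomicU64::new(first) }
    }

    /// Returns an inode number that no other caller has received.
    pub fn allocate(&self) -> u64 {
        // Relaxed ordering suffices: fetch_add alone guarantees unique values.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Hypothetical layers that both hold a clone of the same Arc.
pub struct PassthroughLayer {
    pub alloc: Arc<InodeAllocator>,
}

pub struct OverlayLayer {
    pub alloc: Arc<InodeAllocator>,
}
```

Because both layers draw from the same counter, neither needs a reserved inode range to avoid collisions.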
Introduce AugmentFs<T>, a generic overlay that wraps any FileSystem
implementation and intercepts FUSE operations for virtual inodes —
synthetic read-only files backed by static data. One-shot files
can only be looked up once.

The overlay uses the shared InodeAllocator to assign inode numbers,
so virtual and passthrough inodes never collide.

Remove all init.krun special-case code (init_inode, init_handle,
INIT_CSTR, init_payload) from both the Linux and macOS passthrough
implementations. The init.krun virtual file is now configured via
VirtualEntry in the krun API layer and handled generically by the
overlay.

FsDeviceConfig carries a Vec<VirtualEntry> and FsWorker wraps
AugmentFs<PassthroughFs> / AugmentFs<PassthroughFsRo>.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
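A simplified sketch of the overlay idea, assuming a cut-down `FileSystem` trait with only `lookup()` (the real FUSE trait is much larger) and a plain `AtomicU64` standing in for the shared InodeAllocator. Field names and the one-shot bookkeeping are illustrative guesses, not the PR's implementation:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

pub const ROOT_INODE: u64 = 1;

/// Hypothetical, cut-down stand-in for the FUSE FileSystem trait.
pub trait FileSystem {
    fn lookup(&self, parent: u64, name: &str) -> Result<u64, i32>;
}

/// Hypothetical shape of a virtual entry living in the root directory.
pub struct VirtualEntry {
    pub name: String,
    pub data: &'static [u8],
    pub one_shot: bool,
}

#[allow(dead_code)] // `data` would back read(), which this sketch omits
struct VirtualFile {
    inode: u64,
    data: &'static [u8],
    one_shot: bool,
}

pub struct AugmentFs<T: FileSystem> {
    inner: T,
    // Name -> virtual file; one-shot entries are dropped on first lookup.
    files: Mutex<HashMap<String, VirtualFile>>,
}

impl<T: FileSystem> AugmentFs<T> {
    pub fn new(inner: T, entries: Vec<VirtualEntry>, next_inode: &AtomicU64) -> Self {
        let files: HashMap<String, VirtualFile> = entries
            .into_iter()
            .map(|e| {
                // Inodes come from the shared counter, so they never collide
                // with inodes the inner filesystem allocates.
                let inode = next_inode.fetch_add(1, Ordering::Relaxed);
                (e.name, VirtualFile { inode, data: e.data, one_shot: e.one_shot })
            })
            .collect();
        AugmentFs { inner, files: Mutex::new(files) }
    }
}

impl<T: FileSystem> FileSystem for AugmentFs<T> {
    fn lookup(&self, parent: u64, name: &str) -> Result<u64, i32> {
        if parent == ROOT_INODE {
            let mut files = self.files.lock().unwrap();
            if let Some(file) = files.get(name) {
                let inode = file.inode;
                if file.one_shot {
                    // After this, the name falls through to the inner filesystem.
                    files.remove(name);
                }
                return Ok(inode);
            }
        }
        // Not a virtual entry: delegate to the wrapped filesystem.
        self.inner.lookup(parent, name)
    }
}
```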
Add API to prevent the default init binary (/init.krun) from being
injected into the root filesystem. Follows the existing
krun_disable_implicit_{console,vsock} pattern.

Must be called before krun_set_root().

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Add C API to inject arbitrary virtual files into a virtiofs device.
The file appears in the root directory of the specified mount and is
backed entirely by host memory. Supports one-shot semantics (the file
can only be looked up once).

The data pointer follows the same lifetime contract as other krun
APIs: the caller must keep the memory valid until krun_start_enter()
returns.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Add API to retrieve the built-in default init binary. Callers that
use krun_disable_implicit_init() can use this to obtain the init
binary and inject it themselves via krun_fs_add_overlay_file().

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
NullFs implements the FileSystem trait with just an empty root
directory. It can be wrapped with AugmentFs to serve virtual
files without any host directory involvement.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
krun_set_root_disk_remount no longer creates a temporary empty host
directory. Instead it configures a NullFs-backed virtiofs device
(shared_dir: None) with init.krun overlaid via AugmentFs.

Fs::new() now accepts Option<String> for shared_dir — None selects
NullFs. FsDeviceConfig and FsServer gain the corresponding variants.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
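A sketch of how an `Option<String>` shared_dir could select the backend, assuming a two-variant enum. The `Backend` name and its variants are hypothetical stand-ins for the FsDeviceConfig/FsServer variants mentioned above, and the `NullFs` here only models lookup in an empty root:

```rust
use std::path::PathBuf;

/// Hypothetical backend selector; the PR's variant names may differ.
#[derive(Debug, PartialEq)]
pub enum Backend {
    /// Serve an existing host directory.
    Passthrough(PathBuf),
    /// No host directory at all: an empty root for AugmentFs to populate.
    Null,
}

/// `None` selects the NullFs-style backend.
pub fn select_backend(shared_dir: Option<String>) -> Backend {
    match shared_dir {
        Some(dir) => Backend::Passthrough(PathBuf::from(dir)),
        None => Backend::Null,
    }
}

/// Minimal model of an empty root: every lookup fails with ENOENT,
/// so only entries injected by an overlay are visible.
pub struct NullFs;

impl NullFs {
    const ENOENT: i32 = 2; // Linux errno value

    pub fn lookup(&self, _parent: u64, _name: &str) -> Result<u64, i32> {
        Err(Self::ENOENT)
    }
}
```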
The temporary root directory hack is gone (replaced by NullFs), so
the ioctl that cleaned it up and the config flag that gated it are
no longer needed.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
mtjhrc added 3 commits May 11, 2026 17:48
The exit-code ioctl is a krun mechanism, not a filesystem operation.
Move it to the AugmentFs where it is handled before any delegation
to the inner filesystem.

The Linux passthrough retains only EXPORT_FD (which needs access to
passthrough-internal handle and export tables). The macOS passthrough
no longer implements ioctl at all (the trait default returns ENOSYS
for any cmd that reaches it).

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
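A sketch of intercepting the exit-code ioctl in the overlay before delegation, with a trait default that returns ENOSYS. The command number `EXIT_CODE_CMD` is a made-up placeholder (the real KRUN_EXIT_CODE_IOCTL value is not reproduced here), and passing the code in `arg` is an assumption:

```rust
use std::sync::atomic::{AtomicI32, Ordering};

const ENOSYS: i32 = 38; // Linux errno value

/// Placeholder command number, NOT the real KRUN_EXIT_CODE_IOCTL value.
pub const EXIT_CODE_CMD: u32 = 0x1000;

pub trait FileSystem {
    /// Default: any ioctl that reaches this layer is unsupported.
    fn ioctl(&self, _cmd: u32, _arg: u64) -> Result<u64, i32> {
        Err(ENOSYS)
    }
}

/// Models a backend that relies on the trait default (like the macOS passthrough).
pub struct MacPassthrough;
impl FileSystem for MacPassthrough {}

pub struct AugmentFs<T: FileSystem> {
    inner: T,
    /// Exit code reported by the guest; -1 means "not reported yet".
    pub exit_code: AtomicI32,
}

impl<T: FileSystem> AugmentFs<T> {
    pub fn new(inner: T) -> Self {
        AugmentFs { inner, exit_code: AtomicI32::new(-1) }
    }
}

impl<T: FileSystem> FileSystem for AugmentFs<T> {
    fn ioctl(&self, cmd: u32, arg: u64) -> Result<u64, i32> {
        if cmd == EXIT_CODE_CMD {
            // Handled here; the inner filesystem never sees this command.
            self.exit_code.store(arg as i32, Ordering::SeqCst);
            return Ok(0);
        }
        self.inner.ioctl(cmd, arg)
    }
}
```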
Boot a VM with a pure NullFs root — no host directory at all. Every
file in the root (init.krun, guest-agent, .krun_config.json, test
data) is injected as a virtual overlay, and /dev, /proc, /sys are
virtual empty directories used as mount points.

The guest verifies:
  - One-shot files (init.krun, guest-agent, .krun_config.json) are
    gone after being consumed
  - Persistent files (marker.txt, testdata.bin) survive and are
    re-readable
  - Write access to virtual files is denied (EACCES)
  - stat reports correct sizes
  - Range reads at various offsets return correct data
  - Read past EOF returns zero bytes

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
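The range-read and write-denial checks above can be modelled with two small helpers. This is a sketch of the expected semantics for a memory-backed file, not the PR's code; the helper names are hypothetical and the errno/flag constants are the Linux values:

```rust
/// Serve `size` bytes at `offset` from an in-memory file.
/// Reads at or past EOF return zero bytes; reads crossing EOF are truncated.
pub fn read_at(data: &[u8], offset: u64, size: u32) -> &[u8] {
    let len = data.len() as u64;
    if offset >= len {
        return &[];
    }
    // saturating_add avoids overflow for very large offset + size.
    let end = offset.saturating_add(size as u64).min(len);
    &data[offset as usize..end as usize]
}

/// Refuse any open that asks for write access, returning EACCES.
pub fn check_open_flags(flags: i32) -> Result<(), i32> {
    const O_ACCMODE: i32 = 0o3;
    const O_RDONLY: i32 = 0;
    const EACCES: i32 = 13;
    if flags & O_ACCMODE != O_RDONLY {
        Err(EACCES)
    } else {
        Ok(())
    }
}
```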
Boot from an ext4 block device via krun_set_root_disk_remount. The
virtiofs root uses NullFs with init.krun and virtual mount-point
directories overlaid. The guest verifies it pivoted to the block
device root successfully.

Assisted-by: OpenCode:claude-opus-4.6
Signed-off-by: Matej Hrica <mhrica@redhat.com>
Replace the C-based build_default_init() in src/devices/build.rs with a
Rust crate (init/) compiled via a cargo subprocess. The new build.rs
probes whether the active rustc supports the x86_64-unknown-linux-musl
target (for a static binary) and falls back to the native target with a
user-visible warning if not.

The KRUN_INIT_BINARY_PATH override mechanism is preserved so that
out-of-tree binaries (e.g. pre-built SEV or TDX images) can still be
injected without rebuilding.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
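A sketch of the target-selection logic a build.rs like this could use. Probing for an installed target by checking `$(rustc --print sysroot)/lib/rustlib/<target>` is an assumed technique (it is where rustup installs a target's std), not necessarily how the PR probes; `InitSource` and `choose_source` are hypothetical names:

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

pub const MUSL_TARGET: &str = "x86_64-unknown-linux-musl";

#[derive(Debug, PartialEq)]
pub enum InitSource {
    /// KRUN_INIT_BINARY_PATH was set: embed that file as-is.
    Prebuilt(PathBuf),
    /// Build the init crate for this target triple.
    Build { target: String },
}

/// Assumed probe: is the std for `target` present in the active sysroot?
pub fn target_installed(target: &str) -> bool {
    // Cargo sets RUSTC for build scripts; fall back to plain `rustc`.
    let rustc = std::env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    match Command::new(rustc).args(["--print", "sysroot"]).output() {
        Ok(out) if out.status.success() => {
            let sysroot = String::from_utf8_lossy(&out.stdout).trim().to_string();
            Path::new(&sysroot).join("lib").join("rustlib").join(target).exists()
        }
        _ => false,
    }
}

pub fn choose_source(override_path: Option<String>, musl_available: bool, host: &str) -> InitSource {
    if let Some(path) = override_path {
        return InitSource::Prebuilt(PathBuf::from(path));
    }
    if musl_available {
        InitSource::Build { target: MUSL_TARGET.to_string() }
    } else {
        // Cargo surfaces "cargo:warning=" lines to the user.
        println!(
            "cargo:warning={} is not installed; building init for {}, so it may not be static",
            MUSL_TARGET, host
        );
        InitSource::Build { target: host.to_string() }
    }
}
```

Keeping the decision in a pure function makes the override-then-musl-then-native fallback order easy to test separately from the rustc probe.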
Add init/src/fs.rs with:
- mount_once(): helper that treats EBUSY as success
- mount_filesystems(): mounts devtmpfs, proc, sysfs, cgroup2,
  devpts, tmpfs(/dev/shm), and creates the /dev/fd symlink
- is_mount_point(): parses /proc/mounts (avoids triggering Podman
  auto-mounts that stat() would cause)
- mount_tmpfs(): mounts a tmpfs at an arbitrary path

Implement mount_tee_block_root(), used by both the SEV and TDX
features to mount /dev/vda and chroot into it.

For amd-sev this replaces the previous LUKS/KBS attestation path
entirely. The SEV and TDX boot paths are now identical at the init level.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
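A sketch of the two ideas behind mount_once() and is_mount_point(): treating EBUSY as success, and answering "is this a mount point?" from /proc/mounts text rather than stat(). The helper names are hypothetical; the unescaping follows the octal escapes (`\040` for space, etc.) that /proc/mounts uses:

```rust
use std::fs;
use std::io;

const EBUSY: i32 = 16; // Linux errno: "Device or resource busy"

/// Undo /proc/mounts octal escapes such as "\040" (space).
fn unescape_mount_field(field: &str) -> String {
    let b = field.as_bytes();
    let is_octal = |c: u8| (b'0'..=b'7').contains(&c);
    let mut out = Vec::with_capacity(b.len());
    let mut i = 0;
    while i < b.len() {
        if b[i] == b'\\' && i + 3 < b.len() + 0 + 1 - 1 + 0 && is_octal(b[i + 1]) && is_octal(b[i + 2]) && is_octal(b[i + 3]) {
            let v = (b[i + 1] - b'0') as u32 * 64 + (b[i + 2] - b'0') as u32 * 8 + (b[i + 3] - b'0') as u32;
            if v <= 0xFF {
                out.push(v as u8);
                i += 4;
                continue;
            }
        }
        out.push(b[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// True if `path` is the mount point (second field) of any line in `mounts`.
pub fn is_mount_point_in(mounts: &str, path: &str) -> bool {
    mounts
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .any(|mnt| unescape_mount_field(mnt) == path)
}

/// Reads /proc/mounts; unlike stat(), this does not touch `path` itself.
pub fn is_mount_point(path: &str) -> io::Result<bool> {
    Ok(is_mount_point_in(&fs::read_to_string("/proc/mounts")?, path))
}

/// "Already mounted" (EBUSY) counts as success.
pub fn ok_if_busy(res: io::Result<()>) -> io::Result<()> {
    match res {
        Err(e) if e.raw_os_error() == Some(EBUSY) => Ok(()),
        other => other,
    }
}
```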
Extend fs.rs with:
- try_mount(): mounts with a known fstype, or probes /proc/filesystems
  when fstype is None
- mount_block_root_device(): handles KRUN_BLOCK_ROOT_DEVICE by mounting
  the block device at /newroot, issuing KRUN_REMOVE_ROOT_DIR_IOCTL to
  drop the virtiofs temporary root, then pivoting with MS_MOVE
- mount_shared_root(): sets MS_REC|MS_SHARED propagation on /

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
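A sketch of probing /proc/filesystems when no fstype is given. Skipping `nodev` entries (virtual filesystems that cannot sit on a block device) is an assumption about the probing strategy, and the `mount_with` closure is a hypothetical stand-in for the real mount(2) call:

```rust
/// Filesystem types from /proc/filesystems that can back a block device.
/// Lines look like "nodev\tproc" or "\text4".
pub fn block_fs_candidates(proc_filesystems: &str) -> Vec<String> {
    proc_filesystems
        .lines()
        .filter(|l| !l.starts_with("nodev"))
        .map(|l| l.trim())
        .filter(|l| !l.is_empty())
        .map(String::from)
        .collect()
}

/// Mount with the given fstype, or try each candidate in order.
/// Returns the fstype that worked, or the last errno seen.
pub fn try_mount_with<F>(fstype: Option<&str>, proc_filesystems: &str, mut mount_with: F) -> Result<String, i32>
where
    F: FnMut(&str) -> Result<(), i32>,
{
    let candidates = match fstype {
        Some(t) => vec![t.to_string()],
        None => block_fs_candidates(proc_filesystems),
    };
    let mut last_err = 19; // ENODEV on Linux, used if nothing was tried
    for t in candidates {
        match mount_with(&t) {
            Ok(()) => return Ok(t),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```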
Port init/dhcp.c to Rust in init/src/dhcp.rs. The public surface is a
single do_dhcp(iface) function with the same behaviour as the C version:

- Sends DHCPDISCOVER with Rapid Commit (option 80)
- On DHCPACK: applies address, route, MTU, and DNS directly
- On DHCPOFFER: completes the 4-way handshake, then applies
- On no response: returns Ok (VM may be IPv6-only)
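The option bytes that make a DHCPDISCOVER request Rapid Commit, plus the ACK/OFFER branch, can be sketched as follows. Option codes come from RFC 2132 and RFC 4039; the BOOTP header and any parameter-request list are omitted, and `Next`/`next_step` are hypothetical names:

```rust
/// Magic cookie that precedes the DHCP options field (RFC 2131).
const MAGIC_COOKIE: [u8; 4] = [99, 130, 83, 99];
const OPT_MESSAGE_TYPE: u8 = 53;
const OPT_RAPID_COMMIT: u8 = 80; // RFC 4039
const OPT_END: u8 = 255;
const DHCPDISCOVER: u8 = 1;
const DHCPOFFER: u8 = 2;
const DHCPACK: u8 = 5;

/// Options section of a DHCPDISCOVER asking for Rapid Commit.
pub fn discover_options() -> Vec<u8> {
    let mut opts = MAGIC_COOKIE.to_vec();
    opts.extend_from_slice(&[OPT_MESSAGE_TYPE, 1, DHCPDISCOVER]);
    // Rapid Commit carries no payload: code 80, length 0.
    opts.extend_from_slice(&[OPT_RAPID_COMMIT, 0]);
    opts.push(OPT_END);
    opts
}

#[derive(Debug, PartialEq)]
pub enum Next {
    Apply,
    SendRequest,
    Ignore,
}

/// Decide what to do from the reply's message type (option 53).
pub fn next_step(message_type: u8) -> Next {
    match message_type {
        DHCPACK => Next::Apply,         // server honoured Rapid Commit
        DHCPOFFER => Next::SendRequest, // fall back to the 4-way handshake
        _ => Next::Ignore,
    }
}
```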

Netlink structs not exposed by libc (ifinfomsg, ifaddrmsg, rtmsg) are
defined locally with #[repr(C)]. sockaddr_nl and sockaddr_in are
zero-initialised via mem::zeroed() to handle opaque padding fields.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
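Local `#[repr(C)]` mirrors of the netlink headers might look like this. The field layouts follow `<linux/rtnetlink.h>` and `<linux/if_addr.h>`; the Rust type names are illustrative, and the const assertions catch layout mistakes at compile time:

```rust
use std::mem::size_of;

/// Mirrors struct ifinfomsg.
#[repr(C)]
#[derive(Debug, Default)]
pub struct IfInfoMsg {
    pub ifi_family: u8,
    pub ifi_pad: u8,
    pub ifi_type: u16,
    pub ifi_index: i32,
    pub ifi_flags: u32,
    pub ifi_change: u32,
}

/// Mirrors struct ifaddrmsg.
#[repr(C)]
#[derive(Debug, Default)]
pub struct IfAddrMsg {
    pub ifa_family: u8,
    pub ifa_prefixlen: u8,
    pub ifa_flags: u8,
    pub ifa_scope: u8,
    pub ifa_index: u32,
}

/// Mirrors struct rtmsg.
#[repr(C)]
#[derive(Debug, Default)]
pub struct RtMsg {
    pub rtm_family: u8,
    pub rtm_dst_len: u8,
    pub rtm_src_len: u8,
    pub rtm_tos: u8,
    pub rtm_table: u8,
    pub rtm_protocol: u8,
    pub rtm_scope: u8,
    pub rtm_type: u8,
    pub rtm_flags: u32,
}

// Compile-time checks that the layouts match the kernel's sizes.
const _: () = assert!(size_of::<IfInfoMsg>() == 16);
const _: () = assert!(size_of::<IfAddrMsg>() == 8);
const _: () = assert!(size_of::<RtMsg>() == 12);
```

These structs have no padding and only integer fields, so `Default` gives the same all-zero value `mem::zeroed()` would, without `unsafe`.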
Add init/src/config.rs, replacing the hand-rolled jsmn-based parser
with serde_json. Parses /.krun_config.json (or KRUN_CONFIG env var) and
returns a Config struct with:

- argv: Entrypoint ++ (args | Cmd), or None if absent
- workdir: WorkingDir or Cwd
- tmpfs: first tmpfs mount destination not already mounted

Environment variables from the Env array are applied during parsing,
with HOME and TERM always overwritten, all others set only if unset.
A missing or unparseable config file is silently ignored.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
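A sketch of the argv and Env rules described above, using plain vectors and a `HashMap` instead of serde_json and the process environment so it runs without dependencies. Treating `args` as an explicit list that takes precedence over Cmd, and returning `None` when the result is empty, are assumptions:

```rust
use std::collections::HashMap;

/// argv = Entrypoint ++ (args | Cmd); `args`, when present, wins over Cmd.
pub fn compose_argv(
    entrypoint: Option<Vec<String>>,
    args: Option<Vec<String>>,
    cmd: Option<Vec<String>>,
) -> Option<Vec<String>> {
    let mut argv = entrypoint.unwrap_or_default();
    argv.extend(args.or(cmd).unwrap_or_default());
    if argv.is_empty() { None } else { Some(argv) }
}

/// Apply "KEY=VALUE" entries: HOME and TERM are always overwritten,
/// every other key is set only if not already present.
pub fn apply_env_list(env: &mut HashMap<String, String>, entries: &[&str]) {
    for entry in entries {
        if let Some((key, value)) = entry.split_once('=') {
            let force = key == "HOME" || key == "TERM";
            if force || !env.contains_key(key) {
                env.insert(key.to_string(), value.to_string());
            }
        }
    }
}
```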
Add setup_network() and setup_dhcp() to env.rs.

setup_network() brings up lo unconditionally. setup_dhcp() checks that
the interface exists before calling do_dhcp(), and logs a warning on
failure rather than aborting (DHCP failure is non-fatal — the VM may be
IPv6-only or have no network).

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Extend env.rs with:
- apply_hostname(): sets hostname from HOSTNAME env var, defaulting
  to "localhost"
- apply_env(): maps KRUN_HOME -> HOME and KRUN_TERM -> TERM
- apply_rlimits(): parses the KRUN_RLIMITS comma-separated list of
  id,cur,max triples and applies each via setrlimit(2)

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
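A sketch of parsing KRUN_RLIMITS into (id, cur, max) triples. The exact separators are an assumption: this version treats any run of non-digit characters as a separator, so both `6,4096,8192` and `6=4096:8192,7=1024:2048` style strings parse. Applying each triple with setrlimit(2) needs libc and is omitted:

```rust
/// Parse KRUN_RLIMITS into (resource id, soft limit, hard limit) triples.
/// Only unsigned decimal numbers are recognized.
pub fn parse_rlimits(spec: &str) -> Result<Vec<(u32, u64, u64)>, String> {
    let nums = spec
        .split(|c: char| !c.is_ascii_digit())
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<u64>().map_err(|e| format!("{}: {}", s, e)))
        .collect::<Result<Vec<u64>, String>>()?;
    if nums.len() % 3 != 0 {
        return Err(format!("expected id,cur,max triples, got {} numbers", nums.len()));
    }
    nums.chunks(3)
        .map(|t| {
            if t[0] > u32::MAX as u64 {
                Err(format!("resource id {} out of range", t[0]))
            } else {
                Ok((t[0] as u32, t[1], t[2]))
            }
        })
        .collect()
}
```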
Add exec.rs with:
- setup_redirects(): walks /sys/class/virtio-ports and dup2s
  krun-stdin/stdout/stderr onto the corresponding file descriptors
- set_exit_code(): reports the workload exit code to the host via
  KRUN_EXIT_CODE_IOCTL, only when the root fs is virtiofs
- run_workload(): forks so PID 1 can reap children; the child calls
  exec_workload(), which sets up redirects and execvp's the argv.
  The parent waits for the child, reports the exit code, syncs, and
  reboots. With KRUN_INIT_PID1=1, init skips the fork and runs
  exec_workload() directly as PID 1.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
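A sketch of turning the child's wait status into the single exit code that gets reported, using `std::process::Command` for brevity. Reporting a signal-terminated child as 128 + signal number is the common shell convention and an assumption here; a real PID 1 must also reap orphaned descendants (for example with waitpid(-1, ...)), which std does not expose. The helper names are hypothetical:

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Command, ExitStatus};

/// Collapse a wait status into one exit code (signal => 128 + signo, assumed).
pub fn exit_code_of(status: ExitStatus) -> i32 {
    if let Some(code) = status.code() {
        code
    } else if let Some(sig) = status.signal() {
        128 + sig
    } else {
        1 // neither exited nor signalled; treat as generic failure
    }
}

/// Run argv (searching PATH, like execvp) and wait for it.
pub fn run_and_wait(argv: &[String]) -> io::Result<i32> {
    let (prog, args) = argv
        .split_first()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "empty argv"))?;
    let status = Command::new(prog).args(args).status()?;
    Ok(exit_code_of(status))
}

/// KRUN_INIT_PID1=1: exec the workload directly instead of forking.
pub fn run_as_pid1() -> bool {
    std::env::var("KRUN_INIT_PID1").map(|v| v == "1").unwrap_or(false)
}
```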
Connect all modules in main() in order:
  1. mount_block_root()          [amd-sev | tdx]
  2. mount_filesystems()
  3. mount_block_root_device()   [KRUN_BLOCK_ROOT_DEVICE]
  4. mount_shared_root()
  5. setsid + TIOCSCTTY
  6. setup_network()
  7. config::load()
  8. mount_tmpfs()               [config tmpfs mount]
  9. apply_env / apply_hostname / apply_rlimits
 10. chdir to workdir
 11. run_workload(argv)

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Add init/src/freebsd.rs with:
- kenv_get(): reads a variable from the FreeBSD kernel environment via
  kenv(2), which is the source of env vars for init before the process
  environment is set up
- populate_env_from_kenv(): imports the known KRUN_* variables from
  kenv into std::env at startup so the rest of the code can use
  std::env::var uniformly on both platforms
- open_console(): replicates login_tty(3) without linking libutil —
  revokes existing opens of /dev/console, opens it, creates a new
  session via setsid(2), sets the controlling terminal via TIOCSCTTY,
  and dup2s it onto stdio; falls back to /dev/null + /init.log
- mount_config_iso() / unmount_config_iso(): mounts the KRUN_CONFIG
  ISO 9660 image at /mnt via nmount(2) so the JSON config file can be
  read, then unmounts it afterwards

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Connect the FreeBSD helpers into the boot sequence:
- open_console() and populate_env_from_kenv() are called at the very
  start of main() before anything else
- setsid/TIOCSCTTY are Linux-only; open_console() handles session setup
  on FreeBSD
- setlogin("root") is called on FreeBSD after console setup
- KRUN_DHCP and DHCP setup are Linux-only
- If KRUN_CONFIG is not set, mount_config_iso() is attempted; the ISO
  is unmounted immediately after config::load() returns
- fs::* mounts and mount_shared_root are Linux-only
- exec_workload() calls open_console() on FreeBSD instead of
  setup_redirects(), giving the child process a fresh controlling
  terminal before execvp

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Replace the C-based BSD init build rule (which referenced the now-deleted
init/init.c) with a cargo build rule targeting the correct Rust triple.

Makefile:
- Remove dead INIT_SRC = init/init.c variable.
- Derive FREEBSD_RUST_TARGET from the host ARCH with arm64→aarch64
  substitution to get the correct Rust triple.
- Set CARGO_BSD_RUSTFLAGS with the clang cross-linker flags (mirroring
  the existing CC_BSD setup) so cargo can link for FreeBSD.
- aarch64-unknown-freebsd is a Tier 3 target with no prebuilt std;
  use +nightly -Z build-std for that case.

setup-build-env:
- Add a rustup target add x86_64-unknown-freebsd step (Tier 2, prebuilt std).
- Install nightly toolchain + rust-src for the aarch64 FreeBSD case.

cross-compilation.yml:
- Add clang to the Linux cross-compilation dependencies so the
  FreeBSD linker flags resolve correctly on Linux runners.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
Implements the timesync feature behind the `timesync` cargo feature flag.
Receives host-side nanosecond timestamps over AF_VSOCK/SOCK_DGRAM on port
123 and applies them via clock_settime when the delta exceeds 100ms.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6
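A sketch of the timesync decision. The wire format is an assumption (a little-endian u64 of nanoseconds since the Unix epoch); the real format is defined by the host side. Receiving on AF_VSOCK port 123 and calling clock_settime need libc and are omitted, so only decoding and the 100 ms threshold are shown:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Only step the guest clock when it is more than 100 ms off.
const MAX_DRIFT: Duration = Duration::from_millis(100);

/// Assumed wire format: first 8 bytes are a little-endian u64 of ns since the epoch.
pub fn decode_host_ns(datagram: &[u8]) -> Option<u64> {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(datagram.get(..8)?);
    Some(u64::from_le_bytes(bytes))
}

/// Some(host time) if the guest clock should be set to it.
pub fn correction(host_ns: u64, guest_ns: u64) -> Option<u64> {
    let delta = Duration::from_nanos(host_ns.abs_diff(guest_ns));
    (delta > MAX_DRIFT).then_some(host_ns)
}

/// Current guest wall-clock time in ns since the epoch (0 if before it).
pub fn guest_now_ns() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}
```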
Delete init/init.c, init/dhcp.c, init/dhcp.h, init/jsmn.h, and the
entire init/tee/ directory (snp_attest.c/h and the KBS client).

The amd-sev feature no longer performs LUKS unlock or KBS attestation —
it mounts /dev/vda as ext4 like the tdx path does.

Signed-off-by: Jake Correnti <jakecorrenti+github@proton.me>
Assisted-by: Claude Code:claude-sonnet-4.6


Development

Successfully merging this pull request may close these issues.

Rewrite init in Rust

3 participants