Skip to content

perf: eliminate per-call malloc in kdf() and per-packet malloc in encrypt/decrypt#21

Open
MarkAtwood wants to merge 2 commits intowolfSSL:masterfrom
MarkAtwood:perf/embed-hmac-in-handshake-kdf
Open

perf: eliminate per-call malloc in kdf() and per-packet malloc in encrypt/decrypt#21
MarkAtwood wants to merge 2 commits intowolfSSL:masterfrom
MarkAtwood:perf/embed-hmac-in-handshake-kdf

Conversation

@MarkAtwood
Copy link
Copy Markdown

@MarkAtwood MarkAtwood commented Apr 17, 2026

Summary

Two hot-path malloc eliminations in the same PR (they share noise.h changes):

ue7.45 — kdf() per-call Hmac malloc

kdf() implements HKDF (RFC 5869) and is called 1–4× per Noise handshake.
Previously every call malloc'd and free'd an 832-byte Hmac struct.

  • Add struct Hmac kdf_hmac to struct noise_handshake (which lives in the heap-allocated struct wg_peer).
  • Thread a struct Hmac *hmac_buf parameter through kdf() and all callers (mix_dh, mix_precomputed_dh, mix_psk, message_ephemeral, derive_keys).
  • Callers that hold handshake->lock for write throughout (create_initiation, create_response, begin_session) pass &handshake->kdf_hmac — no per-call kmalloc.
  • Callers that operate outside the write lock (consume_initiation, consume_response) pass NULL, preserving correctness via kmalloc fallback.

ue7.12 — per-packet Aes malloc in encrypt/decrypt

encrypt_packet and decrypt_packet were called on every data packet and previously malloc'd/free'd an Aes struct (400–600 bytes) per call.

  • Add Aes aes_encrypt and Aes aes_decrypt to struct noise_keypair (heap-allocated). bool aes_inited tracks initialisation for safe teardown.
  • Initialise both with wc_AesGcmSetKey once at wg_noise_handshake_begin_session. Free with wc_AesFree at keypair teardown.
  • Add wc_AesGcm_encrypt_sg_inplace_prealloc / wc_AesGcm_decrypt_sg_inplace_prealloc to wolfcrypt_glue (both WOLFSSL_AESGCM_STREAM and non-stream paths). These accept a pre-initialised Aes*; the per-packet init reuses the key schedule by passing NULL key to wc_AesGcmEncryptInit/DecryptInit.
  • Update send.c and receive.c to call the prealloc variants with &keypair->aes_encrypt / &keypair->aes_decrypt.

No stack allocation

sizeof(struct Hmac) ≈ 832 bytes and sizeof(Aes) ≈ 400–600 bytes — both too large for the kernel stack. All new fields live in existing heap-allocated structs.

Test plan

  • Module builds without warnings on x86-64 with WOLFSSL_AESGCM_STREAM
  • Module builds without warnings on x86-64 without WOLFSSL_AESGCM_STREAM
  • wg-fips genkey | wg-fips pubkey succeeds after loading updated module
  • WolfGuard tunnel establishes and passes traffic (encrypt + decrypt exercised)
  • Keypair teardown (tunnel down / key rotation) frees Aes contexts without leak

Copilot AI review requested due to automatic review settings April 17, 2026 20:59
@MarkAtwood MarkAtwood changed the title perf: embed Hmac in noise_handshake to eliminate per-kdf-call malloc perf: eliminate per-call malloc in kdf() and per-packet malloc in encrypt/decrypt Apr 17, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR reduces cryptographic fast-path heap allocations by reusing preallocated buffers/contexts during WireGuard Noise handshakes and packet AEAD operations.

Changes:

  • Add an embedded struct Hmac scratch buffer to struct noise_handshake and thread a caller-supplied HMAC buffer through kdf() and related helpers to avoid per-call allocations.
  • Add preallocated AES-GCM scatterlist encrypt/decrypt helpers and switch packet send/receive paths to reuse per-keypair Aes contexts.
  • Add Beads (bd) agent/workflow integration files and update gitignore patterns.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
kernel-src/wolfcrypt_glue.h Exposes prealloc AES-GCM sg encrypt/decrypt APIs.
kernel-src/wolfcrypt_glue.c Implements AES-GCM sg in-place crypt using a preinitialized Aes context.
kernel-src/send.c Uses preallocated AES context for packet encryption.
kernel-src/receive.c Uses preallocated AES context for packet decryption.
kernel-src/noise.h Adds per-keypair AES contexts + per-handshake HKDF scratch Hmac.
kernel-src/noise.c Adds kdf(hmac_buf, ...) and initializes/frees per-keypair AES contexts.
CLAUDE.md Adds AI-agent workflow instructions (Beads integration).
AGENTS.md Adds agent instructions (Beads + non-interactive shell guidance).
.gitignore Ignores Dolt/Beads artifacts (including *.db).
.claude/settings.json Runs bd prime hook on session events.
.beads/* Adds Beads configuration + git hooks + metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

const u8 *ad, const size_t ad_len,
const u64 nonce,
Aes *aes)
{
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src_len - WC_AES_BLOCK_SIZE can underflow when src_len < WC_AES_BLOCK_SIZE (e.g., malformed/short packets), turning into a huge size_t and causing incorrect behavior. Add an explicit length check in this wrapper (or before calling it) and return false/-EINVAL when the input is shorter than the auth tag.

Suggested change
{
{
if (src_len < WC_AES_BLOCK_SIZE)
return false;

Copilot uses AI. Check for mistakes.
}

flags = SG_MITER_TO_SG | SG_MITER_ATOMIC;

Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the WOLFSSL_AESGCM_STREAM implementation there is no sg_nents(src) < 1 guard (the non-stream path has one). If sg_nents(src) == 0, the miter never yields data and the function can proceed to tag handling/finalization without having processed any bytes. Add an early sg_nents(src) < 1 validation (consistent with the non-stream variant) and return -EINVAL.

Suggested change
if (sg_nents(src) < 1) {
ret = -EINVAL;
WC_DEBUG_PR_CODEPOINT();
goto out;
}

Copilot uses AI. Check for mistakes.
else
wc_ForceZero(buf, src_len + WC_AES_BLOCK_SIZE);
}
free(buf);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates with XMALLOC(...) but frees with free(buf). That can be an allocator mismatch depending on how wolfSSL is configured in this kernel module (typically XMALLOC pairs with XFREE). Replace free(buf) with the appropriate wolfSSL deallocator for the allocation site (and keep the existing zeroization behavior).

Suggested change
free(buf);
XFREE(buf, NULL, DYNAMIC_TYPE_TMP_BUFFER);

Copilot uses AI. Check for mistakes.
Comment thread kernel-src/noise.c
Comment on lines +353 to 362
static int __must_check kdf(struct Hmac *hmac_buf,
u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data,
size_t first_len, size_t second_len, size_t third_len,
size_t data_len, const u8 chaining_key[NOISE_HASH_LEN])
{
u8 output[WC_SHA256_DIGEST_SIZE + 1];
u8 secret[WC_SHA256_DIGEST_SIZE];
struct Hmac *wc_hmac;
bool hmac_alloced = false;
int ret;
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.

Copilot uses AI. Check for mistakes.
Comment thread kernel-src/noise.c
Comment on lines +372 to +375
if (hmac_buf) {
wc_hmac = hmac_buf;
} else {
wc_hmac = (struct Hmac *)malloc(sizeof(*wc_hmac));
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.

Copilot uses AI. Check for mistakes.
Comment thread kernel-src/noise.c
Comment on lines +439 to 440
if (hmac_alloced)
free(wc_hmac);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.

Copilot uses AI. Check for mistakes.
Comment thread CLAUDE.md Outdated
@@ -0,0 +1,69 @@
# Project Instructions for AI Agents
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description focus on embedding Hmac to remove per-kdf() allocations, but this PR also adds AES-GCM prealloc APIs/usage and introduces Beads/agent workflow files (multiple new docs, hooks, settings, and gitignore updates). To keep changes reviewable and traceable, either (a) update the PR title/description to reflect these additional functional changes, or (b) split Beads/agent integration and/or AES-GCM changes into separate PRs.

Copilot uses AI. Check for mistakes.
@MarkAtwood MarkAtwood requested a review from douzzer April 17, 2026 21:10
kdf() is called 1-4 times per handshake operation, and previously
malloc'd and free'd an 832-byte Hmac struct on every call.

Add a struct Hmac kdf_hmac field to struct noise_handshake (which is
embedded in the heap-allocated struct wg_peer).  Thread a struct Hmac *
parameter through kdf() and its callers (mix_dh, mix_precomputed_dh,
mix_psk, message_ephemeral, derive_keys).

Callers that hold handshake->lock for write throughout (create_initiation,
create_response, begin_session) pass &handshake->kdf_hmac, eliminating
all per-call kmalloc for those paths.

Callers that perform kdf operations outside the write lock
(consume_initiation, consume_response) pass NULL, causing kdf() to fall
back to kmalloc, preserving correctness under concurrent access.
Add wc_AesGcm_encrypt_sg_inplace_prealloc and
wc_AesGcm_decrypt_sg_inplace_prealloc to wolfcrypt_glue.c/.h.
These accept a pre-initialised Aes* instead of allocating one
per call.  The key schedule set at keypair creation time is
reused by passing NULL key to wc_AesGcmEncryptInit/DecryptInit.

Update send.c encrypt_packet and receive.c decrypt_packet to
call the prealloc variants with the Aes fields embedded in
noise_keypair (added in the preceding commit), eliminating the
per-packet kmalloc/kfree of the Aes struct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants