perf: eliminate per-call malloc in kdf() and per-packet malloc in encrypt/decrypt#21
perf: eliminate per-call malloc in kdf() and per-packet malloc in encrypt/decrypt#21MarkAtwood wants to merge 2 commits intowolfSSL:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR reduces cryptographic fast-path heap allocations by reusing preallocated buffers/contexts during WireGuard Noise handshakes and packet AEAD operations.
Changes:
- Add an embedded
struct Hmacscratch buffer tostruct noise_handshakeand thread a caller-supplied HMAC buffer throughkdf()and related helpers to avoid per-call allocations. - Add preallocated AES-GCM scatterlist encrypt/decrypt helpers and switch packet send/receive paths to reuse per-keypair
Aescontexts. - Add Beads (bd) agent/workflow integration files and update gitignore patterns.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| kernel-src/wolfcrypt_glue.h | Exposes prealloc AES-GCM sg encrypt/decrypt APIs. |
| kernel-src/wolfcrypt_glue.c | Implements AES-GCM sg in-place crypt using a preinitialized Aes context. |
| kernel-src/send.c | Uses preallocated AES context for packet encryption. |
| kernel-src/receive.c | Uses preallocated AES context for packet decryption. |
| kernel-src/noise.h | Adds per-keypair AES contexts + per-handshake HKDF scratch Hmac. |
| kernel-src/noise.c | Adds kdf(hmac_buf, ...) and initializes/frees per-keypair AES contexts. |
| CLAUDE.md | Adds AI-agent workflow instructions (Beads integration). |
| AGENTS.md | Adds agent instructions (Beads + non-interactive shell guidance). |
| .gitignore | Ignores Dolt/Beads artifacts (including *.db). |
| .claude/settings.json | Runs bd prime hook on session events. |
| .beads/* | Adds Beads configuration + git hooks + metadata. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const u8 *ad, const size_t ad_len, | ||
| const u64 nonce, | ||
| Aes *aes) | ||
| { |
There was a problem hiding this comment.
src_len - WC_AES_BLOCK_SIZE can underflow when src_len < WC_AES_BLOCK_SIZE (e.g., malformed/short packets), turning into a huge size_t and causing incorrect behavior. Add an explicit length check in this wrapper (or before calling it) and return false/-EINVAL when the input is shorter than the auth tag.
| { | |
| { | |
| if (src_len < WC_AES_BLOCK_SIZE) | |
| return false; |
| } | ||
|
|
||
| flags = SG_MITER_TO_SG | SG_MITER_ATOMIC; | ||
|
|
There was a problem hiding this comment.
In the WOLFSSL_AESGCM_STREAM implementation there is no sg_nents(src) < 1 guard (the non-stream path has one). If sg_nents(src) == 0, the miter never yields data and the function can proceed to tag handling/finalization without having processed any bytes. Add an early sg_nents(src) < 1 validation (consistent with the non-stream variant) and return -EINVAL.
| if (sg_nents(src) < 1) { | |
| ret = -EINVAL; | |
| WC_DEBUG_PR_CODEPOINT(); | |
| goto out; | |
| } |
| else | ||
| wc_ForceZero(buf, src_len + WC_AES_BLOCK_SIZE); | ||
| } | ||
| free(buf); |
There was a problem hiding this comment.
This allocates with XMALLOC(...) but frees with free(buf). That can be an allocator mismatch depending on how wolfSSL is configured in this kernel module (typically XMALLOC pairs with XFREE). Replace free(buf) with the appropriate wolfSSL deallocator for the allocation site (and keep the existing zeroization behavior).
| free(buf); | |
| XFREE(buf, NULL, DYNAMIC_TYPE_TMP_BUFFER); |
| static int __must_check kdf(struct Hmac *hmac_buf, | ||
| u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data, | ||
| size_t first_len, size_t second_len, size_t third_len, | ||
| size_t data_len, const u8 chaining_key[NOISE_HASH_LEN]) | ||
| { | ||
| u8 output[WC_SHA256_DIGEST_SIZE + 1]; | ||
| u8 secret[WC_SHA256_DIGEST_SIZE]; | ||
| struct Hmac *wc_hmac; | ||
| bool hmac_alloced = false; | ||
| int ret; |
There was a problem hiding this comment.
When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.
| if (hmac_buf) { | ||
| wc_hmac = hmac_buf; | ||
| } else { | ||
| wc_hmac = (struct Hmac *)malloc(sizeof(*wc_hmac)); |
There was a problem hiding this comment.
When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.
| if (hmac_alloced) | ||
| free(wc_hmac); |
There was a problem hiding this comment.
When hmac_buf is provided, the struct Hmac becomes a long-lived heap field (handshake->kdf_hmac). That increases the lifetime of any internal HMAC state/key material that wc_hmac_oneshot_prealloc() may leave in the struct. Consider explicitly clearing the HMAC struct before returning (e.g., memzero_explicit(wc_hmac, sizeof(*wc_hmac))), at least in the hmac_buf != NULL case, to avoid retaining sensitive material across handshake operations.
| @@ -0,0 +1,69 @@ | |||
| # Project Instructions for AI Agents | |||
There was a problem hiding this comment.
The PR title/description focus on embedding Hmac to remove per-kdf() allocations, but this PR also adds AES-GCM prealloc APIs/usage and introduces Beads/agent workflow files (multiple new docs, hooks, settings, and gitignore updates). To keep changes reviewable and traceable, either (a) update the PR title/description to reflect these additional functional changes, or (b) split Beads/agent integration and/or AES-GCM changes into separate PRs.
kdf() is called 1-4 times per handshake operation, and previously malloc'd and free'd an 832-byte Hmac struct on every call. Add a struct Hmac kdf_hmac field to struct noise_handshake (which is embedded in the heap-allocated struct wg_peer). Thread a struct Hmac * parameter through kdf() and its callers (mix_dh, mix_precomputed_dh, mix_psk, message_ephemeral, derive_keys). Callers that hold handshake->lock for write throughout (create_initiation, create_response, begin_session) pass &handshake->kdf_hmac, eliminating all per-call kmalloc for those paths. Callers that perform kdf operations outside the write lock (consume_initiation, consume_response) pass NULL, causing kdf() to fall back to kmalloc, preserving correctness under concurrent access.
Add wc_AesGcm_encrypt_sg_inplace_prealloc and wc_AesGcm_decrypt_sg_inplace_prealloc to wolfcrypt_glue.c/.h. These accept a pre-initialised Aes* instead of allocating one per call. The key schedule set at keypair creation time is reused by passing NULL key to wc_AesGcmEncryptInit/DecryptInit. Update send.c encrypt_packet and receive.c decrypt_packet to call the prealloc variants with the Aes fields embedded in noise_keypair (added in the preceding commit), eliminating the per-packet kmalloc/kfree of the Aes struct.
dcf6392 to
e0aa9b3
Compare
Summary
Two hot-path malloc eliminations in the same PR (they share noise.h changes):
ue7.45 — kdf() per-call Hmac malloc
kdf()implements HKDF (RFC 5869) and is called 1–4× per Noise handshake.Previously every call malloc'd and free'd an 832-byte
Hmacstruct.struct Hmac kdf_hmactostruct noise_handshake(which lives in the heap-allocatedstruct wg_peer).struct Hmac *hmac_bufparameter throughkdf()and all callers (mix_dh,mix_precomputed_dh,mix_psk,message_ephemeral,derive_keys).handshake->lockfor write throughout (create_initiation,create_response,begin_session) pass&handshake->kdf_hmac— no per-call kmalloc.consume_initiation,consume_response) pass NULL, preserving correctness via kmalloc fallback.ue7.12 — per-packet Aes malloc in encrypt/decrypt
encrypt_packetanddecrypt_packetwere called on every data packet and previously malloc'd/free'd anAesstruct (400–600 bytes) per call.Aes aes_encryptandAes aes_decrypttostruct noise_keypair(heap-allocated).bool aes_initedtracks initialisation for safe teardown.wc_AesGcmSetKeyonce atwg_noise_handshake_begin_session. Free withwc_AesFreeat keypair teardown.wc_AesGcm_encrypt_sg_inplace_prealloc/wc_AesGcm_decrypt_sg_inplace_preallocto wolfcrypt_glue (both WOLFSSL_AESGCM_STREAM and non-stream paths). These accept a pre-initialisedAes*; the per-packet init reuses the key schedule by passing NULL key towc_AesGcmEncryptInit/DecryptInit.send.candreceive.cto call the prealloc variants with&keypair->aes_encrypt/&keypair->aes_decrypt.No stack allocation
sizeof(struct Hmac)≈ 832 bytes andsizeof(Aes)≈ 400–600 bytes — both too large for the kernel stack. All new fields live in existing heap-allocated structs.Test plan
WOLFSSL_AESGCM_STREAMWOLFSSL_AESGCM_STREAMwg-fips genkey | wg-fips pubkeysucceeds after loading updated module