feature: TLS passthrough will select known app_id from TXT record w multiple entries#653
feature: TLS passthrough will select known app_id from TXT record w multiple entries#653wwwehr wants to merge 1 commit intoDstack-TEE:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the TLS passthrough SNI router to better handle DNS TXT app-address lookups when multiple entries are present, preferring an app address whose app_id is already known locally (and falling back to the first parseable entry otherwise).
Changes:
- Adds
select_app_address(...)to choose an app address from TXT data, preferring locally-knownapp_ids with a fallback behavior. - Threads
&Proxyintoresolve_app_address(...)so selection can consult the in-memoryappsmap. - Adds unit tests for selection behavior and updates the existing DNS integration test to construct a
Proxy.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for lookup in [lookup, lookup_legacy] { | ||
| let Ok(lookup) = lookup else { | ||
| continue; | ||
| }; | ||
| let Some(txt_record) = lookup.iter().next() else { | ||
| continue; | ||
| }; | ||
| let Some(data) = txt_record.txt_data().first() else { | ||
| continue; | ||
| }; | ||
| return AppAddress::parse(data) | ||
| .with_context(|| format!("failed to parse app address for {sni}")); | ||
| let locked = state.lock(); | ||
| if let Ok(addr) = select_app_address(txt_record.txt_data(), &locked.state.apps) { | ||
| return Ok(addr); |
There was a problem hiding this comment.
resolve_app_address still only inspects the first TXT RR (lookup.iter().next()), so multi-record TXT sets (multiple RRs for the same name, which is common when publishing multiple values) will remain non-deterministic and may ignore locally-known app_ids. To match the stated multi-entry behavior, iterate across all TXT records returned by the lookup and consider all txt_data() strings when selecting the address (prefer local matches across the full set, otherwise fall back to the first parseable).
| } else if let Ok(lookup) = resolver.txt_lookup(txt_domain).await { | ||
| if let Some(txt_record) = lookup.iter().next() { | ||
| if let Some(data) = txt_record.txt_data().first() { | ||
| return AppAddress::parse(data) | ||
| .with_context(|| format!("failed to parse app address for {sni}")); | ||
| let locked = state.lock(); | ||
| if let Ok(addr) = select_app_address(txt_record.txt_data(), &locked.state.apps) { | ||
| return Ok(addr); | ||
| } | ||
| } |
There was a problem hiding this comment.
In the non-wildcard branches the error from select_app_address(...) is discarded (if let Ok(addr) = ...), so cases like “TXT record exists but no parseable/usable entries” end up as a generic resolve failure (or a misleading wildcard-lookup error) with no indication that the primary TXT record was present but invalid. Consider retaining the selection error and returning it (with domain context) when no later branch succeeds, so operators can diagnose malformed records.
| /// resolve app address by sni | ||
| async fn resolve_app_address(prefix: &str, sni: &str, compat: bool) -> Result<AppAddress> { | ||
| async fn resolve_app_address(prefix: &str, sni: &str, compat: bool, state: &Proxy) -> Result<AppAddress> { |
There was a problem hiding this comment.
resolve_app_address now depends on &Proxy solely to read locked.state.apps, which tightens coupling and forces tests to construct a full Proxy (spawning background tasks, initializing KvStore/certbot/acceptors) just to exercise DNS parsing/selection. A simpler, more testable shape would be to pass in a read-only view of known app_ids (e.g., a &BTreeMap<...> or &BTreeSet<String> captured/cloned by the caller after locking) and keep DNS resolution independent of the full proxy runtime.
|
Thanks for putting this together. Before landing, two concerns about the DNS layer this operates on — the fix targets a layer that won't see the scenario it's meant to solve. 1. RRset vs. character-strings within one record DNS has two independent "multi-value" axes at a TXT name:
In if let Some(txt_record) = lookup.iter().next() { // picks ONE RR from the RRset
if let Ok(addr) = select_app_address(txt_record.txt_data(), ...) {
// ^^^^^^^^^^^^^^^^^^^^^^
// iterates strings within that one RRSo If the intent is "iterate across the RRset", the shape is more like: for txt_record in lookup.iter() {
if let Some(data) = txt_record.txt_data().first() {
// apply known-app selection against `data`
}
}2. Upstream writer makes this scenario non-stationary regardless Even with the gateway-side read fixed, the reference ingress writer ( As you note, the co-tenant scenario is reachable today only by manually pruning CAA and manually forcing two projects onto the same name — not a supported setup. That's fine as a niche, but it means the premise for "TXT records will contain multiple app_id candidates the gateway should choose from" depends on out-of-band manual DNS edits that also have to survive every ingress restart. Suggestion If the goal is to keep the door open for externally-coordinated multi-writer setups:
If the goal is just to make TXT selection deterministic on the common single-record path, the smaller change is fine — but then the PR description should drop the "two shared ingress instances" framing, since the upstream writer prevents that RRset shape from appearing on the wire. Happy to iterate on this. |
Problem
As a phala cloud user/operator, I want to have two dstack-ingress instances share a
given
_dstack-app-address.xxxso that I can properly route a shared hostnameusing a load balancer. Currently, the router only accepts first dns record it
discovers which, also, isn't deterministic.
Fix
Extract
select_app_addressto iterate all values in the TXT record and returnthe first whose
app_idis known to the local gateway'sappsmap. Falls backto the first parseable entry when none are locally known (preserves behavior
for single-instance setups).
The compat path (
_tapp-addresslegacy) is unchanged.Tests
test_resolve_app_address— original live DNS integration test with mock statetest_select_app_address_prefers_local— multi-record, picks local app_idtest_select_app_address_fallback_when_none_local— multi-record, fallback to first