From ccd52f81e1a539e182c4f7fcfdff2ad58f3fb071 Mon Sep 17 00:00:00 2001 From: Komh Date: Sun, 26 Apr 2026 02:55:40 +0000 Subject: [PATCH 1/2] [virtualization] Cold VM Migration from VMware Hangs at Conversion-Progress Reporting --- ..._Hangs_at_Conversion_Progress_Reporting.md | 112 ++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md diff --git a/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md b/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md new file mode 100644 index 000000000..cb33f6b12 --- /dev/null +++ b/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md @@ -0,0 +1,112 @@ +--- +kind: + - Troubleshooting +products: + - Alauda Container Platform +ProductsVersion: + - 4.1.0,4.2.x +--- +## Issue + +A cold migration of a VMware VM into ACP Virtualization stalls at the final step. The migration plan sits at "converting…" forever. The `virt-v2v` pod in the target namespace completes its conversion work and opens an HTTP server to report the resulting VM XML, but the migration controller never completes the plan. Representative symptoms: + +- `virt-v2v` pod log ends with `Starting server on:8080`, no crash, no further progress. +- Migration-controller log in the platform-migration namespace reports: + + ```text + msg="Failed to update conversion progress" + error="Get \"http://10.128.1.34:2112/metrics\": dial tcp 10.128.1.34:2112: i/o timeout" + ``` + +## Root Cause + +The VM-from-VMware workflow has two pods in two namespaces: + +- The **migration controller** (Forklift controller) runs in the platform migration namespace (install-specific; on ACP it's the Virtualization migration component's namespace). +- The **virt-v2v + vddk** conversion pod runs in the **target** namespace — where the imported VM will live. + +At the end of conversion, virt-v2v exposes two HTTP endpoints to the controller: + +| Port | Purpose | +|---|---| +| `8080` | serves the produced VM XML (for the controller to read and create the VMI) | +| `2112` | exposes conversion progress metrics | + +If the **target** namespace has a default-deny NetworkPolicy (or any policy that doesn't explicitly admit ingress from the migration-controller namespace), both of those endpoints are unreachable. The controller times out on `:2112/metrics`, the plan never reads the finished XML from `:8080`, and the migration hangs indefinitely. + +Common-case predicate: the target namespace carries a standard "deny-by-default + allow-from-same-namespace + allow-from-ingress + allow-from-monitoring" set of policies — which is a reasonable security baseline but doesn't know about the migration controller. + +## Resolution + +Admit ingress from the migration-controller namespace on the target namespace. Apply only while migrations are active, or keep it as a standing policy if the target namespace routinely receives VM imports. + +1. **Identify the migration-controller namespace.** It carries the virtualization migration controller pod. The exact name depends on the ACP Virtualization install: + + ```bash + # Find the controller pod by label (common selector) + kubectl get pod -A -l app.kubernetes.io/component=forklift-controller -o wide + # Or by deployment name pattern + kubectl get pod -A | grep -E 'forklift.*controller|migration.*controller' + ``` + + Whatever namespace shows up is the one to admit. + +2. **Add the allow-ingress policy in the target namespace.** Pair the namespace selector with a pod selector so only the conversion pod is reachable: + + ```yaml + apiVersion: networking.k8s.io/v1 + kind: NetworkPolicy + metadata: + name: allow-from-migration-controller + namespace: + spec: + podSelector: + matchExpressions: + - key: forklift.konveyor.io/plan + operator: Exists + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: + ports: + - protocol: TCP + port: 8080 + - protocol: TCP + port: 2112 + ``` + + The `podSelector` scopes the rule to the virt-v2v pods only (they carry the `forklift.konveyor.io/plan` label). This keeps the rest of the namespace's default-deny posture intact. + +3. **Re-run the plan.** The controller picks up the next reconcile within ~10s and completes. No need to restart anything. + +4. **For a standing solution**, include the allow-from-migration-controller policy in the namespace template / project template used for VM-import landing zones. Once a team knows their namespace will receive migrations, the policy should be pre-provisioned. + +## Diagnostic Steps + +Confirm the controller is timing out on the metrics endpoint: + +```bash +kubectl -n logs deploy/ --tail=200 \ + | grep -i 'Failed to update conversion progress' +``` + +Confirm virt-v2v reached the "ready-to-serve" state: + +```bash +kubectl -n get pod -l forklift.konveyor.io/plan -o wide +kubectl -n logs -c virt-v2v --tail=20 +``` + +If the log ends with `Starting server on:8080` and no further activity, the conversion is done — the issue is network reachability, not the conversion itself. + +Verify the namespace's inbound policy set: + +```bash +kubectl -n get networkpolicy +kubectl -n describe networkpolicy +``` + +A "deny-all-by-default" rule combined with no explicit allow from the migration-controller namespace matches the failure mode. Apply the NetworkPolicy above; the plan resumes within one reconcile. From 6c3bd4c56798928d158ddf6581488941eea8efca Mon Sep 17 00:00:00 2001 From: Komh Date: Thu, 14 May 2026 17:29:49 +0800 Subject: [PATCH 2/2] [kb] Cold VM Migration Hangs Because NetworkPolicy in the Target Namespace Blocks the virt-v2v Pod --- ...arget_Namespace_Blocks_the_virt_v2v_Pod.md | 96 +++++++++++++++ ..._Hangs_at_Conversion_Progress_Reporting.md | 112 ------------------ 2 files changed, 96 insertions(+), 112 deletions(-) create mode 100644 docs/en/solutions/Cold_VM_Migration_Hangs_Because_NetworkPolicy_in_the_Target_Namespace_Blocks_the_virt_v2v_Pod.md delete mode 100644 docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md diff --git a/docs/en/solutions/Cold_VM_Migration_Hangs_Because_NetworkPolicy_in_the_Target_Namespace_Blocks_the_virt_v2v_Pod.md b/docs/en/solutions/Cold_VM_Migration_Hangs_Because_NetworkPolicy_in_the_Target_Namespace_Blocks_the_virt_v2v_Pod.md new file mode 100644 index 000000000..88241ca8c --- /dev/null +++ b/docs/en/solutions/Cold_VM_Migration_Hangs_Because_NetworkPolicy_in_the_Target_Namespace_Blocks_the_virt_v2v_Pod.md @@ -0,0 +1,96 @@ +--- +kind: + - Troubleshooting +products: + - Alauda Container Platform +ProductsVersion: + - 4.1.0,4.2.x +--- + +# Cold VM Migration Hangs Because NetworkPolicy in the Target Namespace Blocks the virt-v2v Pod +## Issue + +A cold VM migration from VMware into ACP virtualization (KubeVirt), +driven by the Alauda Build of Forklift Operator, hangs at the +conversion-progress step. The `forklift-controller` logs repeat +`Failed to update conversion progress`, and the migration's +`virt-v2v` pod in the target namespace logs that it is serving but +seeing no consumer reach it. + +## Root Cause + +At the conversion step, the per-VM `virt-v2v` pod is launched in the +**target namespace** (the namespace where the imported VM will live) +and exposes the VM's XML on port `8080`. The `forklift-controller` +runs in `konveyor-forklift` and polls that endpoint to read +conversion progress. + +If the target namespace has a `NetworkPolicy` that denies ingress by +default (or restricts ingress to a label set that does not include +the Forklift namespace), the controller's TCP connection to the +`virt-v2v` pod's `:8080` is dropped. The migration pod is healthy in +isolation; the controller cannot observe its progress, so it never +advances past the conversion step. + +## Resolution + +Add a `NetworkPolicy` to the **target namespace** allowing ingress on +port `8080` from the `konveyor-forklift` namespace: + +```yaml +apiVersion: networking.k8s.io/v1 +kind: NetworkPolicy +metadata: + name: allow-forklift-controller + namespace: +spec: + podSelector: {} # apply to virt-v2v pods (and any others) + policyTypes: + - Ingress + ingress: + - from: + - namespaceSelector: + matchLabels: + kubernetes.io/metadata.name: konveyor-forklift + ports: + - protocol: TCP + port: 8080 +``` + +Apply, then re-run the migration plan. The controller's polling reaches +`virt-v2v` and conversion progress advances normally. + +If existing policies in the target namespace use a different selector +shape (named labels, allow-all-ingress with a deny-list, etc.), add a +single allow rule equivalent to the above; the rule is purely +additive. + +## Diagnostic Steps + +Confirm where the controller and the conversion pod live: + +```bash +kubectl -n konveyor-forklift get deploy forklift-controller +kubectl -n get pod -l job-name -o wide # the virt-v2v pod is a Job-spawned pod +``` + +Confirm a NetworkPolicy is in fact blocking the path: + +```bash +kubectl -n get networkpolicy +# Pick the policy and check its ingress.from — if no namespaceSelector +# admits `konveyor-forklift`, that is the block. +``` + +Check the controller logs for `Failed to update conversion progress` +or connection-refused / connection-timeout messages against the target +pod IP: + +```bash +kubectl -n konveyor-forklift logs deploy/forklift-controller | \ + grep -E 'conversion progress|virt-v2v' | tail -20 +``` + +A timeout from the controller's IP to the virt-v2v pod's IP confirms +the policy block. A successful `:8080` response after the policy is in +place confirms the fix. diff --git a/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md b/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md deleted file mode 100644 index cb33f6b12..000000000 --- a/docs/en/solutions/Cold_VM_Migration_from_VMware_Hangs_at_Conversion_Progress_Reporting.md +++ /dev/null @@ -1,112 +0,0 @@ ---- -kind: - - Troubleshooting -products: - - Alauda Container Platform -ProductsVersion: - - 4.1.0,4.2.x ---- -## Issue - -A cold migration of a VMware VM into ACP Virtualization stalls at the final step. The migration plan sits at "converting…" forever. The `virt-v2v` pod in the target namespace completes its conversion work and opens an HTTP server to report the resulting VM XML, but the migration controller never completes the plan. Representative symptoms: - -- `virt-v2v` pod log ends with `Starting server on:8080`, no crash, no further progress. -- Migration-controller log in the platform-migration namespace reports: - - ```text - msg="Failed to update conversion progress" - error="Get \"http://10.128.1.34:2112/metrics\": dial tcp 10.128.1.34:2112: i/o timeout" - ``` - -## Root Cause - -The VM-from-VMware workflow has two pods in two namespaces: - -- The **migration controller** (Forklift controller) runs in the platform migration namespace (install-specific; on ACP it's the Virtualization migration component's namespace). -- The **virt-v2v + vddk** conversion pod runs in the **target** namespace — where the imported VM will live. - -At the end of conversion, virt-v2v exposes two HTTP endpoints to the controller: - -| Port | Purpose | -|---|---| -| `8080` | serves the produced VM XML (for the controller to read and create the VMI) | -| `2112` | exposes conversion progress metrics | - -If the **target** namespace has a default-deny NetworkPolicy (or any policy that doesn't explicitly admit ingress from the migration-controller namespace), both of those endpoints are unreachable. The controller times out on `:2112/metrics`, the plan never reads the finished XML from `:8080`, and the migration hangs indefinitely. - -Common-case predicate: the target namespace carries a standard "deny-by-default + allow-from-same-namespace + allow-from-ingress + allow-from-monitoring" set of policies — which is a reasonable security baseline but doesn't know about the migration controller. - -## Resolution - -Admit ingress from the migration-controller namespace on the target namespace. Apply only while migrations are active, or keep it as a standing policy if the target namespace routinely receives VM imports. - -1. **Identify the migration-controller namespace.** It carries the virtualization migration controller pod. The exact name depends on the ACP Virtualization install: - - ```bash - # Find the controller pod by label (common selector) - kubectl get pod -A -l app.kubernetes.io/component=forklift-controller -o wide - # Or by deployment name pattern - kubectl get pod -A | grep -E 'forklift.*controller|migration.*controller' - ``` - - Whatever namespace shows up is the one to admit. - -2. **Add the allow-ingress policy in the target namespace.** Pair the namespace selector with a pod selector so only the conversion pod is reachable: - - ```yaml - apiVersion: networking.k8s.io/v1 - kind: NetworkPolicy - metadata: - name: allow-from-migration-controller - namespace: - spec: - podSelector: - matchExpressions: - - key: forklift.konveyor.io/plan - operator: Exists - policyTypes: - - Ingress - ingress: - - from: - - namespaceSelector: - matchLabels: - kubernetes.io/metadata.name: - ports: - - protocol: TCP - port: 8080 - - protocol: TCP - port: 2112 - ``` - - The `podSelector` scopes the rule to the virt-v2v pods only (they carry the `forklift.konveyor.io/plan` label). This keeps the rest of the namespace's default-deny posture intact. - -3. **Re-run the plan.** The controller picks up the next reconcile within ~10s and completes. No need to restart anything. - -4. **For a standing solution**, include the allow-from-migration-controller policy in the namespace template / project template used for VM-import landing zones. Once a team knows their namespace will receive migrations, the policy should be pre-provisioned. - -## Diagnostic Steps - -Confirm the controller is timing out on the metrics endpoint: - -```bash -kubectl -n logs deploy/ --tail=200 \ - | grep -i 'Failed to update conversion progress' -``` - -Confirm virt-v2v reached the "ready-to-serve" state: - -```bash -kubectl -n get pod -l forklift.konveyor.io/plan -o wide -kubectl -n logs -c virt-v2v --tail=20 -``` - -If the log ends with `Starting server on:8080` and no further activity, the conversion is done — the issue is network reachability, not the conversion itself. - -Verify the namespace's inbound policy set: - -```bash -kubectl -n get networkpolicy -kubectl -n describe networkpolicy -``` - -A "deny-all-by-default" rule combined with no explicit allow from the migration-controller namespace matches the failure mode. Apply the NetworkPolicy above; the plan resumes within one reconcile.