net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347
net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347Anatw wants to merge 2 commits into
Conversation
📝 WalkthroughWalkthroughThis PR adds VMI affinity polling in the VM helper, then updates L2 bridge and localnet migration stuntime tests to wait for affinity, verify post-migration affinity state, and measure stuntime afterward. ChangesAffinity-aware migration stuntime checks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsLinked repositories: Your configuration references 1 linked repositories, but your current plan allows 0. Analyzed ``, skipped Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Branch Management
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
Security Checks
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
|
/wip |
|
/build-and-push-container |
|
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5347 published |
d503c68 to
31df3e8
Compare
c619644 to
56a1e16
Compare
|
Change: Remove trailing underscore from constants. |
| expected_affinity = ( | ||
| asdict(obj=template_affinity, dict_factory=self._filter_out_none_values) if template_affinity else None | ||
| ) | ||
| for sample in TimeoutSampler( |
There was a problem hiding this comment.
Consider using a helper with @retry
| has_stuntime_rules = any( | ||
| expr.get("key", "").startswith("stuntime.") | ||
| for rule in pod_affinity[stale_type].get("requiredDuringSchedulingIgnoredDuringExecution", []) | ||
| for expr in rule.get("labelSelector", {}).get("matchExpressions", []) | ||
| ) |
There was a problem hiding this comment.
The rules are a bit unreadable
Consider splitting it to vars
There was a problem hiding this comment.
Ack, let me know if this is better or I should split further.
|
/retest all Auto-triggered: Files in this PR were modified by merged PR #5285. Overlapping fileslibs/vm/vm.py |
virt-controller and VMIM controller reconcile independently — if migration is triggered before the VM controller propagates the template affinity to the VMI, the target pod is created with stale scheduling rules. Wait for the VMI to reflect the updated affinity before migrating, and assert post-migration node placement matches the expected affinity. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Anat Wax <awax@redhat.com>
56a1e16 to
f5ce311
Compare
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The cleanup for the `active_ping` session in
`measure_stuntime()` is only happening after the Jira-gated
`assert_affinity_after_migration()` path, so a failure there can leave the ping
running. Move the ping stop/report logic out of the normal flow and into a
`finally` block or fixture teardown around the migration/assertion sequence in
the stuntime migration tests so it always runs, even when
`assert_affinity_after_migration()` fails. Apply the same pattern in the related
`test_migration_stuntime` scenarios that use `wait_for_vmi_affinity()`,
`migrate_vm_and_verify()`, and `measure_stuntime()`.
In `@tests/network/libs/stuntime.py`:
- Around line 158-160: The membership check in the pod affinity assertion can
crash when vm.vmi.virt_launcher_pod.instance.to_dict()["spec"].get("affinity")
returns None, so update the diagnostic logic in this helper to handle the
missing-affinity case before checking expected_type in pod_affinity. Use a
pytest assertion in stuntime.py that explicitly verifies affinity is present and
then checks for the expected and stale types, with a clear failure message that
explains which affinity state was missing or unexpected.
In `@tests/network/localnet/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The migration stuntime tests are asserting affinity before
the ping session is guaranteed to be stopped, so a failure can leave active_ping
running and leak state into later migrations. Update the affected test blocks
around migrate_vm_and_verify(), assert_affinity_after_migration(), and
measure_stuntime() so ping cleanup is unconditional, using a finally block or
fixture teardown to stop/report the ping even if the affinity assertion fails.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c139bd34-f5f3-4fab-aba9-96a051950522
📒 Files selected for processing (4)
libs/vm/vm.pytests/network/l2_bridge/migration_stuntime/test_migration_stuntime.pytests/network/libs/stuntime.pytests/network/localnet/migration_stuntime/test_migration_stuntime.py
azhivovk
left a comment
There was a problem hiding this comment.
Thanks
I have a few questions
|
/verified Tested on bm03-tlv2 uv run pytest -s --skip-deprecated-api-test --pdbcls=IPython.terminal.debugger:TerminalPdb --tc-format=python --cluster-sanity-skip-storage-check -m "tier3" tests/network/localnet/migration_stuntime/test_migration_stuntime.py tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py |
Test execution plan request detailsCRITICAL: You MUST post an inline review comment on the first changed line of the first file. As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan. Analysis Requirements:
Output rules: Your deliverable: Test Execution Plan
Use these formats:
Real test commands (MANDATORY when changes affect session/runtime code): When the affected code runs at session/collection time (conftest fixtures, pytest plugins,
Example output for a session-startup fix: Guidelines:
Hardware-Related Checks (SR-IOV, GPU, DPDK): When PR modifies fixtures for hardware-specific resources:
CRITICAL WORKFLOW COMPLETION RULES: When responding to this test execution plan request, you MUST follow these rules EXACTLY:
ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):
Remember: The pull request review is visible to users. Additional comments are redundant noise. CRITICAL — Verification rules for smoke and gating decisions: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== PR head SHA =="
head_sha=$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')
echo "$head_sha"
echo
echo "== Changed files in order =="
mapfile -t pr_files < <(gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path')
printf '%s\n' "${pr_files[@]}"
first_file="${pr_files[0]}"
echo
echo "== First changed file =="
echo "$first_file"
echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch'
echo
echo "== Current diff for changed files =="
for f in "${pr_files[@]}"; do
echo "--- $f ---"
git diff --unified=3 "origin/$(git rev-parse --abbrev-ref HEAD)~1" -- "$f" 2>/dev/null || git diff --unified=3 HEAD~1 -- "$f" || true
done
echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
ast-grep outline "$f" --view expanded | sed -n '1,240p'
fi
done
echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
fi
done
echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true
echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
fi
done
echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
echo "--- $f tests ---"
rg -n '^\s*def test_' "$f" || true
fi
done
echo
echo "== Smoke-marked files =="
rg -l '`@pytest`\.mark\.smoke' tests || true
echo
echo "== Gating-marked files =="
rg -l '`@pytest`\.mark\.gating' tests || true
echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true
echo
echo "== First changed line in first changed file =="
first_line=$(
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' \
| python - <<'PY'
import sys,re
patch=sys.stdin.read().splitlines()
cur=None
for line in patch:
m=re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur=int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
)
echo "$first_line"
body="$(cat <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
)"
echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
echo "-- inline attempt $attempt --"
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$head_sha" \
-f path="$first_file" \
-F line="$first_line" \
-f side="RIGHT"
rc=$?
echo "inline_rc=$rc"
if [ $rc -eq 0 ]; then
inline_ok=1
break
fi
sleep 2
done
if [ $inline_ok -ne 1 ]; then
echo
echo "== Inline failed; attempt fallback regular PR comment =="
gh api \
--method POST \
"repos/$repo/issues/$pr/comments" \
-f body="$body"
echo "fallback_rc=$?"
fi
set -eLength of output: 7215 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
echo "== PR head SHA =="
head_sha="$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')"
echo "$head_sha"
echo
echo "== Changed files in order =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path' > "$tmpdir/pr_files.txt"
cat "$tmpdir/pr_files.txt"
first_file="$(sed -n '1p' "$tmpdir/pr_files.txt")"
echo
echo "== First changed file =="
echo "$first_file"
echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' > "$tmpdir/first_patch.txt"
cat "$tmpdir/first_patch.txt"
echo
echo "== Current diff for changed files =="
while IFS= read -r f; do
echo "--- $f ---"
git diff --unified=3 HEAD~1 -- "$f" || true
done < "$tmpdir/pr_files.txt"
echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
ast-grep outline "$f" --view expanded | sed -n '1,240p'
fi
done
echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
fi
done
echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true
echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f ---"
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
fi
done
echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
if [ -f "$f" ]; then
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
echo "--- $f tests ---"
rg -n '^\s*def test_' "$f" || true
fi
done
echo
echo "== Smoke-marked files =="
rg -l '`@pytest`\.mark\.smoke' tests || true
echo
echo "== Gating-marked files =="
rg -l '`@pytest`\.mark\.gating' tests || true
echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true
echo
echo "== First changed line in first changed file =="
python - "$tmpdir/first_patch.txt" <<'PY' > "$tmpdir/first_line.txt"
import re, sys
patch = open(sys.argv[1], encoding="utf-8").read().splitlines()
cur = None
for line in patch:
m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur = int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
print(cur)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur += 1
PY
first_line="$(cat "$tmpdir/first_line.txt")"
echo "$first_line"
cat > "$tmpdir/body.md" <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
body="$(cat "$tmpdir/body.md")"
echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
echo "-- inline attempt $attempt --"
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$head_sha" \
-f path="$first_file" \
-F line="$first_line" \
-f side="RIGHT"
rc=$?
echo "inline_rc=$rc"
if [ "$rc" -eq 0 ]; then
inline_ok=1
break
fi
sleep 2
done
if [ "$inline_ok" -ne 1 ]; then
echo
echo "== Inline failed; attempt fallback regular PR comment =="
gh api \
--method POST \
"repos/$repo/issues/$pr/comments" \
-f body="$body"
echo "fallback_rc=$?"
fi
set -eLength of output: 46117
CRITICAL: I retried the required workflow on the latest PR head ( Why this matters: your workflow requires a non-blocking inline PR comment on Posting attempts
Here is the exact comment body to post manually on **Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`. |
azhivovk
left a comment
There was a problem hiding this comment.
Hopefully it's okay to apply W/A for the stuntime measruements
| self._assert_vmi_affinity(expected_affinity=expected_affinity) | ||
|
|
||
| @retry(wait_timeout=10, sleep=1) | ||
| def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool: |
There was a problem hiding this comment.
NIT:
Since we use assert for tests, consider renaming to expected_vmi_affinity
There was a problem hiding this comment.
+1 from me, let not mix up concerns. assert is about test itself, if you have to check some state in precondition/step, please use any other way/naming, not assert.
| @retry(wait_timeout=10, sleep=1) | ||
| def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool: | ||
| vmi_affinity = self.vmi.instance.to_dict()["spec"].get("affinity") | ||
| assert vmi_affinity == expected_affinity, ( |
There was a problem hiding this comment.
see thread. assert is a test concern, not library. Raise a clear exception instead.
| return stuntime | ||
|
|
||
|
|
||
| def assert_affinity_after_migration(vm: BaseVirtualMachine, expected_type: str) -> None: |
There was a problem hiding this comment.
if this is not a test itself, rename the helper and don't use assert. I bet we don't test affinity after migration, hence it does not deserve assertion. Otherwise, it is a mess.
| localnet_stuntime_client_vm.set_template_affinity(affinity=new_pod_anti_affinity(label=SERVER_VM_LABEL)) | ||
| localnet_stuntime_client_vm.wait_for_vmi_affinity() | ||
| migrate_vm_and_verify(vm=localnet_stuntime_client_vm, client=admin_client) | ||
| if is_jira_open(jira_id="CNV-90576"): |
There was a problem hiding this comment.
btw don't we need to check affinity even without this bug? I mean, perhaps the direction is right regardless the bug -> checking jira is not needed?
| patches = {self: {"spec": {"template": {"spec": {"affinity": template_affinity}}}}} | ||
| ResourceEditor(patches=patches).update() | ||
|
|
||
| def wait_for_vmi_affinity(self) -> None: |
There was a problem hiding this comment.
per my understanding, the API contract of "set affinity" should mean "affinity is applied when I return." Hence the wait operation should be a part if set_template_affinity. Otherwise, every caller must remember the wait, and future callers may forget. Consider integrating the reconciliation wait into set_template_affinity itself.
What this PR does / why we need it:
When a VM template affinity is updated and migration is triggered immediately after, virt-controller and the VMIM controller race: the VMIM controller may read the VMI before the VM controller has reconciled the template change, creating the migration target pod with stale affinity rules. The VM ends up on the wrong node.
This PR adds
wait_for_vmi_affinity()toBaseVirtualMachine, which polls the VMI until its affinity matches the VM template — ensuring the VM controller has reconciled before migration is triggered. All stuntime tests (L2 bridge + localnet) now call it afterset_template_affinity.Additionally, temporary post-migration affinity assertions are added (gated behind
is_jira_open("CNV-90576")) to detect the race if it still occurs despite the wait. These assertions will be removed once CNV-90576 is resolved.The first commit removes a redundant
set_template_affinitycall fromtest_server_migrates_between_non_client_nodes— the preceding test already sets the same affinity and@pytest.mark.incrementalpreserves it.Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:
NONE
Summary by CodeRabbit
Summary by CodeRabbit