net, tests, stuntime: Wait for VMI affinity reconciliation before migration #5347

Open
Anatw wants to merge 2 commits into RedHatQE:main from Anatw:stuntime_fix_affinity_issue

Conversation

@Anatw

@Anatw Anatw commented Jun 23, 2026

Contributor
What this PR does / why we need it:

When a VM template affinity is updated and migration is triggered immediately after, virt-controller and the VMIM controller race: the VMIM controller may read the VMI before the VM controller has reconciled the template change, creating the migration target pod with stale affinity rules. The VM ends up on the wrong node.

This PR adds wait_for_vmi_affinity() to BaseVirtualMachine, which polls the VMI until its affinity matches the VM template — ensuring the VM controller has reconciled before migration is triggered. All stuntime tests (L2 bridge + localnet) now call it after set_template_affinity.
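
A minimal sketch of the wait described above, assuming nothing about the real `wait_for_vmi_affinity()` implementation. The names `wait_until_matches` and `FakeVmi`, and the timeout values, are hypothetical and exist only for illustration. The idea is to poll a getter until the observed affinity equals the expected one, and to fail loudly if that never happens within a deadline.

```python
import time
from typing import Any, Callable


def wait_until_matches(
    read_actual: Callable[[], Any],
    expected: Any,
    timeout: float = 5.0,
    interval: float = 0.01,
) -> None:
    """Poll read_actual() until it returns `expected`; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        actual = read_actual()
        if actual == expected:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"expected {expected!r}, last observed {actual!r}")
        time.sleep(interval)


class FakeVmi:
    """Hypothetical stand-in for a VMI whose affinity is reconciled after a few reads."""

    def __init__(self, stale: Any, fresh: Any, reads_until_fresh: int) -> None:
        self._stale = stale
        self._fresh = fresh
        self._remaining = reads_until_fresh
        self.reads = 0

    def affinity(self) -> Any:
        # Return the stale value until the simulated reconciliation has happened.
        self.reads += 1
        if self._remaining > 0:
            self._remaining -= 1
            return self._stale
        return self._fresh
```

In a test, the migration would be triggered only after `wait_until_matches(...)` returns, so the migration controller cannot observe the stale value.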

Additionally, temporary post-migration affinity assertions are added (gated behind is_jira_open("CNV-90576")) to detect the race if it still occurs despite the wait. These assertions will be removed once CNV-90576 is resolved.

The first commit removes a redundant set_template_affinity call from test_server_migrates_between_non_client_nodes — the preceding test already sets the same affinity and @pytest.mark.incremental preserves it.

Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:

NONE

Summary by CodeRabbit

  • Tests
    • Strengthened VM migration stuntime coverage by adding explicit VMI affinity/anti-affinity verification before performance measurement.
    • Added helper assertions to confirm migrated VMs land with the expected pod affinity type and to ensure stale/incorrect affinity rules aren’t present.
    • Applied the improved validation flow across both Linux bridge and OVN localnet migration scenarios, with scenario-specific adjustments for anti-affinity setups.

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor

📝 Walkthrough

This PR adds VMI affinity polling in the VM helper, then updates L2 bridge and localnet migration stuntime tests to wait for affinity, verify post-migration affinity state, and measure stuntime afterward.

Changes

Affinity-aware migration stuntime checks

| Layer / File(s) | Summary |
| --- | --- |
| VMI affinity wait helper (`libs/vm/vm.py`) | `BaseVirtualMachine.wait_for_vmi_affinity()` now retries an exact comparison between VMI `spec.affinity` and the VM template affinity. |
| Stuntime affinity helpers (`tests/network/libs/stuntime.py`) | Adds pod affinity constants and a helper that checks migrated virt-launcher pod affinity and stale `stuntime.` match expressions. |
| L2 bridge migration assertions (`tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py`) | Updates migration scenarios to wait for VMI affinity, migrate, conditionally assert post-migration affinity, and then measure stuntime; one scenario removes an anti-affinity setup line. |
| Localnet migration assertions (`tests/network/localnet/migration_stuntime/test_migration_stuntime.py`) | Applies the same wait-then-migrate-then-affinity-check sequence across localnet scenarios before stuntime checks, with expanded imports for the new helper and constants. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • EdDev
  • orelmisan
  • frenzyfriday
  • azhivovk
  • nirdothan
  • dshchedr
  • RoniKishner
  • rnetser
  • vsibirsk
  • yossisegev
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is under 120 characters and clearly summarizes the main change: waiting for VMI affinity before migration. |
| Description check | ✅ Passed | The description matches the required template, explains the change, and sets jira-ticket to NONE; only non-critical sections are left blank. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stp Link Required | ✅ Passed | No new test files or test functions were added; only existing test bodies changed, and both modules already have STP links in their top docstrings. |
Warning

Review ran into problems

🔥 Problems

Linked repositories: Your configuration references 1 linked repositories, but your current plan allows 0. Analyzed ``, skipped RedHatQE/openshift-virtualization-tests-design-docs.


Comment @coderabbitai help to get the list of available commands.

@openshift-virtualization-qe-bot-6


Report bugs in Issues

Welcome! 🎉

This pull request will be automatically processed with the following features:

🔄 Automatic Actions

  • Reviewer Assignment: Reviewers are automatically assigned based on the OWNERS file in the repository root
  • Size Labeling: PR size labels (XS, S, M, L, XL, XXL) are automatically applied based on changes
  • Issue Creation: A tracking issue is created for this PR and will be closed when the PR is merged or closed
  • Branch Labeling: Branch-specific labels are applied to track the target branch
  • Auto-verification: Auto-verified users have their PRs automatically marked as verified
  • Labels: Enabled categories: branch, can-be-merged, cherry-pick, has-conflicts, hold, needs-rebase, size, verified, wip

📋 Available Commands

PR Status Management

  • /wip - Mark PR as work in progress (adds WIP: prefix to title)
  • /wip cancel - Remove work in progress status
  • /hold - Block PR merging (approvers only)
  • /hold cancel - Unblock PR merging
  • /verified - Mark PR as verified
  • /verified cancel - Remove verification status
  • /reprocess - Trigger complete PR workflow reprocessing (useful if webhook failed or configuration changed)
  • /regenerate-welcome - Regenerate this welcome message
  • /security-override - Set security check runs to pass (maintainers only)
  • /security-override cancel - Re-run security checks

Review & Approval

  • /lgtm - Approve changes (looks good to me)
  • /approve - Approve PR (approvers only)
  • /assign-reviewers - Assign reviewers based on OWNERS file
  • /assign-reviewer @username - Assign specific reviewer
  • /check-can-merge - Check if PR meets merge requirements

Testing & Validation

  • /retest tox - Run Python test suite with tox
  • /retest build-container - Rebuild and test container image
  • /retest verify-bugs-are-open - verify-bugs-are-open
  • /retest all - Run all available tests

Container Operations

  • /build-and-push-container - Build and push container image (tagged with PR number)
    • Supports additional build arguments: /build-and-push-container --build-arg KEY=value

Cherry-pick Operations

  • /cherry-pick <branch> - Schedule cherry-pick to target branch when PR is merged
    • Multiple branches: /cherry-pick branch1 branch2 branch3
  • /cherry-pick-retry <branch> - Retry a failed cherry-pick (merged PRs only)

Branch Management

  • /rebase - Rebase this PR branch onto its base branch

Label Management

  • /<label-name> - Add a label to the PR
  • /<label-name> cancel - Remove a label from the PR

✅ Merge Requirements

This PR will be automatically approved when the following conditions are met:

  1. Approval: /approve from at least one approver
  2. LGTM Count: Minimum 2 /lgtm from reviewers
  3. Status Checks: All required status checks must pass
  4. No Blockers: No wip, hold, has-conflicts labels and PR must be mergeable (no conflicts)
  5. Verified: PR must be marked as verified

📊 Review Process

Approvers and Reviewers

Approvers:

  • EdDev
  • dshchedr
  • myakove
  • rnetser
  • vsibirsk

Reviewers:

  • Anatw
  • EdDev
  • RoniKishner
  • azhivovk
  • dshchedr
  • frenzyfriday
  • nirdothan
  • orelmisan
  • rnetser
  • servolkov
  • vsibirsk
  • yossisegev
Available Labels
  • hold
  • verified
  • wip
  • lgtm
  • approve
AI Features
  • Cherry-Pick Conflict Resolution: Enabled (claude/claude-opus-4-6-1m)
Security Checks
  • Suspicious Path Detection: Monitors paths: .claude/, .vscode/, .cursor/, .devcontainer/, .pi/, .github/workflows/, .github/actions/
  • Committer Identity Check: Verifies last committer matches PR author
  • Mandatory: Security checks block merge (use /security-override to bypass — maintainers only)

💡 Tips

  • WIP Status: Use /wip when your PR is not ready for review
  • Verification: The verified label is removed on new commits unless the push is detected as a clean rebase
  • Cherry-picking: Cherry-pick labels are processed when the PR is merged
  • Container Builds: Container images are automatically tagged with the PR number
  • Permission Levels: Some commands require approver permissions
  • Auto-verified Users: Certain users have automatic verification and merge privileges

For more information, please refer to the project documentation or contact the maintainers.

@Anatw

Anatw commented Jun 23, 2026

Contributor Author

/wip

@openshift-virtualization-qe-bot-4 openshift-virtualization-qe-bot-4 changed the title net, tests, stuntime: Wait for VMI affinity reconciliation before migration WIP: net, tests, stuntime: Wait for VMI affinity reconciliation before migration Jun 23, 2026
@Anatw

Anatw commented Jun 23, 2026

Contributor Author

/build-and-push-container

@openshift-virtualization-qe-bot-4


New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5347 published

@Anatw Anatw force-pushed the stuntime_fix_affinity_issue branch from c619644 to 56a1e16 Compare June 29, 2026 07:23
@Anatw

Anatw commented Jun 29, 2026

Contributor Author

Change: Remove trailing underscore from constants.

@yossisegev yossisegev left a comment

Contributor

Thank you Anat

Comment thread libs/vm/vm.py
Comment thread libs/vm/vm.py Outdated
expected_affinity = (
asdict(obj=template_affinity, dict_factory=self._filter_out_none_values) if template_affinity else None
)
for sample in TimeoutSampler(

Contributor

Consider using a helper with @retry

Contributor Author

Ack.
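
The suggestion above could look roughly like the following. This is a sketch under stated assumptions: the `retry` decorator here is a small stdlib-only stand-in written for illustration, not the decorator used in the repository, and `Affinity`, `drop_none`, and `affinity_matches` are hypothetical names. It also shows how `dataclasses.asdict` with a `dict_factory` can drop `None` fields before the comparison, which is what the quoted snippet appears to do.

```python
import functools
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


def retry(timeout: float, interval: float) -> Callable:
    """Re-run the wrapped function until it returns a truthy value or time runs out."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            deadline = time.monotonic() + timeout
            while True:
                result = func(*args, **kwargs)
                if result:
                    return result
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"{func.__name__} did not succeed in {timeout}s")
                time.sleep(interval)

        return wrapper

    return decorator


def drop_none(items: list[tuple[str, Any]]) -> dict[str, Any]:
    """dict_factory for asdict(): keep only fields whose value is not None."""
    return {key: value for key, value in items if value is not None}


@dataclass
class Affinity:  # hypothetical, far smaller than a real affinity spec
    pod_affinity: Optional[dict] = None
    pod_anti_affinity: Optional[dict] = None


@retry(timeout=1.0, interval=0.001)
def affinity_matches(read_vmi_affinity: Callable[[], Any], template: Optional[Affinity]) -> bool:
    """Compare the observed VMI affinity with the template, ignoring unset fields."""
    expected = asdict(template, dict_factory=drop_none) if template else None
    return read_vmi_affinity() == expected
```

Keeping the comparison in a small decorated helper leaves the calling method free of loop bookkeeping, at the cost of hiding the timeout in the decorator arguments.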

Comment thread tests/network/libs/stuntime.py Outdated
Comment on lines +164 to +168
has_stuntime_rules = any(
expr.get("key", "").startswith("stuntime.")
for rule in pod_affinity[stale_type].get("requiredDuringSchedulingIgnoredDuringExecution", [])
for expr in rule.get("labelSelector", {}).get("matchExpressions", [])
)

Contributor

The rules are hard to read.
Consider splitting them into variables.

Contributor Author

Ack, let me know if this is better or I should split further.
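
One possible way to split the quoted comprehension into named steps is sketched below. The function name `has_stuntime_rules` and the two constants are hypothetical, and the sketch deliberately differs from the quoted code in one respect: it tolerates a missing affinity or a missing affinity type instead of indexing into the dictionary directly.

```python
from typing import Any, Optional

REQUIRED_RULES_KEY = "requiredDuringSchedulingIgnoredDuringExecution"
STUNTIME_LABEL_PREFIX = "stuntime."


def has_stuntime_rules(pod_affinity: Optional[dict[str, Any]], affinity_type: str) -> bool:
    """Return True if any required rule of `affinity_type` selects on a stuntime.* label key."""
    if not pod_affinity:
        return False
    typed_affinity = pod_affinity.get(affinity_type) or {}
    required_rules = typed_affinity.get(REQUIRED_RULES_KEY) or []
    for rule in required_rules:
        label_selector = rule.get("labelSelector") or {}
        match_expressions = label_selector.get("matchExpressions") or []
        for expression in match_expressions:
            if expression.get("key", "").startswith(STUNTIME_LABEL_PREFIX):
                return True
    return False
```

Each intermediate name corresponds to one level of the pod spec, which makes a failing lookup easier to locate than in a single nested generator expression.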

@openshift-virtualization-qe-bot-3

Contributor

/retest all

Auto-triggered: Files in this PR were modified by merged PR #5285.

Overlapping files

libs/vm/vm.py

virt-controller and VMIM controller reconcile independently — if
migration is triggered before the VM controller propagates the template
affinity to the VMI, the target pod is created with stale scheduling
rules.

Wait for the VMI to reflect the updated affinity before migrating, and
assert post-migration node placement matches the expected affinity.

Assisted-by: Claude <noreply@anthropic.com>
Signed-off-by: Anat Wax <awax@redhat.com>
@Anatw

Anatw commented Jun 30, 2026

Contributor Author

Changes:

  • set_template_affinity - Replace TimeoutSampler with Retry.
  • assert_affinity_after_migration - split into variables.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The cleanup for the `active_ping` session in
`measure_stuntime()` is only happening after the Jira-gated
`assert_affinity_after_migration()` path, so a failure there can leave the ping
running. Move the ping stop/report logic out of the normal flow and into a
`finally` block or fixture teardown around the migration/assertion sequence in
the stuntime migration tests so it always runs, even when
`assert_affinity_after_migration()` fails. Apply the same pattern in the related
`test_migration_stuntime` scenarios that use `wait_for_vmi_affinity()`,
`migrate_vm_and_verify()`, and `measure_stuntime()`.

In `@tests/network/libs/stuntime.py`:
- Around line 158-160: The membership check in the pod affinity assertion can
crash when vm.vmi.virt_launcher_pod.instance.to_dict()["spec"].get("affinity")
returns None, so update the diagnostic logic in this helper to handle the
missing-affinity case before checking expected_type in pod_affinity. Use a
pytest assertion in stuntime.py that explicitly verifies affinity is present and
then checks for the expected and stale types, with a clear failure message that
explains which affinity state was missing or unexpected.

In `@tests/network/localnet/migration_stuntime/test_migration_stuntime.py`:
- Around line 83-86: The migration stuntime tests are asserting affinity before
the ping session is guaranteed to be stopped, so a failure can leave active_ping
running and leak state into later migrations. Update the affected test blocks
around migrate_vm_and_verify(), assert_affinity_after_migration(), and
measure_stuntime() so ping cleanup is unconditional, using a finally block or
fixture teardown to stop/report the ping even if the affinity assertion fails.
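
The unconditional-cleanup pattern these findings describe can be sketched as follows. `FakePing`, `running_ping`, and `migrate_and_check` are hypothetical names; the real tests use their own ping and migration helpers, which are not reproduced here.

```python
from contextlib import contextmanager
from typing import Iterator


class FakePing:
    """Hypothetical stand-in for a long-running ping session."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


@contextmanager
def running_ping(ping: FakePing) -> Iterator[FakePing]:
    """Start the ping and guarantee it is stopped, even if the body raises."""
    ping.start()
    try:
        yield ping
    finally:
        ping.stop()


def migrate_and_check(ping: FakePing, affinity_ok: bool) -> None:
    """Run a placeholder migration step with the ping active, then assert affinity."""
    with running_ping(ping):
        # The migration itself would happen here; the assertion may fail afterwards.
        assert affinity_ok, "unexpected affinity after migration"
```

Because the stop call sits in a `finally` clause, a failed affinity assertion still propagates to the test report while the ping session is shut down first.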

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c139bd34-f5f3-4fab-aba9-96a051950522

📥 Commits

Reviewing files that changed from the base of the PR and between d503c68 and f5ce311.

📒 Files selected for processing (4)
  • libs/vm/vm.py
  • tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py
  • tests/network/libs/stuntime.py
  • tests/network/localnet/migration_stuntime/test_migration_stuntime.py

Comment thread tests/network/libs/stuntime.py

@azhivovk azhivovk left a comment

Contributor

Thanks
I have a few questions

Comment thread libs/vm/vm.py
Comment thread tests/network/libs/stuntime.py
@Anatw

Anatw commented Jun 30, 2026

Contributor Author

/verified

Tested on bm03-tlv2
Openshift version: 4.22.0-rc.2
CNV version: 4.22.0
HCO image: brew.registry.redhat.io/rh-osbs/iib:1158778

uv run pytest -s --skip-deprecated-api-test --pdbcls=IPython.terminal.debugger:TerminalPdb --tc-format=python --cluster-sanity-skip-storage-check -m "tier3" tests/network/localnet/migration_stuntime/test_migration_stuntime.py tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py
==== 24 passed, 1 deselected, 237 warnings in 2642.62s (0:44:02) ====

@openshift-virtualization-qe-bot-3

Contributor

@coderabbitai

Test execution plan request details

CRITICAL: You MUST post an inline review comment on the first changed line of the first file.
The inline comment should contain the full Test Execution Plan (smoke decision, gating decision, and specific affected tests).
Do NOT submit a blocking review event (REQUEST_CHANGES/APPROVE).
Post a single inline PR comment on Files Changed (non-blocking COMMENT flow).

As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan.
You will post an inline review comment with the test execution plan on the first changed file.
If you fail to run or post a comment, retry.

Analysis Requirements:

  1. Examine code changes in each modified file

  2. Identify affected code paths, functions, and classes

  3. Analyze pytest-specific elements: fixtures (scope, dependencies), parametrization, markers, conftest changes

  4. Trace test dependencies through imports, shared utilities, fixture inheritance, fixture teardown, and yield from cleanup in conftest

  5. Detect new tests introduced in the PR

  6. Utilities and libs impact (when utilities/ or libs/ changes):
    You MUST use shell scripts (rg, git diff) to trace the full impact.
    Follow these sub-steps in order:

    6a. Identify modified symbols: For each changed file under utilities/ or libs/,
    list every modified function or method.
    Example: git diff HEAD~1 --unified=0 -- utilities/hco.py | grep '^[+-]def '

    6b. Find direct callers: Search tests and conftest for each symbol from 6a.
    Example: rg -l 'get_hco_version' tests/

    6c. Trace fixture teardown and cleanup: Find fixtures that reach
    the modified symbol through yield from or context-manager wrappers.
    Example: rg -l 'yield from.*enable_common_boot|def.*enable_common_boot' tests/

    6d. Trace same-file callers: In each changed file, find other functions
    whose body calls a modified symbol (including code after yield
    in @contextmanager helpers).
    Example: rg 'get_hco_version|enable_common_boot' utilities/hco.py

    6e. Expand transitively: If function A calls modified B, then
    tests/fixtures that call A are affected — even when the test body
    never imports B directly.

    Do NOT limit impact to tests that import the modified symbol only.

  7. Smoke test impact: Intersect the affected set from step 6 with smoke-marked tests.
    Run: rg -l '@pytest.mark.smoke' tests/
    VERIFY the above command returned actual file paths before concluding False.
    Set True if either condition is met:

    • a smoke-marked file appears in the affected set from 6b-6e, OR
    • any conftest.py in the smoke test's parent-directory hierarchy (up to repo root)
      imports or calls a modified utilities/libs symbol — including autouse fixtures
      that depend on modified functions. ALL tests in that directory and below are affected.
      Example check: for each smoke_file, scan dirname(smoke_file)/conftest.py,
      dirname(dirname(smoke_file))/conftest.py, etc. for modified symbol imports
      and autouse fixtures that depend on modified symbols.
  8. Gating test impact: Intersect the affected set from step 6 with gating-marked tests.
    Run: rg -l '@pytest.mark.gating' tests/
    Set True if a gating-marked file also appears in the affected set from 6b-6e.
    Utilities/libs changes often affect gating tests without affecting smoke tests.
    Do NOT stop analysis after concluding Run smoke tests: False.
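
The transitive expansion in step 6e amounts to a breadth-first walk over a reverse call graph. The sketch below assumes such a graph has already been built (for example from the `rg` searches above); `affected_callers` and the graph contents are hypothetical.

```python
from collections import deque


def affected_callers(callers_of: dict[str, set[str]], modified: set[str]) -> set[str]:
    """Return every symbol that reaches a modified symbol through one or more calls."""
    affected: set[str] = set()
    queue = deque(modified)
    while queue:
        symbol = queue.popleft()
        for caller in callers_of.get(symbol, set()):
            # Skip callers already recorded so that cycles terminate.
            if caller not in affected and caller not in modified:
                affected.add(caller)
                queue.append(caller)
    return affected
```

With `callers_of = {"foo": {"bar"}, "bar": {"fixture_teardown"}}` and `modified = {"foo"}`, both `bar` and `fixture_teardown` are reported, even though `fixture_teardown` never calls `foo` directly.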

Output rules:
Do NOT include analysis step numbers (1-8) in your visible output.

Your deliverable:
Your inline informational comment will be based on the following requirements:

Test Execution Plan

  • Run smoke tests: True / False — If True, state the dependency path (test → fixture → changed symbol). True ONLY with a verified path.
  • Run gating tests: True / False — If True, state the dependency path. True if any gating-marked test is in the affected set.
  • Affected tests to run (required when utilities/, libs/, or shared conftest changes — list concrete paths even when smoke is False)

Use these formats:

  • path/to/test_file.py - When the entire test file needs verification
  • path/to/test_file.py::TestClass::test_method - When specific test(s) needed
  • path/to/test_file.py::test_function - When specific test(s) needed
  • -m marker - When a marker covers multiple affected tests (e.g. -m gating only if ALL gating tests in scope need run)
  • Tag each listed test or group with its marker when not obvious, e.g. (gating) or (smoke)

Real test commands (MANDATORY when changes affect session/runtime code):

When the affected code runs at session/collection time (conftest fixtures, pytest plugins,
config hooks, session-scoped setup) or modifies runtime behavior that unit tests mock away,
you MUST include concrete pytest commands the PR author must run on a real cluster
to verify the change works end-to-end. Include:

  • A command for the error/fix path (the scenario the PR fixes)
  • A command for the happy path (regression: the normal case still works)
  • Use lightweight tests (e.g., --collect-only for startup failures,
    a single small test for runtime behavior)
    If the PR only changes test logic (not utilities/libs/conftest), the affected test
    paths themselves serve as the real test commands — no separate section needed.

Example output for a session-startup fix:

**Real tests (cluster required)**
Error path (the fix):
`pytest tests/storage/.../test_foo.py --storage-class-matrix=nonexistent-sc --collect-only`
Expected: ValueError with clear message, not IndexError

Happy path (regression):
`pytest tests/storage/.../test_foo.py --storage-class-matrix=<valid-sc> -k test_bar`
Expected: session starts normally

Guidelines:

  • Include tests affected directly OR via fixture setup/teardown, yield from cleanup, or transitive utility call chains (caller calls modified helper)
  • Use a full file path only if ALL tests in that file require verification
  • Use file path + test name when only specific tests use an affected fixture or utility wrapper (preferred for partial file impact)
  • If a test marker can cover multiple files/tests, provide the marker
  • Balance coverage vs over-testing - Keep descriptions minimal
  • Example: if leaf helper foo() changes, include tests whose fixture teardown calls wrapper bar() where bar() calls foo(), even when the test body only imports an unrelated symbol from the same utilities module

Hardware-Related Checks (SR-IOV, GPU, DPDK):

When PR modifies fixtures for hardware-specific resources:

  • Collection Safety: Fixtures MUST have existence checks (return None when hardware unavailable)
  • Test Plan: MUST verify both WITH and WITHOUT hardware:
    • Run affected tests on cluster WITH hardware
    • Verify collection succeeds on cluster WITHOUT hardware

CRITICAL WORKFLOW COMPLETION RULES:

When responding to this test execution plan request, you MUST follow these rules EXACTLY:

  1. YOUR ONLY DELIVERABLE: Post one non-blocking inline comment containing the test execution plan on the first changed line
  2. THEN STOP IMMEDIATELY - Do NOT generate any additional response
  3. FALLBACK ONLY: If inline comment API calls fail after retrying, post as a regular PR comment
  4. SILENCE = SUCCESS: After successfully submitting the review, your task is complete. No confirmation needed.

ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):

  • ❌ Do NOT post acknowledgment messages like "Test execution plan posted", "Review posted successfully", "I've successfully posted"
  • ❌ Do NOT mention review IDs, URLs, or confirmation of posting in the PR thread
  • ❌ Do NOT add any follow-up comments after submitting the review
  • ❌ Do NOT reply to confirm task completion
  • ❌ Do NOT explain what you did - just do it and stop

Remember: The pull request review is visible to users. Additional comments are redundant noise.

CRITICAL — Verification rules for smoke and gating decisions:
- For each True decision, you must have traced a concrete path: test file → fixture → utility function → modified symbol.
- Do NOT assume impact based on scope, file proximity, or infrastructure-sounding names.
Only set True when you trace a concrete path through steps 6a-6e, then 7 or 8.
WRONG: "utilities/hco.py changed → run smoke" (no traced dependency path)
WRONG: "Run smoke tests: False" with no affected-test list when gating tests use the changed utility
RIGHT: "Run smoke tests: False, Run gating tests: True" plus tests/.../test_foo.py::TestBar::test_baz (gating)
RIGHT: "Smoke test X uses fixture Y → wrapper Z → changed function W in utilities/hco.py"

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo="RedHatQE/openshift-virtualization-tests"
pr="5347"

echo "== PR head SHA =="
head_sha=$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')
echo "$head_sha"

echo
echo "== Changed files in order =="
mapfile -t pr_files < <(gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path')
printf '%s\n' "${pr_files[@]}"

first_file="${pr_files[0]}"
echo
echo "== First changed file =="
echo "$first_file"

echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch'

echo
echo "== Current diff for changed files =="
for f in "${pr_files[@]}"; do
  echo "--- $f ---"
  git diff --unified=3 "origin/$(git rev-parse --abbrev-ref HEAD)~1" -- "$f" 2>/dev/null || git diff --unified=3 HEAD~1 -- "$f" || true
done

echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    ast-grep outline "$f" --view expanded | sed -n '1,240p'
  fi
done

echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
  fi
done

echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true

echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
  fi
done

echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f markers ---"
    rg -n '@pytest\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
    echo "--- $f tests ---"
    rg -n '^\s*def test_' "$f" || true
  fi
done

echo
echo "== Smoke-marked files =="
rg -l '@pytest\.mark\.smoke' tests || true

echo
echo "== Gating-marked files =="
rg -l '@pytest\.mark\.gating' tests || true

echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true

echo
echo "== First changed line in first changed file =="
first_line=$(
  gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' \
  | python - <<'PY'
import sys,re
patch=sys.stdin.read().splitlines()
cur=None
for line in patch:
    m=re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
    if m:
        cur=int(m.group(1))
        continue
    if cur is None:
        continue
    if line.startswith('+') and not line.startswith('+++'):
        print(cur)
        break
    if line.startswith('-') and not line.startswith('---'):
        continue
    cur += 1
PY
)
echo "$first_line"

body="$(cat <<'EOF'
**Test Execution Plan**

- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.

- **Affected tests to run**
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)

These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.

**Real tests (cluster required)**

Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.

Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
)"

echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
  echo "-- inline attempt $attempt --"
  gh api \
    --method POST \
    "repos/$repo/pulls/$pr/comments" \
    -f body="$body" \
    -f commit_id="$head_sha" \
    -f path="$first_file" \
    -F line="$first_line" \
    -f side="RIGHT"
  rc=$?
  echo "inline_rc=$rc"
  if [ $rc -eq 0 ]; then
    inline_ok=1
    break
  fi
  sleep 2
done

if [ $inline_ok -ne 1 ]; then
  echo
  echo "== Inline failed; attempt fallback regular PR comment =="
  gh api \
    --method POST \
    "repos/$repo/issues/$pr/comments" \
    -f body="$body"
  echo "fallback_rc=$?"
fi
set -e

Length of output: 7215


🏁 Script executed:

#!/bin/bash
set -euo pipefail

repo="RedHatQE/openshift-virtualization-tests"
pr="5347"

tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT

echo "== PR head SHA =="
head_sha="$(gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid')"
echo "$head_sha"

echo
echo "== Changed files in order =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path' > "$tmpdir/pr_files.txt"
cat "$tmpdir/pr_files.txt"

first_file="$(sed -n '1p' "$tmpdir/pr_files.txt")"
echo
echo "== First changed file =="
echo "$first_file"

echo
echo "== Patch for first changed file (head) =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="'"$first_file"'") | .patch' > "$tmpdir/first_patch.txt"
cat "$tmpdir/first_patch.txt"

echo
echo "== Current diff for changed files =="
while IFS= read -r f; do
  echo "--- $f ---"
  git diff --unified=3 HEAD~1 -- "$f" || true
done < "$tmpdir/pr_files.txt"

echo
echo "== Outline for candidate library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    ast-grep outline "$f" --view expanded | sed -n '1,240p'
  fi
done

echo
echo "== Modified function/method definitions in libs/utilities/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    git diff --unified=0 HEAD~1 -- "$f" | rg '^[+-]\s*(def |class )' || true
  fi
done

echo
echo "== Direct callers/imports in tests and conftest =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(|\bassert_affinity_after_migration\s*\(|\bset_template_affinity\s*\(' tests tests/**/conftest.py || true

echo
echo "== Same-file callers in changed library/shared files =="
for f in libs/vm/vm.py tests/network/libs/stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f ---"
    rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|_assert_vmi_affinity|_filter_out_none_values' "$f" || true
  fi
done

echo
echo "== Changed test files: markers and test defs =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
  if [ -f "$f" ]; then
    echo "--- $f markers ---"
    rg -n '@pytest\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
    echo "--- $f tests ---"
    rg -n '^\s*def test_' "$f" || true
  fi
done

echo
echo "== Smoke-marked files =="
rg -l '@pytest\.mark\.smoke' tests || true

echo
echo "== Gating-marked files =="
rg -l '@pytest\.mark\.gating' tests || true

echo
echo "== Parent conftest references to modified symbols =="
rg -n -C2 'wait_for_vmi_affinity|assert_affinity_after_migration|set_template_affinity' tests/**/conftest.py tests/conftest.py || true

echo
echo "== First changed line in first changed file =="
python - "$tmpdir/first_patch.txt" <<'PY' > "$tmpdir/first_line.txt"
import re, sys
patch = open(sys.argv[1], encoding="utf-8").read().splitlines()
cur = None
for line in patch:
    m = re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
    if m:
        cur = int(m.group(1))
        continue
    if cur is None:
        continue
    if line.startswith('+') and not line.startswith('+++'):
        print(cur)
        break
    if line.startswith('-') and not line.startswith('---'):
        continue
    cur += 1
PY
first_line="$(cat "$tmpdir/first_line.txt")"
echo "$first_line"

cat > "$tmpdir/body.md" <<'EOF'
**Test Execution Plan**

- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.

- **Affected tests to run**
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)

These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.

**Real tests (cluster required)**

Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.

Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF

body="$(cat "$tmpdir/body.md")"

echo
echo "== Attempt inline review comment posting =="
set +e
inline_ok=0
for attempt in 1 2 3; do
  echo "-- inline attempt $attempt --"
  gh api \
    --method POST \
    "repos/$repo/pulls/$pr/comments" \
    -f body="$body" \
    -f commit_id="$head_sha" \
    -f path="$first_file" \
    -F line="$first_line" \
    -f side="RIGHT"
  rc=$?
  echo "inline_rc=$rc"
  if [ "$rc" -eq 0 ]; then
    inline_ok=1
    break
  fi
  sleep 2
done

if [ "$inline_ok" -ne 1 ]; then
  echo
  echo "== Inline failed; attempt fallback regular PR comment =="
  gh api \
    --method POST \
    "repos/$repo/issues/$pr/comments" \
    -f body="$body"
  echo "fallback_rc=$?"
fi
set -e

Length of output: 46117


@openshift-virtualization-qe-bot-3

CRITICAL: I retried the required workflow on the latest PR head (f5ce3111e1b1c4723faef11205437b536846b0b2) and could not post the comment because the GitHub integration does not have comment-write permission on #5347.

Why this matters: your workflow requires a non-blocking inline PR comment on libs/vm/vm.py, Line 13. Without write access, neither the primary path nor the fallback can complete.

Posting attempts

  • Inline review comment on libs/vm/vm.py, Line 13: failed 3 times
  • Fallback regular PR comment: failed
  • Error each time: HTTP 403: Resource not accessible by integration

Here is the exact comment body to post manually on libs/vm/vm.py, Line 13:

**Test Execution Plan**

- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.

- **Affected tests to run**
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
  - `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)

These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.

**Real tests (cluster required)**

Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.

Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.

@azhivovk azhivovk left a comment

Contributor

Hopefully it's okay to apply a workaround for the stuntime measurements.

Comment thread libs/vm/vm.py
        self._assert_vmi_affinity(expected_affinity=expected_affinity)

    @retry(wait_timeout=10, sleep=1)
    def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool:

Contributor

NIT:
Since we use assert for tests, consider renaming to expected_vmi_affinity

Contributor

+1 from me, let's not mix up concerns. assert is about the test itself; if you have to check some state in a precondition or step, please use a different approach and a different name, not assert.

Comment thread libs/vm/vm.py
    @retry(wait_timeout=10, sleep=1)
    def _assert_vmi_affinity(self, expected_affinity: dict[str, Any] | None) -> bool:
        vmi_affinity = self.vmi.instance.to_dict()["spec"].get("affinity")
        assert vmi_affinity == expected_affinity, (

@servolkov servolkov Jun 30, 2026

Contributor

See the thread above. assert is a test concern, not a library concern. Raise a clear exception instead.
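
A minimal sketch of the "raise a clear exception" option, not the PR's implementation. `AffinityNotReconciledError`, `wait_for_affinity`, and the `read_affinity` callable are hypothetical names; in the real helper the read would presumably be the `self.vmi.instance.to_dict()["spec"].get("affinity")` lookup shown in the diff, and the timeout and interval mirror the `@retry(wait_timeout=10, sleep=1)` values.

```python
import time
from typing import Any, Callable, Dict, Optional

Affinity = Optional[Dict[str, Any]]


class AffinityNotReconciledError(Exception):
    """Hypothetical error: the VMI affinity did not converge in time."""


def wait_for_affinity(
    read_affinity: Callable[[], Affinity],
    expected_affinity: Affinity,
    timeout: float = 10.0,
    interval: float = 1.0,
) -> None:
    """Poll until the observed affinity equals the expected one, else raise."""
    deadline = time.monotonic() + timeout
    while True:
        actual = read_affinity()
        if actual == expected_affinity:
            return
        if time.monotonic() >= deadline:
            # An explicit exception keeps `assert` out of library code.
            raise AffinityNotReconciledError(
                f"VMI affinity {actual!r} did not match expected "
                f"{expected_affinity!r} within {timeout}s"
            )
        time.sleep(interval)
```

A caller in a test can then decide whether a timeout is a test failure or a setup error, instead of inheriting an `AssertionError` from the library.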

return stuntime


def assert_affinity_after_migration(vm: BaseVirtualMachine, expected_type: str) -> None:

Contributor

If this is not a test itself, rename the helper and don't use assert. I bet we don't test affinity after migration, hence it does not deserve an assertion. Otherwise, it is a mess.
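
One way to keep assertions out of such a helper is to have it return a boolean and leave the decision to the caller. This is a sketch under assumptions: `has_only_affinity_type` is a hypothetical name, and it assumes `expected_type` is one of the Kubernetes affinity field names `podAffinity` or `podAntiAffinity`, which the PR text does not confirm.

```python
from typing import Any, Dict, Optional

POD_AFFINITY = "podAffinity"
POD_ANTI_AFFINITY = "podAntiAffinity"


def has_only_affinity_type(
    affinity: Optional[Dict[str, Any]], expected_type: str
) -> bool:
    """Return True when `affinity` sets `expected_type` and not the opposite pod rule."""
    if expected_type not in (POD_AFFINITY, POD_ANTI_AFFINITY):
        raise ValueError(f"unsupported affinity type: {expected_type!r}")
    affinity = affinity or {}
    # The "stale" rule is the opposite pod-level rule left over from an earlier step.
    stale_type = POD_ANTI_AFFINITY if expected_type == POD_AFFINITY else POD_AFFINITY
    return bool(affinity.get(expected_type)) and not affinity.get(stale_type)
```

The input here is a plain affinity dict, so the check can be exercised without a cluster; reading it from the migrated pod would stay in the caller.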

localnet_stuntime_client_vm.set_template_affinity(affinity=new_pod_anti_affinity(label=SERVER_VM_LABEL))
localnet_stuntime_client_vm.wait_for_vmi_affinity()
migrate_vm_and_verify(vm=localnet_stuntime_client_vm, client=admin_client)
if is_jira_open(jira_id="CNV-90576"):

Contributor

By the way, don't we need to check affinity even without this bug? I mean, perhaps the direction is right regardless of the bug, so checking Jira is not needed?

Comment thread libs/vm/vm.py
        patches = {self: {"spec": {"template": {"spec": {"affinity": template_affinity}}}}}
        ResourceEditor(patches=patches).update()

    def wait_for_vmi_affinity(self) -> None:

Contributor

As I understand it, the API contract of "set affinity" should mean "affinity is applied when I return." Hence the wait operation should be part of set_template_affinity. Otherwise, every caller must remember the wait, and future callers may forget. Consider integrating the reconciliation wait into set_template_affinity itself.
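
To illustrate the contract described here, a rough sketch using an in-memory stand-in rather than the real `BaseVirtualMachine`. `FakeVM`, `_patch_template`, `read_vmi_affinity`, and `reconcile_after_reads` are all hypothetical; the stand-in only simulates a VMI that lags behind the template for a few reads.

```python
import time
from typing import Any, Dict, Optional

Affinity = Optional[Dict[str, Any]]


class FakeVM:
    """In-memory stand-in: the VMI catches up with the template after a few reads."""

    def __init__(self, reconcile_after_reads: int = 2) -> None:
        self.template_affinity: Affinity = None
        self._vmi_affinity: Affinity = None
        self._reads_left = 0
        self._reconcile_after_reads = reconcile_after_reads

    def _patch_template(self, affinity: Affinity) -> None:
        self.template_affinity = affinity
        self._reads_left = self._reconcile_after_reads

    def read_vmi_affinity(self) -> Affinity:
        # Simulate the controller reconciling the VMI some time after the patch.
        if self._reads_left > 0:
            self._reads_left -= 1
        else:
            self._vmi_affinity = self.template_affinity
        return self._vmi_affinity

    def set_template_affinity(
        self, affinity: Affinity, timeout: float = 10.0, interval: float = 1.0
    ) -> None:
        """Patch the template and return only once the VMI reflects it."""
        self._patch_template(affinity)
        deadline = time.monotonic() + timeout
        while self.read_vmi_affinity() != affinity:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"VMI affinity not reconciled within {timeout}s")
            time.sleep(interval)
```

With this shape a caller writes a single `set_template_affinity(...)` call before triggering the migration, and there is no separate wait to forget.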


10 participants