OADP-7541: fix non-deterministic matchExpressions ordering causing node-agent restarts#2234
Conversation
…de-agent restarts When a DPA configures nodeAgent.podConfig.nodeSelector with multiple labels, kube.ToSystemAffinity() iterates Go maps in random order to build matchExpressions. Kubernetes treats arrays as ordered, so each order flip registers as a spec change, triggering DaemonSet rollouts every reconciliation (~30s). Customer observed generation 13858 on node-agent while velero deployment stayed at 2. Sort matchExpressions by key after building them from LoadAffinityConfig to ensure deterministic ordering. Applied to both node-agent DaemonSet and Velero Deployment code paths. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
@kaovilai: This pull request references OADP-7541 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughSort NodeSelectorTerm MatchExpressions/MatchFields by requirement key via new common.SortNodeSelectorTerms; apply the sorter in NodeAgent and Velero affinity assembly and add tests asserting deterministic ordering. ChangesDeterministic node affinity ordering
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/cherry-pick oadp-1.6 |
|
@kaovilai: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick oadp-1.5 |
|
@kaovilai: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Sort expected matchExpressions alphabetically by key to match the deterministic ordering introduced in the previous commit. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
/retest-required Note Responses generated with Claude
|
Address review feedback: sort both MatchExpressions and MatchFields by key for completeness and future-proofing. Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering> Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
|
/retest |
|
@kaovilai: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@PrasadJoshi12 do you have an automated test case for this? |
E2E Verification: OADP-7541 fix confirmedEnvironment
Test: Multi-label nodeSelector stabilityCreated DPA with two nodeSelector labels (reproducing the customer's multi-label config from OADP-7541): spec:
configuration:
nodeAgent:
enable: true
uploaderType: kopia
podConfig:
nodeSelector:
kubernetes.io/os: linux
kubernetes.io/arch: arm64Result: PASSmatchExpressions correctly sorted by key: [
{"key": "kubernetes.io/arch", "operator": "In", "values": ["arm64"]},
{"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]}
]Generation monitoring over 5 minutes (10 checks at 30s intervals):
Note Responses generated with Claude |
No, we need to add one |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kaovilai, sseago The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
nodeAgent.podConfig.nodeSelectorwith multiple labels,kube.ToSystemAffinity()iterates Go maps in random order to buildmatchExpressions. Kubernetes treats arrays as ordered, so each order flip registers as a spec change, triggering DaemonSet rollouts every reconciliation (~30s). Customer observed generation 13858 on node-agent while velero deployment stayed at 2.common.SortNodeSelectorTerms()helper that sortsMatchExpressionsby key within eachNodeSelectorTermTest plan
TestNodeAgentAffinityMatchExpressionsDeterministicOrder(3 sub-cases × 100 iterations)TestVeleroAffinityMatchExpressionsDeterministicOrder(100 iterations)-count=5for additional confidence🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests