[fix](fe) Prevent pushing other join conditions to right child of null-aware anti join when hash conjuncts are empty #64898
Draft
starocean999 wants to merge 1 commit into
Review summary for PR #64898:
I reviewed the three changed files and the surrounding Nereids null-aware anti join rewrite/translation path. I did not find a substantiated blocking issue, so I am leaving this as a summary-only review with no inline comments.
Critical checkpoint conclusions:
- Goal and proof: the PR prevents `PushDownJoinOtherCondition` from pushing a right-child-only predicate below a `NULL_AWARE_LEFT_ANTI_JOIN` when `hashJoinConjuncts` is empty. The added `qt_select65` covers the constant-vs-null subquery case and expects `COUNT(*) = 0`.
- Scope: the code change is small and focused on the affected pushdown rule, with a matching regression output update.
- Parallel paths: checked `InApplyToJoin`, `FindHashConditionForJoin`, `JoinUtils.JoinSlotCoverageChecker`, repeated `RuleSet.PUSH_DOWN_FILTERS` use, and physical hash/nested-loop translation. I did not find another path that would still push the same no-hash NAAJ right-only comparison into the build child.
- Session/config behavior: no new configuration item is added. The regression suite does not set planner variables, but `SessionVariable` defaults keep Nereids enabled and fallback disabled, so the new case should exercise the Nereids rule path under default regression settings.
- Concurrency, lifecycle, persistence, data writes, FE-BE protocol compatibility, and observability: not materially involved by this PR.
- Test/output review: the new query returns a single aggregate row, so ordering is deterministic, and the `.out` label/result matches the suite case.
- Validation limit: I did not run FE build or the regression suite locally because this runner lacks `thirdparty/installed` and `thirdparty/installed/bin/protoc`. I did run `git diff --check` on the three PR files, which passed.
User focus: no additional user-provided review focus was present.
Subagent conclusions:
- `optimizer-rewrite` reported `NO_NEW_VALUABLE_FINDINGS`; no optimizer candidate became an inline comment.
- `tests-session-config` reported `NO_NEW_VALUABLE_FINDINGS`; no test/session candidate became an inline comment.
- Convergence round 1 ended with both live subagents reporting `NO_NEW_VALUABLE_FINDINGS` for the same current ledger and empty proposed final inline comment set.
When `PushDownJoinOtherCondition` processes a `NULL_AWARE_LEFT_ANTI_JOIN` with empty hash join conjuncts, it incorrectly pushes the other join conditions
that reference only the right-side slots down to the right child. This changes
the evaluation semantics of the NOT IN subquery, because filtering the right
child before the null-aware anti join eliminates NULL values prematurely.
For a `NULL_AWARE_LEFT_ANTI_JOIN`, the hash join conjuncts represent the
correlation between the outer and inner tables. When there is no correlation
(i.e., hash conjuncts are empty), the other join conditions must not be pushed
to the right child, as doing so would filter rows from the subquery result
before the NULL check, potentially producing incorrect results.
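The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Doris code: rows are single values, `None` stands for SQL NULL, and the helper names (`eq3`, `null_aware_left_anti_join`, `filter_rows`) are hypothetical. It compares evaluating the condition inside the join against filtering the right child first.

```python
from typing import Callable, List, Optional

Value = Optional[int]  # None stands for SQL NULL in this toy model


def eq3(a: Value, b: Value) -> Optional[bool]:
    """SQL-style equality: a NULL operand yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b


def null_aware_left_anti_join(
    left: List[Value],
    right: List[Value],
    cond: Callable[[Value, Value], Optional[bool]],
) -> List[Value]:
    """Keep a left row only if cond is definitely FALSE for every right row."""
    return [l for l in left if all(cond(l, r) is False for r in right)]


def filter_rows(
    rows: List[Value], pred: Callable[[Value], Optional[bool]]
) -> List[Value]:
    """A filter keeps only rows whose predicate is TRUE (UNKNOWN is dropped)."""
    return [r for r in rows if pred(r) is True]


left = [1]      # one outer row
right = [None]  # the subquery produces a single NULL

# Condition `1 = r` evaluated inside the join: UNKNOWN, so the row is rejected.
kept_in_join = null_aware_left_anti_join(left, right, lambda l, r: eq3(1, r))

# Same condition pushed to the right child: the NULL row is filtered away,
# the right side becomes empty, and the left row wrongly survives.
filtered_right = filter_rows(right, lambda r: eq3(1, r))
pushed_down = null_aware_left_anti_join(left, filtered_right, lambda l, r: True)

print(kept_in_join, pushed_down)
```

In this model the in-join evaluation keeps no rows, while the pushed-down variant keeps the outer row, which is the discrepancy the PR describes.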
Example affected query:
SELECT COUNT(*) FROM (SELECT 1 AS x) t
WHERE 1 NOT IN (SELECT CAST(NULL AS INT));
The subquery returns NULL, so NOT IN should evaluate to UNKNOWN (NULL),
meaning no rows match and COUNT returns 0. However, pushing the other condition
to the right child changes this behavior and produces wrong results.
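As a cross-check of the expected semantics, the same query can be run against SQLite through Python's standard `sqlite3` module. This is an assumption-laden stand-in: SQLite is not Doris, and the check only shows how standard three-valued logic treats `NOT IN` over a NULL-producing subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The query from the PR description, unchanged.
query = """
SELECT COUNT(*) FROM (SELECT 1 AS x) t
WHERE 1 NOT IN (SELECT CAST(NULL AS INT))
"""
count = conn.execute(query).fetchone()[0]

# The bare predicate: UNKNOWN comes back to Python as None.
not_in_value = conn.execute(
    "SELECT 1 NOT IN (SELECT CAST(NULL AS INT))"
).fetchone()[0]

conn.close()
print(count, not_in_value)
```

The predicate evaluates to NULL rather than TRUE, so the `WHERE` clause rejects the row and the count is 0.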
Fix:
Add a guard in `PushDownJoinOtherCondition` to keep the other condition as
a remaining other conjunct (not pushed to the right child) when the join type
is `NULL_AWARE_LEFT_ANTI_JOIN` and hash join conjuncts are empty.
None
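The shape of the guard described in the fix can be sketched as a small decision function. This is a hypothetical Python illustration, not the actual Java change: `JoinType` here lists only two members, `may_push_to_right_child` is an invented name, and the rule's other join-type restrictions are deliberately left out.

```python
from enum import Enum, auto
from typing import Sequence


class JoinType(Enum):
    # Only the members needed for this sketch; the real enum has more.
    INNER_JOIN = auto()
    NULL_AWARE_LEFT_ANTI_JOIN = auto()


def may_push_to_right_child(
    join_type: JoinType,
    hash_conjuncts: Sequence[str],
    refs_only_right: bool,
) -> bool:
    """Decide whether an other-join condition may move below the join."""
    if not refs_only_right:
        # Conditions touching left-side slots cannot go to the right child.
        return False
    if join_type is JoinType.NULL_AWARE_LEFT_ANTI_JOIN and not hash_conjuncts:
        # Guard from the fix: keep the condition as a remaining other conjunct.
        return False
    return True


print(may_push_to_right_child(JoinType.NULL_AWARE_LEFT_ANTI_JOIN, [], True))
```

With empty hash conjuncts the function refuses the pushdown for the null-aware anti join, while other combinations in this simplified model are unaffected.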
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)