Skip to content

[fix](fe) Prevent pushing other join conditions to right child of null-aware anti join when hash conjuncts are empty#64898

Draft
starocean999 wants to merge 1 commit into
apache:masterfrom
starocean999:master_0626
Draft

[fix](fe) Prevent pushing other join conditions to right child of null-aware anti join when hash conjuncts are empty#64898
starocean999 wants to merge 1 commit into
apache:masterfrom
starocean999:master_0626

Conversation

@starocean999

Copy link
Copy Markdown
Contributor

When PushDownJoinOtherCondition processes a NULL_AWARE_LEFT_ANTI_JOIN
with empty hash join conjuncts, it incorrectly pushes the other join conditions
that reference only the right-side slots down to the right child. This changes
the evaluation semantics of the NOT IN subquery, because filtering the right
child before the null-aware anti join eliminates NULL values prematurely.

For a NULL_AWARE_LEFT_ANTI_JOIN, the hash join conjuncts represent the
correlation between the outer and inner tables. When there is no correlation
(i.e., hash conjuncts are empty), the other join conditions must not be pushed
to the right child, as doing so would filter rows from the subquery result
before the NULL check, potentially producing incorrect results.

Example affected query:
SELECT COUNT(*) FROM (SELECT 1 AS x) t
WHERE 1 NOT IN (SELECT CAST(NULL AS INT));

The subquery returns NULL, so NOT IN should evaluate to UNKNOWN (NULL),
meaning no rows match and COUNT returns 0. However, pushing the other condition
to the right child changes this behavior and produces wrong results.

Fix:
Add a guard in PushDownJoinOtherCondition to keep the other condition as
a remaining other conjunct (not pushed to the right child) when the join type
is NULL_AWARE_LEFT_ANTI_JOIN and hash join conjuncts are empty.

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…l-aware anti join when hash conjuncts are empty
@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@starocean999

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary for PR #64898:

I reviewed the three changed files and the surrounding Nereids null-aware anti join rewrite/translation path. I did not find a substantiated blocking issue, so I am leaving this as a summary-only review with no inline comments.

Critical checkpoint conclusions:

  • Goal and proof: the PR prevents PushDownJoinOtherCondition from pushing a right-child-only predicate below a NULL_AWARE_LEFT_ANTI_JOIN when hashJoinConjuncts is empty. The added qt_select65 covers the constant-vs-null subquery case and expects COUNT(*) = 0.
  • Scope: the code change is small and focused on the affected pushdown rule, with a matching regression output update.
  • Parallel paths: checked InApplyToJoin, FindHashConditionForJoin, JoinUtils.JoinSlotCoverageChecker, repeated RuleSet.PUSH_DOWN_FILTERS use, and physical hash/nested-loop translation. I did not find another path that would still push the same no-hash NAAJ right-only comparison into the build child.
  • Session/config behavior: no new configuration item is added. The regression suite does not set planner variables, but SessionVariable defaults keep Nereids enabled and fallback disabled, so the new case should exercise the Nereids rule path under default regression settings.
  • Concurrency, lifecycle, persistence, data writes, FE-BE protocol compatibility, and observability: not materially involved by this PR.
  • Test/output review: the new query returns a single aggregate row, so ordering is deterministic, and the .out label/result matches the suite case.
  • Validation limit: I did not run FE build or the regression suite locally because this runner lacks thirdparty/installed and thirdparty/installed/bin/protoc. I did run git diff --check on the three PR files, which passed.

User focus: no additional user-provided review focus was present.

Subagent conclusions:

  • optimizer-rewrite reported NO_NEW_VALUABLE_FINDINGS; no optimizer candidate became an inline comment.
  • tests-session-config reported NO_NEW_VALUABLE_FINDINGS; no test/session candidate became an inline comment.
  • Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same current ledger and empty proposed final inline comment set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants