Skip to content

[feat](tools) add tpch primary key/foreign key in tpch-tools#64863

Open
englefly wants to merge 2 commits into
apache:masterfrom
englefly:tpch-pk-fk
Open

[feat](tools) add tpch primary key/foreign key in tpch-tools#64863
englefly wants to merge 2 commits into
apache:masterfrom
englefly:tpch-pk-fk

Conversation

@englefly

@englefly englefly commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

  1. add tpch primary key/foreign key in tpch-tools
  2. add tpch_sf0.1_constraints test suites

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

@englefly

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one correctness issue in the new TPCH constraint cleanup scripts.

Critical checkpoints:

  • Goal/test proof: The PR adds manual TPCH PK/FK/UK helper SQL files, but the PK/UK drop script does not use Doris' supported constraint-drop syntax, so that cleanup path is not proven to work.
  • Scope/focus: The change is small and focused; the issue is localized to the PK/UK cleanup file.
  • Concurrency/lifecycle/config/compatibility/persistence/data writes/observability: Not applicable for these standalone SQL helper scripts.
  • Parallel paths: The FK cleanup script uses drop constraint; the PK/UK cleanup script should follow the same path.
  • Test coverage/results: I did not run an FE parser/regression test in this runner; I verified the syntax against the Nereids grammar, command builder, existing constraint tests, and git diff --check.
  • User focus: No additional user-provided focus was supplied.

Subagent conclusions:

  • optimizer-rewrite proposed OPT-1, accepted as inline comment FINAL-1.
  • tests-session-config reported TEST-1 as a duplicate of OPT-1 and found no additional issue.
  • Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same ledger/comment set.

Comment thread tools/tpch-tools/constraints/drop-pk-uk-constraints.sql Outdated
@englefly

Copy link
Copy Markdown
Contributor Author

run buildall

@englefly

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found two issues in the new constrained TPC-H regression suite.

Critical checkpoints:

  • Goal/test proof: The PR adds TPCH PK/FK/UK helper scripts and a tpch_sf0.1_constraints_p1 regression suite. The constraint declarations themselves match the TPC-H keys and FKs, but the suite currently misses the source for one checked-in expected output and its load setup is not rerunnable.
  • Scope/focus: GitHub's PR files API reports 76 changed files, matching .code-review.9240ex/pr_changed_files.txt. I used that as authoritative because the local checkout is shallow/grafted and a direct local base..head diff showed unrelated FE/CDC history.
  • Concurrency/lifecycle/config/compatibility/persistence/data writes/observability: No production concurrency, config, compatibility, persistence, or data-write path changes in the authoritative PR scope. The relevant lifecycle concern is regression DB reuse across repeated suite runs.
  • Parallel paths: The helper drop scripts now use drop constraint; the load suite does not use an equivalent cleanup path before re-adding constraints.
  • Test coverage/results: The new constrained suite copies all q01-q22 SQL and expected outputs, but cse.out has no matching constrained suite source. I did not run the full regression suite in this runner.
  • User focus: No additional user-provided review focus was supplied.

Subagent conclusions:

  • optimizer-rewrite initially duplicated two out-of-scope FE candidates from the misleading local diff; after the GitHub-scope correction, they were dismissed and the subagent reported NO_NEW_VALUABLE_FINDINGS for the final two-comment set.
  • tests-session-config found TSC-1, accepted as FINAL-1, and duplicated the same out-of-scope FE/CDC false positives; after correction, it reported NO_NEW_VALUABLE_FINDINGS for the final two-comment set.
  • Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same current ledger/comment set.

@@ -0,0 +1,65 @@
-- This file is automatically generated. You should know what you did if you want to edit this
-- !cse --

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expected output has no matching constrained-suite source file, so the CSE case will not run. The existing suite has regression-test/suites/tpch_sf0.1_p1/sql/cse.groovy, but regression-test/suites/tpch_sf0.1_constraints_p1/sql/ only contains q01.sql through q22.sql; adding data/tpch_sf0.1_constraints_p1/sql/cse.out alone leaves the output stale and unconsumed. Please add the corresponding constrained cse.groovy source, or remove this .out file if CSE is intentionally not part of the new suite.


// Primary Key Constraints
sql """ alter table region add constraint r_pk primary key (r_regionkey) """
sql """ alter table nation add constraint n_pk primary key (n_nationkey) """

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the new load suite fail on rerun in the same regression database. The suite creates tables with IF NOT EXISTS, deletes/reloads the rows, and then unconditionally re-adds r_pk, n_r_fk, r_uk, etc. Those constraints remain in metadata after the first run, and Doris rejects the second ADD CONSTRAINT with Constraint name ... has existed (ConstraintManager.checkConstraintNotExistence). Please make the setup idempotent, for example by dropping existing FKs first and then PK/UK constraints before re-adding them, or by recreating the tables for this constrained suite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants