Skip to content

[fix](set operation) Use regular child output for set operation rules#64908

Open
morrySnow wants to merge 1 commit into
apache:masterfrom
morrySnow:fix-union
Open

[fix](set operation) Use regular child output for set operation rules#64908
morrySnow wants to merge 1 commit into
apache:masterfrom
morrySnow:fix-union

Conversation

@morrySnow

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Related PR: #24060, #27628

Problem Summary: Some Nereids rewrite rules mapped set operation outputs to child outputs by list index. Set operations can carry regulator child outputs whose order and cardinality differ from the child plan output, so those rules could read past the child output or rewrite expressions with the wrong child slot. This change makes one-row union merge and distinct TopN/Limit pushdown build child mappings from regular child outputs instead.

Release note

Fix planner failures or incorrect set operation rewrites when regulated child outputs differ from child plan outputs.

Check List (For Author)

  • Test: Regression test / Unit Test
    • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SetOperationOutputMappingTest
    • cd fe && mvn checkstyle:check -pl fe-core
    • ./run-regression-test.sh --run -d nereids_rules_p0/push_down_top_n -s push_down_top_n_distinct_through_union
    • ./run-regression-test.sh --run -d nereids_rules_p0/push_down_limit_distinct -s push_down_limit_distinct_through_union
    • ./run-regression-test.sh --run -d nereids_rules_p0/merge_one_row_relation -s merge_one_row_relation_into_union
  • Behavior changed: Yes (fixes set operation rewrites to use regular child outputs)
  • Does this need documentation: No

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64900, apache#64906

Problem Summary: Some Nereids rewrite rules mapped set operation outputs to child outputs by list index. Set operations can carry regulator child outputs whose order and cardinality differ from the child plan output, so those rules could read past the child output or rewrite expressions with the wrong child slot. This change makes one-row union merge and distinct TopN/Limit pushdown build child mappings from regular child outputs instead.

### Release note

Fix planner failures or incorrect set operation rewrites when regulated child outputs differ from child plan outputs.

### Check List (For Author)

- Test: Regression test / Unit Test
    - ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SetOperationOutputMappingTest
    - cd fe && mvn checkstyle:check -pl fe-core
    - ./run-regression-test.sh --run -d nereids_rules_p0/push_down_top_n -s push_down_top_n_distinct_through_union
    - ./run-regression-test.sh --run -d nereids_rules_p0/push_down_limit_distinct -s push_down_limit_distinct_through_union
    - ./run-regression-test.sh --run -d nereids_rules_p0/merge_one_row_relation -s merge_one_row_relation_into_union
- Behavior changed: Yes (fixes set operation rewrites to use regular child outputs)
- Does this need documentation: No
@morrySnow

Copy link
Copy Markdown
Contributor Author

/review

@morrySnow

Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review found one blocking issue: the new regression EXPLAIN cases use plain sql, so they execute but do not compare the shape plan golden output. Inline comments mark each instance.

Checkpoint conclusions:

  • Goal/scope: The code change is focused on using regularChildrenOutputs in set-operation rewrites, and the implementation path appears semantically consistent after tracing BindExpression.bindSetOperation, LogicalUnion, expression replacement, and physical union translation.
  • Tests: FE unit tests target the internal mapping, but the added regression coverage is incomplete because the new planner-session EXPLAIN statements are not qt_ checks and add no .out results.
  • Concurrency/lifecycle/config/compatibility/persistence/storage/cloud: not involved by this FE optimizer/test-only PR.
  • Parallel paths: one-row merge plus limit/topN distinct union paths were reviewed; no additional rewrite issue was found.
  • Validation: git diff --check passed. I did not run FE/regression tests locally because thirdparty/installed/bin/protoc is absent in this checkout.

Subagent conclusions:

  • optimizer-rewrite: no optimizer semantic or mapping candidates; final convergence round returned NO_NEW_VALUABLE_FINDINGS.
  • tests-session-config: TSC-1 accepted as the inline regression-test issue; final convergence round returned NO_NEW_VALUABLE_FINDINGS.

User focus: no additional focus was provided.

explain shape plan select * from ((select * from table2 t1 limit 5) union (select * from table2 t2 limit 5)) sub order by id limit 10;
"""

sql """

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this new EXPLAIN a named qt_... check and regenerate the corresponding .out block instead of running it with plain sql. sql only executes the EXPLAIN; it does not compare the shape plan, so this regression would still pass if the rewrite stopped pushing the intended shape as long as planning succeeds.

);
"""

sql """

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this EXPLAIN a named qt_... check and add the generated .out output instead of running it with plain sql. sql only executes the EXPLAIN; it does not compare the shape plan, so this regression would still pass if the rewrite stopped pushing the intended shape as long as planning succeeds.

sql "SET enable_nereids_planner=true"
sql "SET enable_fallback_to_original_planner=false"

sql """

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this EXPLAIN a named qt_... check and add the generated .out output instead of running it with plain sql. sql only executes the EXPLAIN; it does not compare the shape plan, so this regression would still pass if the one-row merge stopped producing the intended shape as long as planning succeeds.

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-H: Total hot run time: 29252 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2f061fcc621973c7eaf30dbfb9d2219862ccaa30, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17655	4070	4042	4042
q2	2037	323	190	190
q3	10293	1433	825	825
q4	4678	475	348	348
q5	7485	905	582	582
q6	188	176	139	139
q7	762	862	615	615
q8	9385	1530	1553	1530
q9	5559	4532	4470	4470
q10	6783	1778	1520	1520
q11	453	279	264	264
q12	620	417	301	301
q13	18101	3388	2758	2758
q14	277	273	234	234
q15	q16	785	771	721	721
q17	980	986	939	939
q18	6922	5919	5649	5649
q19	1308	1247	1121	1121
q20	508	402	269	269
q21	5957	2757	2425	2425
q22	440	363	310	310
Total cold run time: 101176 ms
Total hot run time: 29252 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4388	4283	4261	4261
q2	327	370	234	234
q3	4608	4951	4383	4383
q4	2067	2147	1369	1369
q5	4458	4279	4265	4265
q6	231	180	132	132
q7	1739	1617	1918	1617
q8	2649	2232	2150	2150
q9	8278	8402	8096	8096
q10	4753	4721	4317	4317
q11	588	417	422	417
q12	740	769	530	530
q13	3254	3485	2956	2956
q14	301	326	282	282
q15	q16	716	770	658	658
q17	1348	1354	1457	1354
q18	7656	7262	7232	7232
q19	1177	1178	1135	1135
q20	2207	2203	1951	1951
q21	5258	4585	4442	4442
q22	507	444	397	397
Total cold run time: 57250 ms
Total hot run time: 52178 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
TPC-DS: Total hot run time: 171538 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2f061fcc621973c7eaf30dbfb9d2219862ccaa30, data reload: false

query5	4342	655	484	484
query6	460	198	182	182
query7	4818	561	311	311
query8	325	184	171	171
query9	8778	4035	4010	4010
query10	441	315	270	270
query11	5934	2389	2148	2148
query12	162	104	101	101
query13	1274	616	435	435
query14	6288	5317	4936	4936
query14_1	4268	4234	4306	4234
query15	225	200	180	180
query16	1031	485	467	467
query17	1134	719	582	582
query18	2726	477	380	380
query19	210	184	150	150
query20	118	108	108	108
query21	217	143	120	120
query22	13610	13679	13413	13413
query23	17418	16472	16207	16207
query23_1	16244	16267	16254	16254
query24	7512	1775	1292	1292
query24_1	1341	1312	1319	1312
query25	567	450	403	403
query26	1348	313	179	179
query27	2558	545	342	342
query28	4353	2052	2014	2014
query29	1088	630	503	503
query30	310	240	200	200
query31	1121	1083	965	965
query32	105	64	62	62
query33	549	327	265	265
query34	1180	1176	661	661
query35	755	787	690	690
query36	1409	1398	1216	1216
query37	153	108	92	92
query38	1878	1716	1658	1658
query39	925	970	898	898
query39_1	866	868	867	867
query40	214	125	99	99
query41	63	61	63	61
query42	91	86	92	86
query43	315	314	281	281
query44	1415	808	784	784
query45	202	185	178	178
query46	1047	1196	747	747
query47	2408	2348	2228	2228
query48	401	433	295	295
query49	580	423	315	315
query50	976	372	273	273
query51	4492	4408	4321	4321
query52	83	80	70	70
query53	249	264	190	190
query54	266	224	199	199
query55	72	69	68	68
query56	236	219	228	219
query57	1419	1383	1289	1289
query58	235	217	209	209
query59	1576	1611	1417	1417
query60	275	236	222	222
query61	156	151	151	151
query62	720	637	593	593
query63	225	186	198	186
query64	2472	791	602	602
query65	4882	4782	4758	4758
query66	1753	463	332	332
query67	28799	28843	28729	28729
query68	3306	1551	913	913
query69	413	305	269	269
query70	1019	980	930	930
query71	288	232	207	207
query72	2952	2805	2328	2328
query73	860	777	448	448
query74	5103	4942	4756	4756
query75	2558	2508	2149	2149
query76	2319	1178	807	807
query77	348	386	269	269
query78	12290	12504	11910	11910
query79	1380	1132	741	741
query80	1222	472	372	372
query81	496	272	236	236
query82	569	158	120	120
query83	325	276	245	245
query84	278	146	116	116
query85	919	518	418	418
query86	399	289	290	289
query87	1838	1831	1766	1766
query88	3681	2759	2768	2759
query89	423	387	337	337
query90	1904	178	172	172
query91	170	166	136	136
query92	66	64	54	54
query93	1544	1384	891	891
query94	634	326	323	323
query95	692	389	348	348
query96	1097	783	345	345
query97	2725	2704	2563	2563
query98	208	200	198	198
query99	1173	1158	1039	1039
Total cold run time: 257282 ms
Total hot run time: 171538 ms

@hello-stephen

Copy link
Copy Markdown
Contributor
ClickBench: Total hot run time: 25.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2f061fcc621973c7eaf30dbfb9d2219862ccaa30, data reload: false

query1	0.00	0.00	0.01
query2	0.10	0.05	0.04
query3	0.25	0.14	0.15
query4	1.61	0.14	0.14
query5	0.25	0.23	0.23
query6	1.30	1.10	1.11
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.31	0.32
query10	0.60	0.58	0.53
query11	0.19	0.15	0.13
query12	0.19	0.15	0.14
query13	0.48	0.48	0.46
query14	1.00	1.02	1.00
query15	0.60	0.59	0.58
query16	0.33	0.32	0.31
query17	1.09	1.05	1.11
query18	0.22	0.21	0.21
query19	2.05	2.01	2.00
query20	0.01	0.02	0.01
query21	15.44	0.22	0.14
query22	4.81	0.06	0.05
query23	16.14	0.31	0.13
query24	2.96	0.44	0.31
query25	0.11	0.06	0.04
query26	0.72	0.22	0.15
query27	0.05	0.03	0.03
query28	3.54	0.91	0.54
query29	12.51	4.28	3.44
query30	0.28	0.15	0.16
query31	2.76	0.58	0.30
query32	3.22	0.58	0.48
query33	3.25	3.21	3.20
query34	15.67	4.16	3.48
query35	3.52	3.51	3.55
query36	0.54	0.43	0.43
query37	0.08	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.18	0.17	0.16
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 96.77 s
Total hot run time: 25.17 s

@hello-stephen

Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (13/13) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (13/13) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants