[refactor](be) Use ColumnArrayView in array functions #64873

Open
Mryange wants to merge 1 commit into apache:master from Mryange:array-functions-column-array-view

@Mryange

Mryange commented Jun 26, 2026

Contributor

What problem does this PR solve?

Problem Summary: Refactor array_contains, array_position, countequal, array_remove, and arrays_overlap to read array arguments through ColumnArrayView and ColumnView instead of expanding const columns or manually unpacking nullable array internals. This keeps const and nullable handling in the shared view abstraction and reduces duplicated array access code. Add BE unit coverage for const array and const target argument cases.
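To illustrate the idea in the summary, here is a minimal, self-contained sketch of how a view layer can absorb const and nullable handling so the function body reads every argument the same way. It is not the Doris implementation: `ArrayColumn`, `ScalarColumn`, `ArrayView`, `ScalarView`, and `contains` are hypothetical names invented for this sketch, elements are fixed to `int`, and the rule that a NULL target matches a NULL element is a convention of the sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for an array column: flattened elements plus row offsets.
struct ArrayColumn {
    std::vector<std::optional<int>> elements;  // nullopt models a NULL element
    std::vector<std::size_t> offsets;          // offsets[i] is the end of row i in `elements`
    std::vector<bool> row_is_null;             // outer nullability, one flag per stored row
    bool is_const = false;                     // true: the single stored row stands for every row
};

// Hypothetical stand-in for a scalar column.
struct ScalarColumn {
    std::vector<std::optional<int>> values;
    bool is_const = false;
};

// Read-only description of one array row.
struct ArrayRow {
    const std::optional<int>* data;
    std::size_t size;
    bool is_null;
};

// Hypothetical view: callers index by logical row and never see const-ness.
class ArrayView {
public:
    explicit ArrayView(const ArrayColumn& col) : col_(col) {}

    ArrayRow operator[](std::size_t row) const {
        const std::size_t i = col_.is_const ? 0 : row;  // a const column always reads row 0
        assert(i < col_.offsets.size());
        const std::size_t begin = (i == 0) ? 0 : col_.offsets[i - 1];
        return {col_.elements.data() + begin, col_.offsets[i] - begin, col_.row_is_null[i]};
    }

private:
    const ArrayColumn& col_;
};

class ScalarView {
public:
    explicit ScalarView(const ScalarColumn& col) : col_(col) {}

    const std::optional<int>& operator[](std::size_t row) const {
        return col_.values[col_.is_const ? 0 : row];
    }

private:
    const ScalarColumn& col_;
};

// One loop serves every const/non-const combination of the two arguments.
// A NULL array row yields a NULL result (nullopt).
std::vector<std::optional<bool>> contains(const ArrayView& arrays, const ScalarView& targets,
                                          std::size_t rows) {
    std::vector<std::optional<bool>> result(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        const ArrayRow row = arrays[r];
        if (row.is_null) continue;
        const std::optional<int>& target = targets[r];
        bool found = false;
        for (std::size_t k = 0; k < row.size && !found; ++k) {
            found = (row.data[k] == target);  // sketch convention: NULL matches NULL
        }
        result[r] = found;
    }
    return result;
}
```

With a const array `[1, NULL, 3]` and three non-const targets `3`, `4`, `NULL`, the same loop runs without first materializing three copies of the array, which is the kind of const-column expansion the refactor is described as avoiding.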

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label


### Check List (For Author)

- Test: Not run (per request); only ran clang-format-16 and git diff --check.

- Behavior changed: No

- Does this need documentation: No
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange

Mryange commented Jun 26, 2026

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

Automated review completed for PR #64873. I found no issues that warrant an inline review comment.

Critical checkpoint conclusions:

  • Goal: refactor array functions to use ColumnArrayView/ColumnView while covering const arguments. The changed BE paths accomplish that for array_contains, array_position, countequal, array_remove, and arrays_overlap.
  • Scope/focus: the PR scope was verified through GitHub PR files/commit/compare APIs as the seven BE array source/test files listed in .code-review.GF2Snu/pr_changed_files.txt. Local raw diff showed grafted/shallow checkout noise and was not used as PR scope.
  • Correctness: const arrays, const scalar arguments, outer nullable arrays, and nullable array elements keep the same semantics as the previous implementations. Null scalar/null element matching and nullable result wrapping were checked against the old code and DataTypeArray storage invariants.
  • Parallel paths: FE scalar signatures coerce scalar/array element types before BE execution, and the array-family functions were reviewed together for equivalent dispatch and null handling.
  • Concurrency/lifecycle/config/persistence: no new shared mutable state, lifecycle management, configuration item, serialization format, transaction path, or persistence behavior is introduced.
  • Tests: the new BE unit test file is picked up by the recursive BE test CMake glob, and added tests cover const array/scalar cases with null elements. I could not run BE unit tests in this runner because thirdparty/installed and a built doris_be_test binary are absent.
  • Hygiene: git diff --check on the PR files passed, and PATH=/usr/lib/llvm-16/bin:$PATH build-support/check-format.sh passed.

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite reported no candidate findings; tests-session-config reported no candidate findings. The final counted convergence round was round 2, after the main ledger included the final scope dismissal and an empty proposed comment set; both live subagents replied NO_NEW_VALUABLE_FINDINGS for that same ledger/comment set.

@Mryange

Mryange commented Jun 26, 2026

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29231 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17637	4031	3981	3981
q2	2053	311	187	187
q3	10305	1439	825	825
q4	4683	467	342	342
q5	7526	869	576	576
q6	185	169	143	143
q7	773	827	617	617
q8	9402	1646	1648	1646
q9	5573	4525	4544	4525
q10	6709	1766	1530	1530
q11	445	278	248	248
q12	625	424	286	286
q13	18088	3369	2746	2746
q14	273	262	238	238
q15	q16	789	784	711	711
q17	912	1016	1007	1007
q18	7201	5844	5534	5534
q19	1286	1257	1057	1057
q20	487	423	267	267
q21	5812	2659	2464	2464
q22	439	359	301	301
Total cold run time: 101203 ms
Total hot run time: 29231 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4315	4249	4240	4240
q2	308	363	219	219
q3	4672	4945	4388	4388
q4	2078	2170	1398	1398
q5	4439	4300	4343	4300
q6	237	180	129	129
q7	1746	1631	1810	1631
q8	2728	2224	2153	2153
q9	8233	8430	8106	8106
q10	4837	4787	4300	4300
q11	571	408	388	388
q12	739	792	554	554
q13	3310	3671	2970	2970
q14	307	297	269	269
q15	q16	724	730	637	637
q17	1353	1342	1323	1323
q18	8009	7435	7325	7325
q19	1188	1151	1101	1101
q20	2251	2221	1972	1972
q21	5277	4581	4383	4383
q22	504	442	404	404
Total cold run time: 57826 ms
Total hot run time: 52190 ms
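
The report does not label its columns. The numbers are consistent with a reading in which each row is `query, cold run, hot run 1, hot run 2, best hot run`, with the cold total summing the first timing column and the hot total summing the smaller of the two hot runs. The sketch below checks that reading against the Round 1 rows of this comment. The column meanings, the `QueryTiming` and `Totals` types, and the `summarize` helper are assumptions made for this illustration and are not part of the benchmark tooling. The `q15 q16` row is reproduced as a single entry, exactly as the report prints it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One report row under the assumed column meaning: cold run, then two hot runs (ms).
struct QueryTiming {
    const char* name;
    std::int64_t cold_ms;
    std::int64_t hot1_ms;
    std::int64_t hot2_ms;
};

struct Totals {
    std::int64_t cold_ms;
    std::int64_t hot_ms;
};

// Cold total: sum of cold runs. Hot total: sum of the faster hot run per query.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals t{0, 0};
    for (const QueryTiming& r : rows) {
        assert(r.cold_ms >= 0 && r.hot1_ms >= 0 && r.hot2_ms >= 0);
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}

// Round 1 rows copied from the first TPC-H report in this thread.
std::vector<QueryTiming> tpch_round1() {
    return {
        {"q1", 17637, 4031, 3981},  {"q2", 2053, 311, 187},     {"q3", 10305, 1439, 825},
        {"q4", 4683, 467, 342},     {"q5", 7526, 869, 576},     {"q6", 185, 169, 143},
        {"q7", 773, 827, 617},      {"q8", 9402, 1646, 1648},   {"q9", 5573, 4525, 4544},
        {"q10", 6709, 1766, 1530},  {"q11", 445, 278, 248},     {"q12", 625, 424, 286},
        {"q13", 18088, 3369, 2746}, {"q14", 273, 262, 238},     {"q15 q16", 789, 784, 711},
        {"q17", 912, 1016, 1007},   {"q18", 7201, 5844, 5534},  {"q19", 1286, 1257, 1057},
        {"q20", 487, 423, 267},     {"q21", 5812, 2659, 2464},  {"q22", 439, 359, 301},
    };
}
```

Calling `summarize(tpch_round1())` gives a cold total of 101203 ms and a hot total of 29231 ms, matching the two totals printed under Round 1.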

@hello-stephen

Contributor
TPC-DS: Total hot run time: 171703 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

query5	4353	649	482	482
query6	423	188	168	168
query7	4930	544	294	294
query8	328	183	168	168
query9	8768	4004	4016	4004
query10	446	319	272	272
query11	5701	2348	2143	2143
query12	154	102	101	101
query13	1261	610	405	405
query14	6171	5312	4959	4959
query14_1	4307	4291	4332	4291
query15	219	202	180	180
query16	1026	430	411	411
query17	949	710	596	596
query18	2439	476	353	353
query19	210	190	152	152
query20	108	111	107	107
query21	218	142	119	119
query22	13716	13638	13360	13360
query23	17504	16666	16240	16240
query23_1	16302	16413	16296	16296
query24	7629	1774	1309	1309
query24_1	1339	1305	1296	1296
query25	575	465	445	445
query26	1297	317	171	171
query27	2703	577	341	341
query28	4469	1995	2003	1995
query29	1097	613	480	480
query30	311	228	195	195
query31	1099	1077	947	947
query32	103	59	57	57
query33	523	317	250	250
query34	1174	1125	659	659
query35	775	781	662	662
query36	1413	1406	1249	1249
query37	154	109	93	93
query38	1910	1726	1634	1634
query39	952	929	905	905
query39_1	875	885	881	881
query40	226	123	101	101
query41	71	62	62	62
query42	86	83	87	83
query43	323	325	278	278
query44	1432	796	771	771
query45	200	191	177	177
query46	1042	1216	725	725
query47	2379	2414	2232	2232
query48	364	406	281	281
query49	576	429	320	320
query50	1044	366	263	263
query51	4431	4360	4399	4360
query52	79	80	67	67
query53	242	264	188	188
query54	258	214	208	208
query55	72	70	67	67
query56	234	226	232	226
query57	1439	1406	1329	1329
query58	238	207	208	207
query59	1548	1610	1416	1416
query60	272	245	233	233
query61	150	149	178	149
query62	688	651	584	584
query63	232	186	201	186
query64	2524	758	584	584
query65	4905	4778	4783	4778
query66	1810	449	337	337
query67	28922	28952	28720	28720
query68	3353	1568	891	891
query69	409	309	263	263
query70	1075	977	970	970
query71	299	222	198	198
query72	2960	2878	2354	2354
query73	837	769	443	443
query74	5104	4963	4802	4802
query75	2565	2552	2185	2185
query76	2329	1208	777	777
query77	351	383	281	281
query78	12540	12309	11935	11935
query79	1282	1155	757	757
query80	550	476	388	388
query81	463	281	241	241
query82	242	154	129	129
query83	367	321	242	242
query84	308	148	117	117
query85	849	528	420	420
query86	332	290	293	290
query87	1857	1830	1768	1768
query88	3648	2756	2717	2717
query89	412	408	335	335
query90	2090	196	181	181
query91	172	159	133	133
query92	65	59	58	58
query93	1427	1440	863	863
query94	534	345	322	322
query95	691	387	439	387
query96	1075	831	363	363
query97	2719	2693	2626	2626
query98	217	209	195	195
query99	1172	1162	1028	1028
Total cold run time: 256419 ms
Total hot run time: 171703 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

query1	0.01	0.01	0.01
query2	0.10	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.14
query5	0.27	0.22	0.23
query6	1.23	1.06	1.09
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.37	0.32	0.31
query10	0.54	0.56	0.56
query11	0.19	0.13	0.14
query12	0.18	0.15	0.15
query13	0.48	0.47	0.49
query14	1.01	1.01	1.00
query15	0.62	0.63	0.60
query16	0.32	0.31	0.33
query17	1.08	1.10	1.09
query18	0.23	0.22	0.21
query19	2.07	2.01	1.90
query20	0.02	0.01	0.01
query21	15.41	0.22	0.14
query22	4.70	0.05	0.05
query23	16.12	0.31	0.12
query24	3.04	0.43	0.32
query25	0.10	0.05	0.04
query26	0.72	0.21	0.16
query27	0.04	0.04	0.03
query28	3.59	0.87	0.55
query29	12.47	4.36	3.49
query30	0.29	0.15	0.15
query31	2.76	0.63	0.32
query32	3.22	0.60	0.49
query33	3.18	3.36	3.23
query34	15.52	4.22	3.54
query35	3.52	3.51	3.51
query36	0.55	0.46	0.42
query37	0.09	0.06	0.07
query38	0.06	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.17	0.17
query41	0.08	0.04	0.03
query42	0.03	0.02	0.02
query43	0.06	0.04	0.03
Total cold run time: 96.46 s
Total hot run time: 25.35 s

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.05% (96/101) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.36% (24689/38361)
Line Coverage 48.05% (256525/533876)
Region Coverage 44.71% (211554/473137)
Branch Coverage 45.88% (92016/200573)

@Mryange

Mryange commented Jun 26, 2026

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29216 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17729	4109	4049	4049
q2	2062	335	189	189
q3	10261	1422	842	842
q4	4682	467	343	343
q5	7501	859	586	586
q6	184	173	136	136
q7	781	834	647	647
q8	9333	1461	1599	1461
q9	5618	4553	4572	4553
q10	6724	1788	1536	1536
q11	444	281	245	245
q12	632	413	295	295
q13	18110	3379	2771	2771
q14	265	262	246	246
q15	q16	786	786	732	732
q17	1045	985	1073	985
q18	7145	5742	5520	5520
q19	1172	1307	1067	1067
q20	496	409	265	265
q21	5544	2591	2450	2450
q22	426	359	298	298
Total cold run time: 100940 ms
Total hot run time: 29216 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4408	4384	4339	4339
q2	326	352	225	225
q3	4574	4948	4411	4411
q4	2081	2153	1377	1377
q5	4465	4345	4363	4345
q6	233	179	131	131
q7	1738	1590	1862	1590
q8	2728	2259	2178	2178
q9	8291	8559	8279	8279
q10	4820	4866	4350	4350
q11	633	428	404	404
q12	743	767	542	542
q13	3322	3724	3015	3015
q14	299	302	283	283
q15	q16	723	726	647	647
q17	1342	1348	1332	1332
q18	8154	7401	7246	7246
q19	1173	1136	1092	1092
q20	2218	2237	1962	1962
q21	5259	4627	4939	4627
q22	517	464	398	398
Total cold run time: 58047 ms
Total hot run time: 52773 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 171764 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

query5	4321	637	483	483
query6	430	189	176	176
query7	4868	557	287	287
query8	326	185	166	166
query9	8777	4025	4067	4025
query10	444	311	272	272
query11	5881	2351	2150	2150
query12	164	113	104	104
query13	1259	630	436	436
query14	6248	5358	4993	4993
query14_1	4315	4301	4297	4297
query15	211	200	184	184
query16	1005	470	437	437
query17	956	744	606	606
query18	2450	493	351	351
query19	206	193	150	150
query20	116	111	108	108
query21	215	153	123	123
query22	13671	13618	13437	13437
query23	17339	16526	16146	16146
query23_1	16221	16342	16207	16207
query24	7612	1774	1311	1311
query24_1	1321	1292	1299	1292
query25	590	467	390	390
query26	1314	322	179	179
query27	2673	536	344	344
query28	4534	2038	2014	2014
query29	1107	629	503	503
query30	314	241	205	205
query31	1122	1082	963	963
query32	103	64	62	62
query33	546	339	271	271
query34	1183	1151	656	656
query35	787	795	679	679
query36	1391	1373	1192	1192
query37	152	106	91	91
query38	1884	1737	1669	1669
query39	932	912	895	895
query39_1	868	900	887	887
query40	214	121	98	98
query41	69	62	61	61
query42	88	87	86	86
query43	321	322	277	277
query44	1415	796	775	775
query45	207	190	176	176
query46	1110	1186	780	780
query47	2377	2301	2264	2264
query48	400	423	291	291
query49	575	426	319	319
query50	977	349	261	261
query51	4448	4449	4288	4288
query52	80	79	69	69
query53	244	271	195	195
query54	262	217	191	191
query55	74	71	66	66
query56	227	234	204	204
query57	1425	1419	1345	1345
query58	252	222	208	208
query59	1586	1673	1431	1431
query60	292	244	235	235
query61	153	146	154	146
query62	698	653	588	588
query63	235	194	189	189
query64	2540	744	607	607
query65	4893	4783	4799	4783
query66	1781	448	334	334
query67	28897	28867	28761	28761
query68	3317	1568	950	950
query69	414	296	264	264
query70	1069	1001	952	952
query71	283	236	209	209
query72	2913	2589	2368	2368
query73	857	767	443	443
query74	5133	4960	4795	4795
query75	2609	2539	2202	2202
query76	2299	1171	810	810
query77	350	414	286	286
query78	12434	12477	11905	11905
query79	1405	1193	734	734
query80	862	460	381	381
query81	496	276	246	246
query82	570	165	120	120
query83	357	279	244	244
query84	328	156	112	112
query85	895	511	407	407
query86	404	289	288	288
query87	1846	1837	1770	1770
query88	3707	2781	2765	2765
query89	446	394	339	339
query90	1872	180	179	179
query91	172	181	133	133
query92	62	65	52	52
query93	1514	1505	873	873
query94	613	345	293	293
query95	668	403	353	353
query96	1089	792	367	367
query97	2693	2685	2560	2560
query98	216	210	197	197
query99	1196	1160	1051	1051
Total cold run time: 257181 ms
Total hot run time: 171764 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3b1db376bc2e21bc7799c6b1011cd56aa8dd60cf, data reload: false

query1	0.01	0.01	0.01
query2	0.10	0.05	0.06
query3	0.26	0.14	0.13
query4	1.61	0.13	0.13
query5	0.23	0.26	0.21
query6	1.22	1.08	1.09
query7	0.03	0.01	0.01
query8	0.06	0.03	0.04
query9	0.37	0.31	0.35
query10	0.56	0.54	0.54
query11	0.19	0.13	0.15
query12	0.18	0.15	0.15
query13	0.46	0.46	0.48
query14	1.00	0.99	1.02
query15	0.61	0.60	0.60
query16	0.31	0.32	0.31
query17	1.13	1.11	1.13
query18	0.21	0.20	0.21
query19	2.04	1.97	1.98
query20	0.01	0.01	0.02
query21	15.44	0.22	0.15
query22	4.80	0.05	0.05
query23	16.12	0.31	0.12
query24	2.99	0.40	0.31
query25	0.11	0.05	0.04
query26	0.73	0.22	0.14
query27	0.04	0.03	0.04
query28	3.48	0.92	0.52
query29	12.53	4.37	3.46
query30	0.28	0.16	0.16
query31	2.78	0.61	0.31
query32	3.22	0.60	0.48
query33	3.20	3.27	3.21
query34	15.49	4.22	3.52
query35	3.55	3.51	3.54
query36	0.54	0.43	0.44
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.16	0.16
query41	0.08	0.03	0.03
query42	0.03	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 96.39 s
Total hot run time: 25.26 s

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.05% (96/101) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.31% (24699/38404)
Line Coverage 48.01% (256464/534197)
Region Coverage 44.68% (211487/473307)
Branch Coverage 45.87% (92036/200643)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.05% (96/101) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.30% (24695/38404)
Line Coverage 48.02% (256509/534197)
Region Coverage 44.63% (211254/473307)
Branch Coverage 45.86% (92013/200643)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 95.05% (96/101) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 64.33% (24706/38404)
Line Coverage 48.05% (256668/534197)
Region Coverage 44.71% (211607/473307)
Branch Coverage 45.90% (92099/200643)
