Skip to content

[refine](column) separate mutable subcolumn mutation from read-only traversal#64905

Open
Mryange wants to merge 1 commit into
apache:masterfrom
Mryange:column-subcolumn-cow-refine
Open

[refine](column) separate mutable subcolumn mutation from read-only traversal#64905
Mryange wants to merge 1 commit into
apache:masterfrom
Mryange:column-subcolumn-cow-refine

Conversation

@Mryange

@Mryange Mryange commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Column wrappers used for_each_subcolumn for both read-only traversal and mutable subcolumn detachment during COW mutation. This mixed two different contracts in one callback API and forced typed subcolumns such as ColumnNullable::_null_map, ColumnArray::offsets, and ColumnMap::offsets_column to move through temporary base IColumn::WrappedPtr bridges.

Root cause: the mutable callback accepted IColumn::WrappedPtr&, but several subcolumns are stored as strongly typed wrappers. Binding typed wrapper references to the base wrapper callback is unsafe, so each implementation needed ad-hoc move/defer/cast code.

This PR keeps for_each_subcolumn as a const read-only traversal API and adds mutate_subcolumns() for the COW mutation path. Common mutate_subcolumn helpers handle generic and strongly typed subcolumns, so ColumnArray, ColumnMap, ColumnNullable, ColumnStruct, and ColumnVariant can detach children without exposing a mutable traversal callback. The added BEUT cases also verify that mutating exclusive subcolumns does not introduce an extra copy.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange

Mryange commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/review

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: request changes for one test coverage gap in the core column COW refactor.

Critical checkpoint conclusions:

  • Goal: the PR separates read-only subcolumn traversal from recursive COW mutation. The code mostly accomplishes that by keeping for_each_subcolumn const-only and routing IColumn::mutate through mutate_subcolumns().
  • Scope/focus: the source change is small and focused on BE column wrappers, with tests added for array and nullable COW behavior.
  • Concurrency/lifecycle: no new threads, locks, static initialization, persistence, transaction, or configuration behavior are introduced.
  • Compatibility/protocol: no serialization format, FE/BE protocol, or function symbol compatibility change found.
  • Parallel paths: array, nullable, map, struct, const, and variant wrapper paths were checked. The remaining issue is that the distinct map enumeration path lacks focused mutation coverage.
  • Error handling/memory: no ignored Status or new memory ownership issue found in the changed code.
  • Tests: git diff --check is clean for the PR file set. Formatter/style CI is green. I could not run BE UT locally because thirdparty/installed is absent in this runner. The current macOS BE-UT CI job fails before compiling Doris with ERROR: The JAVA version is 25, it must be JDK-17, so I did not treat that as PR-caused.

User focus: no additional user-provided focus points were present.

Subagent conclusions: optimizer-rewrite reported no candidates. tests-session-config proposed TEST-1; I verified and accepted the ColumnMap portion as MAIN-1 for an inline comment. Convergence round 1 ended with both live subagents reporting NO_NEW_VALUABLE_FINDINGS for the same ledger and one-comment final set.

callback(keys_column);
callback(values_column);
callback(offsets);
void mutate_subcolumns() override {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This COW refactor changes ColumnMap from the old callback/defer path to a class-specific mutate_subcolumns() that must enumerate all three children (keys_column, values_column, and typed offsets_column). The new tests cover array and nullable, but there is no focused test that mutates a shared/exclusive ColumnMap and verifies all three child pointers are detached or preserved correctly. A missed or wrong entry in this method would compile and leave one map child aliased across mutation, so please add a BE unit test analogous to the new array/nullable cases for ColumnMap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants