columnar: Refactor for later pipeline evolution by JaySon-Huang · Pull Request #10907 · pingcap/tiflash

JaySon-Huang · 2026-06-22T07:52:48Z

What problem does this PR solve?

Issue Number: ref #10844 ref #6233

Problem Summary:

What is changed and how it works?

Refactor: Standardized columnar context usage across execution summary, statistics collection, and table scan components to align columnar reads with the new pipeline.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

Summary

New Features
- Added a next-generation columnar read pipeline for remote/disaggregated snapshot streaming, including a dedicated source operator with improved IO profiling (bytes/rows/throughput and read duration).
Refactor
- Standardized columnar scan context usage across execution summary, statistics collection, and table scan components.
Tests
- Added region reader planning tests for bucket parallelism behavior, including boundary splitting and plan flattening.
Chores
- Extended failpoint support and updated ignore rules for review artifacts.

…irectory Move Columnar-specific types out of Flash/Coprocessor and StorageDisaggregatedColumnar into a dedicated Storages/Columnar/ directory for better code organization. Changes: - Move ColumnarScanContext from Flash/Coprocessor/ to Storages/Columnar/ - Extract RNColumnarReader* types (Plan/Work/SharedContext/ReadTask) into Storages/Columnar/ColumnarReader.h/.cpp - Extract RNColumnarInputStream and RNColumnarSourceOp into Storages/Columnar/ColumnarStreams.h/.cpp - Simplify StorageDisaggregatedColumnar.h/.cpp to only contain Storage::readThroughColumnar() and related integration glue code - Update all include paths across Flash/Coprocessor and Flash/Statistics - Add Storages/Columnar/ to CMake build glob - Keep ENABLE_NEXT_GEN_COLUMNAR guards on all Columnar code paths

…mbrella header - Rename all Columnar types to drop the 'RN' prefix: RNColumnarReaderMaterializeState -> ColumnarReaderMaterializeState RNColumnarReaderSharedContext -> ColumnarReaderSharedContext RNColumnarReaderPlan -> ColumnarReaderPlan RNColumnarReaderWork -> ColumnarReaderWork RNColumnarReadTask -> ColumnarReadTask RNColumnarInputStream -> ColumnarInputStream RNColumnarSourceOp -> ColumnarSourceOp getRNColumnarSourceNum -> getColumnarSourceNum - Delete StorageDisaggregatedColumnar.h, now redundant since StorageDisaggregatedColumnar.cpp already directly includes the Columnar/ headers

…SourceOp Separate the pull-based and push-based Columnar execution models into independent files for better maintainability and future pipeline optimization: - ColumnarInputStream.h/.cpp: pull-based IProfilingBlockInputStream model (legacy) - ColumnarSourceOp.h/.cpp: push-based SourceOp pipeline model (future) This allows pipeline-specific optimizations (async IO, adaptive concurrency, backpressure) to be scoped to ColumnarSourceOp without touching the legacy InputStream path.

- Rename ColumnarReadTask -> ColumnarReadTaskPool to clarify it is a pool/broker managing multiple ColumnarReaderWork items, not a single task - Shorten static factory methods: buildColumnarReadTaskPool -> build / buildWithBackoff (class name already provides context) - Inline getColumnarSourceNum at its single call site - Inline 8 trivial getter methods into ColumnarReader.h - Add class-level comments for ColumnarReadTaskPool, ColumnarReaderWork, and ColumnarReaderSharedContext - Add member comments for reader_count, source_num, pending_reader_works - Rename local variable read_columnar_tasks -> columnar_task_pools to match the new type name

Move the four behavioral descriptions (backoff/retry, bucket-level parallelism, work-stealing, prefetching) from the ColumnarReadTaskPool class comment to the specific functions that implement each behavior: - buildWithBackoff: backoff/retry on transient errors - build: bucket-level parallelism - tryAcquireReaderWork: work-stealing from the front of the queue - prefetchPendingWork: async prefetch of next work item

Signed-off-by: JaySon-Huang <tshent@qq.com>

After rebasing the refactoring onto upstream/master, apply the makeLockForDisaggResolve change (from pingcap#10900) to createColumnarReader() which was moved from StorageDisaggregatedColumnar.cpp to ColumnarReader.cpp during the refactoring.

ti-chi-bot · 2026-06-22T07:52:57Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xuhuaiyu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-22T07:57:41Z

📝 Walkthrough

Walkthrough

Adds next-gen columnar reader infrastructure under Storages/Columnar, rewires disaggregated columnar execution to use it, updates related include paths, and adds plan validation tests plus a new ignore rule.

Changes

Next-gen columnar reader pipeline

Layer / File(s)	Summary
ColumnarScanContext header relocation `dbms/CMakeLists.txt`, `dbms/src/Flash/Coprocessor/DAGContext.h`, `dbms/src/Flash/Coprocessor/ExecutionSummary.*`, `dbms/src/Flash/Statistics/ExecutionSummaryHelper.cpp`, `dbms/src/Flash/Statistics/ExecutorStatisticsCollector.cpp`, `dbms/src/Flash/Statistics/TableScanImpl.cpp`, `dbms/src/Storages/StorageDisaggregatedColumnar.cpp`	Updates the `dbms` build target and replaces `Flash/Coprocessor/ColumnarScanContext*.h` includes with `Storages/Columnar` paths.
ColumnarReader contracts and planning `dbms/src/Storages/Columnar/ColumnarReader.h`, `dbms/src/Storages/Columnar/ColumnarReader.cpp`	Declares the next-gen columnar reader types and implements planning, shared-context serialization, FFI reader creation, retry/backoff, async work scheduling, and input-stream construction.
ColumnarInputStream block streaming `dbms/src/Storages/Columnar/ColumnarInputStream.h`, `dbms/src/Storages/Columnar/ColumnarInputStream.cpp`	Declares and implements `ColumnarInputStream` with reader lifecycle helpers, block deserialization, stats merging, and completion logging.
ColumnarSourceOp pipeline operator `dbms/src/Storages/Columnar/ColumnarSourceOp.h`, `dbms/src/Storages/Columnar/ColumnarSourceOp.cpp`	Declares and implements `ColumnarSourceOp` with prefix/suffix profiling hooks and the `readImpl`/`awaitImpl`/`executeIOImpl` state machine.
StorageDisaggregatedColumnar rewiring `dbms/src/Storages/StorageDisaggregatedColumnar.cpp`	Switches disaggregated columnar execution from RN-prefixed task/operator wiring to `ColumnarReadTaskPool` and `ColumnarSourceOp`.
Planning tests and ignore rule `dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp`, `dbms/src/Common/FailPoint.cpp`, `.gitignore`	Adds bucket-splitting plan tests for `ColumnarReader`, a failpoint for forced bucket keys, and a `.gitignore` entry for generated review files.

Sequence Diagram(s)

sequenceDiagram
  participant StorageDisaggregatedColumnar
  participant ColumnarReadTaskPool
  participant ColumnarReader
  participant ColumnarInputStream
  participant ColumnarSourceOp

  StorageDisaggregatedColumnar->>ColumnarReadTaskPool: buildWithBackoff(...)
  ColumnarReadTaskPool->>ColumnarReader: build reader plans
  ColumnarReadTaskPool->>ColumnarInputStream: getInputStreams()
  StorageDisaggregatedColumnar->>ColumnarSourceOp: create(...)
  ColumnarSourceOp->>ColumnarReadTaskPool: acquire reader work
  ColumnarInputStream->>ColumnarReader: createColumnarReader(...)
  ColumnarReader-->>ColumnarInputStream: reader / error
  ColumnarInputStream-->>ColumnarSourceOp: block data

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

pingcap/tiflash#10871: Adds the FFI surface for region bucket key lookup and shared snapshot cleanup that this PR uses through ColumnarReaderSharedContext and planning.
pingcap/tiflash#10905: Shares the ColumnarScanContext plumbing used here to merge scan statistics into DAG execution state.

Suggested reviewers

JinheLin
Lloyd-Pottiger

Poem

🐇 New columnar paths now softly glow,
Through bucket splits and reader flow.
From Rust to Blocks, the data hops,
With careful stats and tidy stops.
The hare-approved pipeline is in sight —
Next-gen scans are running right!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description follows the template but leaves key fields empty, including the problem summary and all checklist entries.	Fill in the problem summary, add at least one test entry, and complete the side effects, documentation, and release note sections.
Docstring Coverage	⚠️ Warning	Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title is concise and matches the main theme of the PR: a columnar pipeline refactor.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (1)

dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
70-1642: 🧹 Nitpick | 🔵 Trivial

Consider removing the old RN implementation code in a follow-up refactoring.*

The old RN* classes and helper functions (lines 70-1642) are dead code. External verification confirms no usages of RNColumnarReadTask, RNColumnarSourceOp, RNColumnarInputStream, or RNColumnarReaderSharedContext exist outside this file. These ~1,500 lines were superseded by the migration to ColumnarReadTaskPool and ColumnarSourceOp APIs. Removing this dead code would improve maintainability.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 70 - 1642,
Remove the dead code blocks containing the old RN* implementation that has been
superseded by ColumnarReadTaskPool and ColumnarSourceOp APIs. Delete the entire
RNColumnarReaderSharedContext struct definition (including the nested
StartTsClearRegistry class and all member functions), all helper functions that
build and manage RN reader plans (buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Flash/Statistics/TableScanImpl.cpp`:
- Line 20: In the file dbms/src/Storages/StorageDisaggregatedColumnar.cpp at
line 27, replace the stale include path from the old header location
<Flash/Coprocessor/ColumnarScanContext.h> with the new correct path
<Storages/Columnar/ColumnarScanContext.h>. This aligns with the updated include
path being used in other files like TableScanImpl.cpp and resolves the
compilation failure due to the deleted header.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp`:
- Around line 600-613: The ParseFromString call on lock_info is not checking for
parsing failures when deserializing the error message data. In the LockedError
handling block within the ColumnarReader code, verify that
lock_info.ParseFromString(error_msg) returns successfully before proceeding with
lock resolution. If parsing fails, log an appropriate error and throw an
exception or handle the malformed data case to prevent using an invalid
lock_info object in the subsequent resolveLocks call.
- Around line 540-599: The code calls region_error.ParseFromString(error_msg)
but does not check its return value before accessing fields like
has_epoch_not_match() and current_regions(), which causes undefined behavior if
the protobuf data is malformed. Capture the boolean return value of the
ParseFromString call in the RegionError parsing section and add a check to
verify the parse was successful before proceeding to access any fields on the
region_error object. If parsing fails, log an appropriate error and throw a
RegionException instead of continuing with undefined state.

In `@dbms/src/Storages/Columnar/ColumnarReader.h`:
- Around line 136-143: The catch block in the
clear_shared_snap_access_by_start_ts function call is using a basic LOG_WARNING
call instead of the recommended tryLogCurrentException function to handle
exceptions. Replace the LOG_WARNING statement with a tryLogCurrentException call
that includes a descriptive context message about the failure to clear the
shared snapshot access cache. This will ensure proper exception details are
captured for debugging according to the coding guidelines.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 632-642: The method ColumnarReadTaskPool::buildWithBackoff() is
being called but the required header file is missing from the includes section.
Add the include directive for Storages/Columnar/ColumnarReader.h to the includes
section of StorageDisaggregatedColumnar.cpp (around lines 15-61) to resolve the
compilation error for the ColumnarReadTaskPool class.

---

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 70-1642: Remove the dead code blocks containing the old RN*
implementation that has been superseded by ColumnarReadTaskPool and
ColumnarSourceOp APIs. Delete the entire RNColumnarReaderSharedContext struct
definition (including the nested StartTsClearRegistry class and all member
functions), all helper functions that build and manage RN reader plans
(buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e16d20b2-dd50-4133-b713-3be6abb7e89d

📥 Commits

Reviewing files that changed from the base of the PR and between cc5a473 and 6798c2f.

📒 Files selected for processing (17)

dbms/CMakeLists.txt
dbms/src/Flash/Coprocessor/DAGContext.h
dbms/src/Flash/Coprocessor/ExecutionSummary.cpp
dbms/src/Flash/Coprocessor/ExecutionSummary.h
dbms/src/Flash/Statistics/ExecutionSummaryHelper.cpp
dbms/src/Flash/Statistics/ExecutorStatisticsCollector.cpp
dbms/src/Flash/Statistics/TableScanImpl.cpp
dbms/src/Storages/Columnar/ColumnarInputStream.cpp
dbms/src/Storages/Columnar/ColumnarInputStream.h
dbms/src/Storages/Columnar/ColumnarReader.cpp
dbms/src/Storages/Columnar/ColumnarReader.h
dbms/src/Storages/Columnar/ColumnarScanContext.h
dbms/src/Storages/Columnar/ColumnarScanContext_fwd.h
dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
dbms/src/Storages/Columnar/ColumnarSourceOp.h
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
dbms/src/Storages/StorageDisaggregatedColumnar.h

💤 Files with no reviewable changes (1)

dbms/src/Storages/StorageDisaggregatedColumnar.h

coderabbitai · 2026-06-22T08:05:56Z

+    if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError)
+    {
+        auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
+        errorpb::Error region_error;
+        region_error.ParseFromString(error_msg);
+        auto region_ver_id
+            = pingcap::kv::RegionVerID(reader_plan.region_id, reader_plan.region_conf_ver, reader_plan.region_ver);
+        // Refresh region cache and throw an exception for retrying.
+        if (region_error.has_epoch_not_match())
+        {
+            RegionException::UnavailableRegions unavailable_regions;
+            String region_id_ver; // region_id:region_ver:conf_ver
+            for (const auto & region : region_error.epoch_not_match().current_regions())
+            {
+                unavailable_regions.insert(region.id());
+                region_id_ver = std::to_string(region.id()) + ":" + std::to_string(reader_plan.region_ver) + ":"
+                    + std::to_string(region.region_epoch().conf_ver());
+            }
+            auto guard = std::lock_guard(*shared_context.output_lock);
+            cluster->region_cache->dropRegion(region_ver_id);
+            LOG_WARNING(
+                log,
+                "create columnar reader failed region_id={}, epoch not match {}",
+                std::to_string(reader_plan.region_id),
+                region_ver_id.toString());
+            throw RegionException(
+                std::move(unavailable_regions),
+                RegionException::RegionReadStatus::EPOCH_NOT_MATCH,
+                region_id_ver.c_str());
+        }
+        else
+        {
+            RegionException::UnavailableRegions unavailable_regions;
+            auto err_region_id = 0;
+            if (region_error.has_region_not_found())
+            {
+                err_region_id = region_error.region_not_found().region_id();
+                unavailable_regions.insert(err_region_id);
+                LOG_WARNING(
+                    log,
+                    "create columnar reader failed region_id={}, region not found {}",
+                    std::to_string(reader_plan.region_id),
+                    std::to_string(err_region_id));
+            }
+            else
+            {
+                LOG_WARNING(
+                    log,
+                    "create columnar reader failed region_id={}, {}",
+                    std::to_string(reader_plan.region_id),
+                    region_error.ShortDebugString());
+            }
+            auto guard = std::lock_guard(*shared_context.output_lock);
+            cluster->region_cache->dropRegion(region_ver_id);
+            throw RegionException(
+                std::move(unavailable_regions),
+                RegionException::RegionReadStatus::NOT_FOUND,
+                std::to_string(reader_plan.region_id).c_str());
+        }
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value to avoid undefined behavior on malformed data.

If the FFI layer returns malformed protobuf data, ParseFromString returns false and leaves the message object in an undefined state. Accessing fields like has_epoch_not_match() or current_regions() would then produce undefined behavior.

Proposed fix

if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError) { auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len); errorpb::Error region_error; - region_error.ParseFromString(error_msg); + if (!region_error.ParseFromString(error_msg)) + { + LOG_WARNING(log, "failed to parse region error protobuf, region_id={}", reader_plan.region_id); + throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse region error for region {}", reader_plan.region_id); + } auto region_ver_id

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 540 - 599, The code calls region_error.ParseFromString(error_msg) but does not check its return value before accessing fields like has_epoch_not_match() and current_regions(), which causes undefined behavior if the protobuf data is malformed. Capture the boolean return value of the ParseFromString call in the RegionError parsing section and add a check to verify the parse was successful before proceeding to access any fields on the region_error object. If parsing fails, log an appropriate error and throw a RegionException instead of continuing with undefined state.

coderabbitai · 2026-06-22T08:05:56Z

+    else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError)
+    {
+        auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
+        kvrpcpb::LockInfo lock_info;
+        lock_info.ParseFromString(error_msg);
+        // Try to resolve locks.
+        pingcap::kv::Backoffer bo(pingcap::kv::copNextMaxBackoff);
+        std::vector<uint64_t> pushed;
+        std::vector<pingcap::kv::LockPtr> locks{makeLockForDisaggResolve(lock_info)};
+        auto guard = std::lock_guard(*shared_context.output_lock);
+        auto before_expired = cluster->lock_resolver->resolveLocks(bo, shared_context.start_ts, locks, pushed);
+        LOG_WARNING(log, "Finished resolve locks, before_expired={}", before_expired);
+        throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value for lock info.

Same issue as the region error case - parsing could fail on malformed data.

Proposed fix

else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError) { auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len); kvrpcpb::LockInfo lock_info; - lock_info.ParseFromString(error_msg); + if (!lock_info.ParseFromString(error_msg)) + { + LOG_WARNING(log, "failed to parse lock info protobuf, region_id={}", reader_plan.region_id); + throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse lock info for region {}", reader_plan.region_id); + } // Try to resolve locks.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 600 - 613, The ParseFromString call on lock_info is not checking for parsing failures when deserializing the error message data. In the LockedError handling block within the ColumnarReader code, verify that lock_info.ParseFromString(error_msg) returns successfully before proceeding with lock resolution. If parsing fails, log an appropriate error and throw an exception or handle the malformed data case to prevent using an invalid lock_info object in the subsequent resolveLocks call.

coderabbitai · 2026-06-22T08:05:56Z

+        try
+        {
+            clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr);
+        }
+        catch (...)
+        {
+            LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
+        }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use tryLogCurrentException in catch block per coding guidelines.

The broad catch (...) block should use tryLogCurrentException to capture exception details for debugging, rather than a simple log message.

Proposed fix

try { clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr); } catch (...) { - LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts); + tryLogCurrentException(log, fmt::format("clear shared snapaccess cache failed, start_ts={}", start_ts)); }

Based on coding guidelines: "Use tryLogCurrentException(log, "context") in broad catch (...) paths to avoid duplicated exception-formatting code"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/Columnar/ColumnarReader.h` around lines 136 - 143, The catch block in the clear_shared_snap_access_by_start_ts function call is using a basic LOG_WARNING call instead of the recommended tryLogCurrentException function to handle exceptions. Replace the LOG_WARNING statement with a tryLogCurrentException call that includes a descriptive context message about the failure to clear the shared snapshot access cache. This will ensure proper exception details are captured for debugging according to the coding guidelines.

Source: Coding guidelines

The rebase onto upstream/master left old code (RNColumnarReaderSharedContext, getRNColumnarSourceNum, and commented-out Columnar type implementations) in StorageDisaggregatedColumnar.cpp. This file should only contain the StorageDisaggregated integration glue methods (readThroughColumnar, filterConditionsWithPushedDownFilters) after the refactoring. Restore the correct stripped-down version.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)
124-147: 🩺 Stability & Availability | 🟠 Major

Add guard before pipeline.firstStream() to match the pipeline overload pattern.

The first readThroughColumnar overload (lines 110–147) unconditionally calls pipeline.firstStream()->getHeader() at line 133. However, buildWithBackoff() can return an empty columnar_task_pools vector (when all_reader_plans.empty() at line 1033 in ColumnarReader.cpp), which leaves pipeline.streams empty. Calling firstStream() on an empty pipeline uses .at(0) internally and will throw std::out_of_range.

The second readThroughColumnar overload correctly guards this with if (!columnar_task_pools.empty()) at line 167. Add the same guard to the first overload before accessing the pipeline.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 124 - 147,
The first readThroughColumnar overload calls pipeline.firstStream()->getHeader()
without checking if the pipeline is empty, which will throw std::out_of_range
when columnar_task_pools is empty. Add a guard check if
(!columnar_task_pools.empty()) wrapping the block that accesses the pipeline
header (from retrieving the stream header and extracting source columns through
to the DAGExpressionAnalyzer construction), matching the same guard pattern
already present in the second readThroughColumnar overload at line 167.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 124-147: The first readThroughColumnar overload calls
pipeline.firstStream()->getHeader() without checking if the pipeline is empty,
which will throw std::out_of_range when columnar_task_pools is empty. Add a
guard check if (!columnar_task_pools.empty()) wrapping the block that accesses
the pipeline header (from retrieving the stream header and extracting source
columns through to the DAGExpressionAnalyzer construction), matching the same
guard pattern already present in the second readThroughColumnar overload at line
167.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b7dccf38-fe86-494e-ae9e-c9790298251a

📥 Commits

Reviewing files that changed from the base of the PR and between 6798c2f and 77635e4.

📒 Files selected for processing (5)

dbms/src/Storages/Columnar/ColumnarInputStream.cpp
dbms/src/Storages/Columnar/ColumnarInputStream.h
dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
dbms/src/Storages/Columnar/ColumnarSourceOp.h
dbms/src/Storages/StorageDisaggregatedColumnar.cpp

🚧 Files skipped from review as they are similar to previous changes (4)

dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
dbms/src/Storages/Columnar/ColumnarSourceOp.h
dbms/src/Storages/Columnar/ColumnarInputStream.h
dbms/src/Storages/Columnar/ColumnarInputStream.cpp

JaySon-Huang · 2026-06-23T09:20:26Z

/retest

Signed-off-by: JaySon-Huang <tshent@qq.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp`:
- Around line 266-271: The test in gtest_region_reader_plan should guard indexed
access to region_reader_plans by asserting the vector size before using
region_reader_plans[0]. Update the assertions around
buildColumnarRegionReaderPlans so that planned_reader_num and the actual
region_reader_plans container size are checked consistently, then only access
the first element after the size check. Use the buildColumnarRegionReaderPlans
output and the region_reader_plans field to locate and fix the unsafe access.
- Around line 299-317: The region-plan test only checks the two expected region
IDs and can still pass if extra plans are returned, so tighten validation in
gtest_region_reader_plan.cpp. In the loop over output.region_reader_plans, keep
the existing checks for region_id 1000 and 2000, but add a fail-fast assertion
for any other region_id, and assert that output.region_reader_plans has exactly
the expected number of entries before or after the loop. Use the existing
region_reader_plans, bucket_split_region_count, and region_id checks to locate
and update the validation logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4aa282af-11a9-4cd3-a5f3-dbb32028b265

📥 Commits

Reviewing files that changed from the base of the PR and between 77635e4 and a32eb92.

📒 Files selected for processing (2)

.gitignore
dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp

coderabbitai · 2026-06-24T08:47:36Z

+    const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);
+
+    ASSERT_EQ(output.planned_reader_num, 1U);
+    ASSERT_EQ(output.total_split_bucket_num, 0U);
+    EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());
+}


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Assert plan vector size before indexed access.

Line 270 indexes region_reader_plans[0] without an ASSERT_EQ(size, 1U) guard, so regressions can become undefined access instead of a clean test failure.

Suggested patch

const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true); ASSERT_EQ(output.planned_reader_num, 1U); ASSERT_EQ(output.total_split_bucket_num, 0U); + ASSERT_EQ(output.region_reader_plans.size(), 1U); EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);

ASSERT_EQ(output.planned_reader_num, 1U);

ASSERT_EQ(output.total_split_bucket_num, 0U);

EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());

}

const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);

ASSERT_EQ(output.planned_reader_num, 1U);

ASSERT_EQ(output.total_split_bucket_num, 0U);

ASSERT_EQ(output.region_reader_plans.size(), 1U);

EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp` around lines 266 - 271, The test in gtest_region_reader_plan should guard indexed access to region_reader_plans by asserting the vector size before using region_reader_plans[0]. Update the assertions around buildColumnarRegionReaderPlans so that planned_reader_num and the actual region_reader_plans container size are checked consistently, then only access the first element after the size check. Use the buildColumnarRegionReaderPlans output and the region_reader_plans field to locate and fix the unsafe access.

coderabbitai · 2026-06-24T08:47:36Z

+    ASSERT_EQ(output.planned_reader_num, 4U);
+    ASSERT_EQ(output.total_split_bucket_num, 3U);
+
+    size_t bucket_split_region_count = 0;
+    for (const auto & plan : output.region_reader_plans)
+    {
+        if (plan.region_id == 1000)
+        {
+            ASSERT_EQ(plan.bucket_units.size(), 3U);
+            ++bucket_split_region_count;
+        }
+        else if (plan.region_id == 2000)
+        {
+            EXPECT_TRUE(plan.bucket_units.empty());
+            ++bucket_split_region_count;
+        }
+    }
+    EXPECT_EQ(bucket_split_region_count, 2U);
+}


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make region-plan validation strict for unexpected IDs.

This test can still pass if an extra unexpected region plan is returned. Add a size assertion and fail fast on unknown IDs.

Suggested patch

ASSERT_EQ(output.planned_reader_num, 4U); ASSERT_EQ(output.total_split_bucket_num, 3U); + ASSERT_EQ(output.region_reader_plans.size(), 2U); size_t bucket_split_region_count = 0; for (const auto & plan : output.region_reader_plans) { if (plan.region_id == 1000) @@ else if (plan.region_id == 2000) { EXPECT_TRUE(plan.bucket_units.empty()); ++bucket_split_region_count; } + else + { + FAIL() << "Unexpected region_id in region_reader_plans: " << plan.region_id; + } } EXPECT_EQ(bucket_split_region_count, 2U);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ASSERT_EQ(output.planned_reader_num, 4U);

ASSERT_EQ(output.total_split_bucket_num, 3U);

size_t bucket_split_region_count = 0;

for (const auto & plan : output.region_reader_plans)

{

if (plan.region_id == 1000)

{

ASSERT_EQ(plan.bucket_units.size(), 3U);

++bucket_split_region_count;

}

else if (plan.region_id == 2000)

{

EXPECT_TRUE(plan.bucket_units.empty());

++bucket_split_region_count;

}

}

EXPECT_EQ(bucket_split_region_count, 2U);

}

ASSERT_EQ(output.planned_reader_num, 4U);

ASSERT_EQ(output.total_split_bucket_num, 3U);

ASSERT_EQ(output.region_reader_plans.size(), 2U);

size_t bucket_split_region_count = 0;

for (const auto & plan : output.region_reader_plans)

{

if (plan.region_id == 1000)

{

ASSERT_EQ(plan.bucket_units.size(), 3U);

+bucket_split_region_count;

}

else if (plan.region_id == 2000)

{

EXPECT_TRUE(plan.bucket_units.empty());

+bucket_split_region_count;

}

else

{

FAIL() << "Unexpected region_id in region_reader_plans: " << plan.region_id;

}

}

EXPECT_EQ(bucket_split_region_count, 2U);

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp` around lines 299 - 317, The region-plan test only checks the two expected region IDs and can still pass if extra plans are returned, so tighten validation in gtest_region_reader_plan.cpp. In the loop over output.region_reader_plans, keep the existing checks for region_id 1000 and 2000, but add a fail-fast assertion for any other region_id, and assert that output.region_reader_plans has exactly the expected number of entries before or after the loop. Use the existing region_reader_plans, bucket_split_region_count, and region_id checks to locate and update the validation logic.

Signed-off-by: JaySon-Huang <tshent@qq.com>

coderabbitai

🧹 Nitpick comments (1)

dbms/src/Storages/Columnar/ColumnarReader.h (1)
295-329: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Align the test-visible API with the style rules. The new BucketSplitResult, ColumnarRegionReaderPlan, and ColumnarRegionReaderPlansOutput fields still use snake_case, size_t, and const String & bucket_key; rename them to camelCase, use UInt64 for the counters, and take bucketKey as std::string_view. Mirror the same renames in dbms/src/Storages/Columnar/ColumnarReader.cpp and dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.h` around lines 295 - 329, The
test-visible API in ColumnarReader still uses nonconforming names and types;
update BucketSplitResult, ColumnarRegionReaderPlan, and
ColumnarRegionReaderPlansOutput to camelCase fields, switch the counter fields
from size_t to UInt64, and change isBucketBoundaryInsideRange to take bucketKey
as std::string_view. Propagate the same renames and type changes through
ColumnarReader.cpp and gtest_region_reader_plan.cpp, including all field
accesses and any helper signatures that consume these structs.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@dbms/src/Storages/Columnar/ColumnarReader.h`:
- Around line 295-329: The test-visible API in ColumnarReader still uses
nonconforming names and types; update BucketSplitResult,
ColumnarRegionReaderPlan, and ColumnarRegionReaderPlansOutput to camelCase
fields, switch the counter fields from size_t to UInt64, and change
isBucketBoundaryInsideRange to take bucketKey as std::string_view. Propagate the
same renames and type changes through ColumnarReader.cpp and
gtest_region_reader_plan.cpp, including all field accesses and any helper
signatures that consume these structs.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8c1ead0e-45c9-4f35-ae05-a0e3a343870f

📥 Commits

Reviewing files that changed from the base of the PR and between a32eb92 and 5b07925.

📒 Files selected for processing (4)

dbms/src/Common/FailPoint.cpp
dbms/src/Storages/Columnar/ColumnarReader.cpp
dbms/src/Storages/Columnar/ColumnarReader.h
dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp

🚧 Files skipped from review as they are similar to previous changes (2)

dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp
dbms/src/Storages/Columnar/ColumnarReader.cpp

ti-chi-bot · 2026-06-27T07:21:21Z

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-sanitizer-tsan	`5b07925`	link	false	`/test pull-sanitizer-tsan`
pull-sanitizer-asan	`5b07925`	link	false	`/test pull-sanitizer-asan`
pull-integration-next-gen-columnar	`5b07925`	link	true	`/test pull-integration-next-gen-columnar`
pull-integration-test	`5b07925`	link	true	`/test pull-integration-test`
pull-unit-test	`5b07925`	link	true	`/test pull-unit-test`
pull-unit-next-gen	`5b07925`	link	true	`/test pull-unit-next-gen`
pull-integration-next-gen	`5b07925`	link	true	`/test pull-integration-next-gen`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

JaySon-Huang added 7 commits June 22, 2026 15:47

Format codes

cda7f7b

Signed-off-by: JaySon-Huang <tshent@qq.com>

ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Jun 22, 2026

ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 22, 2026

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

ti-chi-bot Bot removed the do-not-merge/needs-linked-issue label Jun 24, 2026

Add unit tests

a32eb92

Signed-off-by: JaySon-Huang <tshent@qq.com>

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

try add ut

5b07925

Signed-off-by: JaySon-Huang <tshent@qq.com>

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Uh oh!

Conversation

JaySon-Huang commented Jun 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What is changed and how it works?

Check List

Release note

Summary by CodeRabbit

Summary

Uh oh!

ti-chi-bot Bot commented Jun 22, 2026

Uh oh!

coderabbitai Bot commented Jun 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

JaySon-Huang commented Jun 23, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ti-chi-bot Bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

JaySon-Huang commented Jun 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 22, 2026 •

edited

Loading