Skip to content

columnar: Refactor for later pipeline evolution#10907

Open
JaySon-Huang wants to merge 10 commits into
pingcap:masterfrom
JaySon-Huang:refactor_for_pipeline
Open

columnar: Refactor for later pipeline evolution#10907
JaySon-Huang wants to merge 10 commits into
pingcap:masterfrom
JaySon-Huang:refactor_for_pipeline

Conversation

@JaySon-Huang

@JaySon-Huang JaySon-Huang commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: ref #10844 ref #6233

Problem Summary:

What is changed and how it works?

Refactor: Standardized columnar context usage across execution summary, statistics collection, and table scan components to align columnar reads with the new pipeline.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

Summary

  • New Features
    • Added a next-generation columnar read pipeline for remote/disaggregated snapshot streaming, including a dedicated source operator with improved IO profiling (bytes/rows/throughput and read duration).
  • Refactor
    • Standardized columnar scan context usage across execution summary, statistics collection, and table scan components.
  • Tests
    • Added region reader planning tests for bucket parallelism behavior, including boundary splitting and plan flattening.
  • Chores
    • Extended failpoint support and updated ignore rules for review artifacts.

…irectory

Move Columnar-specific types out of Flash/Coprocessor and
StorageDisaggregatedColumnar into a dedicated Storages/Columnar/ directory
for better code organization.

Changes:
- Move ColumnarScanContext from Flash/Coprocessor/ to Storages/Columnar/
- Extract RNColumnarReader* types (Plan/Work/SharedContext/ReadTask) into
  Storages/Columnar/ColumnarReader.h/.cpp
- Extract RNColumnarInputStream and RNColumnarSourceOp into
  Storages/Columnar/ColumnarStreams.h/.cpp
- Simplify StorageDisaggregatedColumnar.h/.cpp to only contain
  Storage::readThroughColumnar() and related integration glue code
- Update all include paths across Flash/Coprocessor and Flash/Statistics
- Add Storages/Columnar/ to CMake build glob
- Keep ENABLE_NEXT_GEN_COLUMNAR guards on all Columnar code paths
…mbrella header

- Rename all Columnar types to drop the 'RN' prefix:
  RNColumnarReaderMaterializeState -> ColumnarReaderMaterializeState
  RNColumnarReaderSharedContext   -> ColumnarReaderSharedContext
  RNColumnarReaderPlan            -> ColumnarReaderPlan
  RNColumnarReaderWork            -> ColumnarReaderWork
  RNColumnarReadTask              -> ColumnarReadTask
  RNColumnarInputStream           -> ColumnarInputStream
  RNColumnarSourceOp              -> ColumnarSourceOp
  getRNColumnarSourceNum          -> getColumnarSourceNum

- Delete StorageDisaggregatedColumnar.h, now redundant since
  StorageDisaggregatedColumnar.cpp already directly includes
  the Columnar/ headers
…SourceOp

Separate the pull-based and push-based Columnar execution models
into independent files for better maintainability and future pipeline
optimization:

- ColumnarInputStream.h/.cpp: pull-based IProfilingBlockInputStream model (legacy)
- ColumnarSourceOp.h/.cpp: push-based SourceOp pipeline model (future)

This allows pipeline-specific optimizations (async IO, adaptive
concurrency, backpressure) to be scoped to ColumnarSourceOp without
touching the legacy InputStream path.
- Rename ColumnarReadTask -> ColumnarReadTaskPool to clarify it is
  a pool/broker managing multiple ColumnarReaderWork items, not a
  single task
- Shorten static factory methods: buildColumnarReadTaskPool ->
  build / buildWithBackoff (class name already provides context)
- Inline getColumnarSourceNum at its single call site
- Inline 8 trivial getter methods into ColumnarReader.h
- Add class-level comments for ColumnarReadTaskPool,
  ColumnarReaderWork, and ColumnarReaderSharedContext
- Add member comments for reader_count, source_num,
  pending_reader_works
- Rename local variable read_columnar_tasks -> columnar_task_pools
  to match the new type name
Move the four behavioral descriptions (backoff/retry, bucket-level
parallelism, work-stealing, prefetching) from the ColumnarReadTaskPool
class comment to the specific functions that implement each behavior:

- buildWithBackoff: backoff/retry on transient errors
- build: bucket-level parallelism
- tryAcquireReaderWork: work-stealing from the front of the queue
- prefetchPendingWork: async prefetch of next work item
Signed-off-by: JaySon-Huang <tshent@qq.com>
After rebasing the refactoring onto upstream/master, apply the
makeLockForDisaggResolve change (from pingcap#10900) to createColumnarReader()
which was moved from StorageDisaggregatedColumnar.cpp to
ColumnarReader.cpp during the refactoring.
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Jun 22, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign xuhuaiyu for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 22, 2026
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds next-gen columnar reader infrastructure under Storages/Columnar, rewires disaggregated columnar execution to use it, updates related include paths, and adds plan validation tests plus a new ignore rule.

Changes

Next-gen columnar reader pipeline

Layer / File(s) Summary
ColumnarScanContext header relocation
dbms/CMakeLists.txt, dbms/src/Flash/Coprocessor/DAGContext.h, dbms/src/Flash/Coprocessor/ExecutionSummary.*, dbms/src/Flash/Statistics/ExecutionSummaryHelper.cpp, dbms/src/Flash/Statistics/ExecutorStatisticsCollector.cpp, dbms/src/Flash/Statistics/TableScanImpl.cpp, dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Updates the dbms build target and replaces Flash/Coprocessor/ColumnarScanContext*.h includes with Storages/Columnar paths.
ColumnarReader contracts and planning
dbms/src/Storages/Columnar/ColumnarReader.h, dbms/src/Storages/Columnar/ColumnarReader.cpp
Declares the next-gen columnar reader types and implements planning, shared-context serialization, FFI reader creation, retry/backoff, async work scheduling, and input-stream construction.
ColumnarInputStream block streaming
dbms/src/Storages/Columnar/ColumnarInputStream.h, dbms/src/Storages/Columnar/ColumnarInputStream.cpp
Declares and implements ColumnarInputStream with reader lifecycle helpers, block deserialization, stats merging, and completion logging.
ColumnarSourceOp pipeline operator
dbms/src/Storages/Columnar/ColumnarSourceOp.h, dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
Declares and implements ColumnarSourceOp with prefix/suffix profiling hooks and the readImpl/awaitImpl/executeIOImpl state machine.
StorageDisaggregatedColumnar rewiring
dbms/src/Storages/StorageDisaggregatedColumnar.cpp
Switches disaggregated columnar execution from RN-prefixed task/operator wiring to ColumnarReadTaskPool and ColumnarSourceOp.
Planning tests and ignore rule
dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp, dbms/src/Common/FailPoint.cpp, .gitignore
Adds bucket-splitting plan tests for ColumnarReader, a failpoint for forced bucket keys, and a .gitignore entry for generated review files.

Sequence Diagram(s)

sequenceDiagram
  participant StorageDisaggregatedColumnar
  participant ColumnarReadTaskPool
  participant ColumnarReader
  participant ColumnarInputStream
  participant ColumnarSourceOp

  StorageDisaggregatedColumnar->>ColumnarReadTaskPool: buildWithBackoff(...)
  ColumnarReadTaskPool->>ColumnarReader: build reader plans
  ColumnarReadTaskPool->>ColumnarInputStream: getInputStreams()
  StorageDisaggregatedColumnar->>ColumnarSourceOp: create(...)
  ColumnarSourceOp->>ColumnarReadTaskPool: acquire reader work
  ColumnarInputStream->>ColumnarReader: createColumnarReader(...)
  ColumnarReader-->>ColumnarInputStream: reader / error
  ColumnarInputStream-->>ColumnarSourceOp: block data
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • pingcap/tiflash#10871: Adds the FFI surface for region bucket key lookup and shared snapshot cleanup that this PR uses through ColumnarReaderSharedContext and planning.
  • pingcap/tiflash#10905: Shares the ColumnarScanContext plumbing used here to merge scan statistics into DAG execution state.

Suggested reviewers

  • JinheLin
  • Lloyd-Pottiger

Poem

🐇 New columnar paths now softly glow,
Through bucket splits and reader flow.
From Rust to Blocks, the data hops,
With careful stats and tidy stops.
The hare-approved pipeline is in sight —
Next-gen scans are running right!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description follows the template but leaves key fields empty, including the problem summary and all checklist entries. Fill in the problem summary, add at least one test entry, and complete the side effects, documentation, and release note sections.
Docstring Coverage ⚠️ Warning Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title is concise and matches the main theme of the PR: a columnar pipeline refactor.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

70-1642: 🧹 Nitpick | 🔵 Trivial

Consider removing the old RN implementation code in a follow-up refactoring.*

The old RN* classes and helper functions (lines 70-1642) are dead code. External verification confirms no usages of RNColumnarReadTask, RNColumnarSourceOp, RNColumnarInputStream, or RNColumnarReaderSharedContext exist outside this file. These ~1,500 lines were superseded by the migration to ColumnarReadTaskPool and ColumnarSourceOp APIs. Removing this dead code would improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 70 - 1642,
Remove the dead code blocks containing the old RN* implementation that has been
superseded by ColumnarReadTaskPool and ColumnarSourceOp APIs. Delete the entire
RNColumnarReaderSharedContext struct definition (including the nested
StartTsClearRegistry class and all member functions), all helper functions that
build and manage RN reader plans (buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Flash/Statistics/TableScanImpl.cpp`:
- Line 20: In the file dbms/src/Storages/StorageDisaggregatedColumnar.cpp at
line 27, replace the stale include path from the old header location
<Flash/Coprocessor/ColumnarScanContext.h> with the new correct path
<Storages/Columnar/ColumnarScanContext.h>. This aligns with the updated include
path being used in other files like TableScanImpl.cpp and resolves the
compilation failure due to the deleted header.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp`:
- Around line 600-613: The ParseFromString call on lock_info is not checking for
parsing failures when deserializing the error message data. In the LockedError
handling block within the ColumnarReader code, verify that
lock_info.ParseFromString(error_msg) returns successfully before proceeding with
lock resolution. If parsing fails, log an appropriate error and throw an
exception or handle the malformed data case to prevent using an invalid
lock_info object in the subsequent resolveLocks call.
- Around line 540-599: The code calls region_error.ParseFromString(error_msg)
but does not check its return value before accessing fields like
has_epoch_not_match() and current_regions(), which causes undefined behavior if
the protobuf data is malformed. Capture the boolean return value of the
ParseFromString call in the RegionError parsing section and add a check to
verify the parse was successful before proceeding to access any fields on the
region_error object. If parsing fails, log an appropriate error and throw a
RegionException instead of continuing with undefined state.

In `@dbms/src/Storages/Columnar/ColumnarReader.h`:
- Around line 136-143: The catch block in the
clear_shared_snap_access_by_start_ts function call is using a basic LOG_WARNING
call instead of the recommended tryLogCurrentException function to handle
exceptions. Replace the LOG_WARNING statement with a tryLogCurrentException call
that includes a descriptive context message about the failure to clear the
shared snapshot access cache. This will ensure proper exception details are
captured for debugging according to the coding guidelines.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 632-642: The method ColumnarReadTaskPool::buildWithBackoff() is
being called but the required header file is missing from the includes section.
Add the include directive for Storages/Columnar/ColumnarReader.h to the includes
section of StorageDisaggregatedColumnar.cpp (around lines 15-61) to resolve the
compilation error for the ColumnarReadTaskPool class.

---

Nitpick comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 70-1642: Remove the dead code blocks containing the old RN*
implementation that has been superseded by ColumnarReadTaskPool and
ColumnarSourceOp APIs. Delete the entire RNColumnarReaderSharedContext struct
definition (including the nested StartTsClearRegistry class and all member
functions), all helper functions that build and manage RN reader plans
(buildColumnarReaderSharedContext,
buildRegionReaderPlansFromPhysicalTableRanges,
buildReaderPlansFromRegionReaderPlans, createColumnarReader, and related
functions), and the complete implementations of RNColumnarReaderWork destructor,
RNColumnarReadTask class with all its methods, RNColumnarInputStream class with
all its methods, and RNColumnarSourceOp class with all its methods. Before
deletion, verify no other code in the repository references these removed
classes and functions by searching for RNColumnarReadTask,
RNColumnarInputStream, RNColumnarSourceOp, RNColumnarReaderSharedContext, and
related identifiers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e16d20b2-dd50-4133-b713-3be6abb7e89d

📥 Commits

Reviewing files that changed from the base of the PR and between cc5a473 and 6798c2f.

📒 Files selected for processing (17)
  • dbms/CMakeLists.txt
  • dbms/src/Flash/Coprocessor/DAGContext.h
  • dbms/src/Flash/Coprocessor/ExecutionSummary.cpp
  • dbms/src/Flash/Coprocessor/ExecutionSummary.h
  • dbms/src/Flash/Statistics/ExecutionSummaryHelper.cpp
  • dbms/src/Flash/Statistics/ExecutorStatisticsCollector.cpp
  • dbms/src/Flash/Statistics/TableScanImpl.cpp
  • dbms/src/Storages/Columnar/ColumnarInputStream.cpp
  • dbms/src/Storages/Columnar/ColumnarInputStream.h
  • dbms/src/Storages/Columnar/ColumnarReader.cpp
  • dbms/src/Storages/Columnar/ColumnarReader.h
  • dbms/src/Storages/Columnar/ColumnarScanContext.h
  • dbms/src/Storages/Columnar/ColumnarScanContext_fwd.h
  • dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarSourceOp.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
  • dbms/src/Storages/StorageDisaggregatedColumnar.h
💤 Files with no reviewable changes (1)
  • dbms/src/Storages/StorageDisaggregatedColumnar.h

Comment thread dbms/src/Flash/Statistics/TableScanImpl.cpp
Comment on lines +540 to +599
if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError)
{
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
errorpb::Error region_error;
region_error.ParseFromString(error_msg);
auto region_ver_id
= pingcap::kv::RegionVerID(reader_plan.region_id, reader_plan.region_conf_ver, reader_plan.region_ver);
// Refresh region cache and throw an exception for retrying.
if (region_error.has_epoch_not_match())
{
RegionException::UnavailableRegions unavailable_regions;
String region_id_ver; // region_id:region_ver:conf_ver
for (const auto & region : region_error.epoch_not_match().current_regions())
{
unavailable_regions.insert(region.id());
region_id_ver = std::to_string(region.id()) + ":" + std::to_string(reader_plan.region_ver) + ":"
+ std::to_string(region.region_epoch().conf_ver());
}
auto guard = std::lock_guard(*shared_context.output_lock);
cluster->region_cache->dropRegion(region_ver_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, epoch not match {}",
std::to_string(reader_plan.region_id),
region_ver_id.toString());
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::EPOCH_NOT_MATCH,
region_id_ver.c_str());
}
else
{
RegionException::UnavailableRegions unavailable_regions;
auto err_region_id = 0;
if (region_error.has_region_not_found())
{
err_region_id = region_error.region_not_found().region_id();
unavailable_regions.insert(err_region_id);
LOG_WARNING(
log,
"create columnar reader failed region_id={}, region not found {}",
std::to_string(reader_plan.region_id),
std::to_string(err_region_id));
}
else
{
LOG_WARNING(
log,
"create columnar reader failed region_id={}, {}",
std::to_string(reader_plan.region_id),
region_error.ShortDebugString());
}
auto guard = std::lock_guard(*shared_context.output_lock);
cluster->region_cache->dropRegion(region_ver_id);
throw RegionException(
std::move(unavailable_regions),
RegionException::RegionReadStatus::NOT_FOUND,
std::to_string(reader_plan.region_id).c_str());
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value to avoid undefined behavior on malformed data.

If the FFI layer returns malformed protobuf data, ParseFromString returns false and leaves the message object in an undefined state. Accessing fields like has_epoch_not_match() or current_regions() would then produce undefined behavior.

Proposed fix
     if (columnar_reader.error_type == ColumnarReaderErrorType::RegionError)
     {
         auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
         errorpb::Error region_error;
-        region_error.ParseFromString(error_msg);
+        if (!region_error.ParseFromString(error_msg))
+        {
+            LOG_WARNING(log, "failed to parse region error protobuf, region_id={}", reader_plan.region_id);
+            throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse region error for region {}", reader_plan.region_id);
+        }
         auto region_ver_id
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 540 - 599, The
code calls region_error.ParseFromString(error_msg) but does not check its return
value before accessing fields like has_epoch_not_match() and current_regions(),
which causes undefined behavior if the protobuf data is malformed. Capture the
boolean return value of the ParseFromString call in the RegionError parsing
section and add a check to verify the parse was successful before proceeding to
access any fields on the region_error object. If parsing fails, log an
appropriate error and throw a RegionException instead of continuing with
undefined state.

Comment on lines +600 to +613
else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError)
{
auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
kvrpcpb::LockInfo lock_info;
lock_info.ParseFromString(error_msg);
// Try to resolve locks.
pingcap::kv::Backoffer bo(pingcap::kv::copNextMaxBackoff);
std::vector<uint64_t> pushed;
std::vector<pingcap::kv::LockPtr> locks{makeLockForDisaggResolve(lock_info)};
auto guard = std::lock_guard(*shared_context.output_lock);
auto before_expired = cluster->lock_resolver->resolveLocks(bo, shared_context.start_ts, locks, pushed);
LOG_WARNING(log, "Finished resolve locks, before_expired={}", before_expired);
throw Exception("lock error", ErrorCodes::COLUMNAR_SNAPSHOT_ERROR);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Check ParseFromString return value for lock info.

Same issue as the region error case - parsing could fail on malformed data.

Proposed fix
     else if (columnar_reader.error_type == ColumnarReaderErrorType::LockedError)
     {
         auto error_msg = String(columnar_reader.error.buff.data, columnar_reader.error.buff.len);
         kvrpcpb::LockInfo lock_info;
-        lock_info.ParseFromString(error_msg);
+        if (!lock_info.ParseFromString(error_msg))
+        {
+            LOG_WARNING(log, "failed to parse lock info protobuf, region_id={}", reader_plan.region_id);
+            throw Exception(ErrorCodes::COLUMNAR_SNAPSHOT_ERROR, "failed to parse lock info for region {}", reader_plan.region_id);
+        }
         // Try to resolve locks.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.cpp` around lines 600 - 613, The
ParseFromString call on lock_info is not checking for parsing failures when
deserializing the error message data. In the LockedError handling block within
the ColumnarReader code, verify that lock_info.ParseFromString(error_msg)
returns successfully before proceeding with lock resolution. If parsing fails,
log an appropriate error and throw an exception or handle the malformed data
case to prevent using an invalid lock_info object in the subsequent resolveLocks
call.

Comment on lines +136 to +143
try
{
clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr);
}
catch (...)
{
LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use tryLogCurrentException in catch block per coding guidelines.

The broad catch (...) block should use tryLogCurrentException to capture exception details for debugging, rather than a simple log message.

Proposed fix
         try
         {
             clear_shared_snap_access_by_start_ts(start_ts, proxy_ptr);
         }
         catch (...)
         {
-            LOG_WARNING(log, "clear shared snapaccess cache failed, start_ts={}", start_ts);
+            tryLogCurrentException(log, fmt::format("clear shared snapaccess cache failed, start_ts={}", start_ts));
         }

Based on coding guidelines: "Use tryLogCurrentException(log, "context") in broad catch (...) paths to avoid duplicated exception-formatting code"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.h` around lines 136 - 143, The
catch block in the clear_shared_snap_access_by_start_ts function call is using a
basic LOG_WARNING call instead of the recommended tryLogCurrentException
function to handle exceptions. Replace the LOG_WARNING statement with a
tryLogCurrentException call that includes a descriptive context message about
the failure to clear the shared snapshot access cache. This will ensure proper
exception details are captured for debugging according to the coding guidelines.

Source: Coding guidelines

Comment thread dbms/src/Storages/StorageDisaggregatedColumnar.cpp
The rebase onto upstream/master left old code (RNColumnarReaderSharedContext,
getRNColumnarSourceNum, and commented-out Columnar type implementations) in
StorageDisaggregatedColumnar.cpp. This file should only contain the
StorageDisaggregated integration glue methods (readThroughColumnar,
filterConditionsWithPushedDownFilters) after the refactoring.

Restore the correct stripped-down version.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dbms/src/Storages/StorageDisaggregatedColumnar.cpp (1)

124-147: 🩺 Stability & Availability | 🟠 Major

Add guard before pipeline.firstStream() to match the pipeline overload pattern.

The first readThroughColumnar overload (lines 110–147) unconditionally calls pipeline.firstStream()->getHeader() at line 133. However, buildWithBackoff() can return an empty columnar_task_pools vector (when all_reader_plans.empty() at line 1033 in ColumnarReader.cpp), which leaves pipeline.streams empty. Calling firstStream() on an empty pipeline uses .at(0) internally and will throw std::out_of_range.

The second readThroughColumnar overload correctly guards this with if (!columnar_task_pools.empty()) at line 167. Add the same guard to the first overload before accessing the pipeline.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp` around lines 124 - 147,
The first readThroughColumnar overload calls pipeline.firstStream()->getHeader()
without checking if the pipeline is empty, which will throw std::out_of_range
when columnar_task_pools is empty. Add a guard check if
(!columnar_task_pools.empty()) wrapping the block that accesses the pipeline
header (from retrieving the stream header and extracting source columns through
to the DAGExpressionAnalyzer construction), matching the same guard pattern
already present in the second readThroughColumnar overload at line 167.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@dbms/src/Storages/StorageDisaggregatedColumnar.cpp`:
- Around line 124-147: The first readThroughColumnar overload calls
pipeline.firstStream()->getHeader() without checking if the pipeline is empty,
which will throw std::out_of_range when columnar_task_pools is empty. Add a
guard check if (!columnar_task_pools.empty()) wrapping the block that accesses
the pipeline header (from retrieving the stream header and extracting source
columns through to the DAGExpressionAnalyzer construction), matching the same
guard pattern already present in the second readThroughColumnar overload at line
167.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b7dccf38-fe86-494e-ae9e-c9790298251a

📥 Commits

Reviewing files that changed from the base of the PR and between 6798c2f and 77635e4.

📒 Files selected for processing (5)
  • dbms/src/Storages/Columnar/ColumnarInputStream.cpp
  • dbms/src/Storages/Columnar/ColumnarInputStream.h
  • dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarSourceOp.h
  • dbms/src/Storages/StorageDisaggregatedColumnar.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • dbms/src/Storages/Columnar/ColumnarSourceOp.cpp
  • dbms/src/Storages/Columnar/ColumnarSourceOp.h
  • dbms/src/Storages/Columnar/ColumnarInputStream.h
  • dbms/src/Storages/Columnar/ColumnarInputStream.cpp

@JaySon-Huang

Copy link
Copy Markdown
Contributor Author

/retest

Signed-off-by: JaySon-Huang <tshent@qq.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp`:
- Around line 266-271: The test in gtest_region_reader_plan should guard indexed
access to region_reader_plans by asserting the vector size before using
region_reader_plans[0]. Update the assertions around
buildColumnarRegionReaderPlans so that planned_reader_num and the actual
region_reader_plans container size are checked consistently, then only access
the first element after the size check. Use the buildColumnarRegionReaderPlans
output and the region_reader_plans field to locate and fix the unsafe access.
- Around line 299-317: The region-plan test only checks the two expected region
IDs and can still pass if extra plans are returned, so tighten validation in
gtest_region_reader_plan.cpp. In the loop over output.region_reader_plans, keep
the existing checks for region_id 1000 and 2000, but add a fail-fast assertion
for any other region_id, and assert that output.region_reader_plans has exactly
the expected number of entries before or after the loop. Use the existing
region_reader_plans, bucket_split_region_count, and region_id checks to locate
and update the validation logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4aa282af-11a9-4cd3-a5f3-dbb32028b265

📥 Commits

Reviewing files that changed from the base of the PR and between 77635e4 and a32eb92.

📒 Files selected for processing (2)
  • .gitignore
  • dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp

Comment on lines +266 to +271
const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);

ASSERT_EQ(output.planned_reader_num, 1U);
ASSERT_EQ(output.total_split_bucket_num, 0U);
EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Assert plan vector size before indexed access.

Line 270 indexes region_reader_plans[0] without an ASSERT_EQ(size, 1U) guard, so regressions can become undefined access instead of a clean test failure.

Suggested patch
     const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);

     ASSERT_EQ(output.planned_reader_num, 1U);
     ASSERT_EQ(output.total_split_bucket_num, 0U);
+    ASSERT_EQ(output.region_reader_plans.size(), 1U);
     EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);
ASSERT_EQ(output.planned_reader_num, 1U);
ASSERT_EQ(output.total_split_bucket_num, 0U);
EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());
}
const auto output = buildColumnarRegionReaderPlans(*context, all_remote_regions_by_region, region_ver_ids, true);
ASSERT_EQ(output.planned_reader_num, 1U);
ASSERT_EQ(output.total_split_bucket_num, 0U);
ASSERT_EQ(output.region_reader_plans.size(), 1U);
EXPECT_TRUE(output.region_reader_plans[0].bucket_units.empty());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp` around lines
266 - 271, The test in gtest_region_reader_plan should guard indexed access to
region_reader_plans by asserting the vector size before using
region_reader_plans[0]. Update the assertions around
buildColumnarRegionReaderPlans so that planned_reader_num and the actual
region_reader_plans container size are checked consistently, then only access
the first element after the size check. Use the buildColumnarRegionReaderPlans
output and the region_reader_plans field to locate and fix the unsafe access.

Comment on lines +299 to +317
ASSERT_EQ(output.planned_reader_num, 4U);
ASSERT_EQ(output.total_split_bucket_num, 3U);

size_t bucket_split_region_count = 0;
for (const auto & plan : output.region_reader_plans)
{
if (plan.region_id == 1000)
{
ASSERT_EQ(plan.bucket_units.size(), 3U);
++bucket_split_region_count;
}
else if (plan.region_id == 2000)
{
EXPECT_TRUE(plan.bucket_units.empty());
++bucket_split_region_count;
}
}
EXPECT_EQ(bucket_split_region_count, 2U);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make region-plan validation strict for unexpected IDs.

This test can still pass if an extra unexpected region plan is returned. Add a size assertion and fail fast on unknown IDs.

Suggested patch
     ASSERT_EQ(output.planned_reader_num, 4U);
     ASSERT_EQ(output.total_split_bucket_num, 3U);
+    ASSERT_EQ(output.region_reader_plans.size(), 2U);

     size_t bucket_split_region_count = 0;
     for (const auto & plan : output.region_reader_plans)
     {
         if (plan.region_id == 1000)
@@
         else if (plan.region_id == 2000)
         {
             EXPECT_TRUE(plan.bucket_units.empty());
             ++bucket_split_region_count;
         }
+        else
+        {
+            FAIL() << "Unexpected region_id in region_reader_plans: " << plan.region_id;
+        }
     }
     EXPECT_EQ(bucket_split_region_count, 2U);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ASSERT_EQ(output.planned_reader_num, 4U);
ASSERT_EQ(output.total_split_bucket_num, 3U);
size_t bucket_split_region_count = 0;
for (const auto & plan : output.region_reader_plans)
{
if (plan.region_id == 1000)
{
ASSERT_EQ(plan.bucket_units.size(), 3U);
++bucket_split_region_count;
}
else if (plan.region_id == 2000)
{
EXPECT_TRUE(plan.bucket_units.empty());
++bucket_split_region_count;
}
}
EXPECT_EQ(bucket_split_region_count, 2U);
}
ASSERT_EQ(output.planned_reader_num, 4U);
ASSERT_EQ(output.total_split_bucket_num, 3U);
ASSERT_EQ(output.region_reader_plans.size(), 2U);
size_t bucket_split_region_count = 0;
for (const auto & plan : output.region_reader_plans)
{
if (plan.region_id == 1000)
{
ASSERT_EQ(plan.bucket_units.size(), 3U);
+bucket_split_region_count;
}
else if (plan.region_id == 2000)
{
EXPECT_TRUE(plan.bucket_units.empty());
+bucket_split_region_count;
}
else
{
FAIL() << "Unexpected region_id in region_reader_plans: " << plan.region_id;
}
}
EXPECT_EQ(bucket_split_region_count, 2U);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp` around lines
299 - 317, The region-plan test only checks the two expected region IDs and can
still pass if extra plans are returned, so tighten validation in
gtest_region_reader_plan.cpp. In the loop over output.region_reader_plans, keep
the existing checks for region_id 1000 and 2000, but add a fail-fast assertion
for any other region_id, and assert that output.region_reader_plans has exactly
the expected number of entries before or after the loop. Use the existing
region_reader_plans, bucket_split_region_count, and region_id checks to locate
and update the validation logic.

Signed-off-by: JaySon-Huang <tshent@qq.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
dbms/src/Storages/Columnar/ColumnarReader.h (1)

295-329: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Align the test-visible API with the style rules. The new BucketSplitResult, ColumnarRegionReaderPlan, and ColumnarRegionReaderPlansOutput fields still use snake_case, size_t, and const String & bucket_key; rename them to camelCase, use UInt64 for the counters, and take bucketKey as std::string_view. Mirror the same renames in dbms/src/Storages/Columnar/ColumnarReader.cpp and dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dbms/src/Storages/Columnar/ColumnarReader.h` around lines 295 - 329, The
test-visible API in ColumnarReader still uses nonconforming names and types;
update BucketSplitResult, ColumnarRegionReaderPlan, and
ColumnarRegionReaderPlansOutput to camelCase fields, switch the counter fields
from size_t to UInt64, and change isBucketBoundaryInsideRange to take bucketKey
as std::string_view. Propagate the same renames and type changes through
ColumnarReader.cpp and gtest_region_reader_plan.cpp, including all field
accesses and any helper signatures that consume these structs.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@dbms/src/Storages/Columnar/ColumnarReader.h`:
- Around line 295-329: The test-visible API in ColumnarReader still uses
nonconforming names and types; update BucketSplitResult,
ColumnarRegionReaderPlan, and ColumnarRegionReaderPlansOutput to camelCase
fields, switch the counter fields from size_t to UInt64, and change
isBucketBoundaryInsideRange to take bucketKey as std::string_view. Propagate the
same renames and type changes through ColumnarReader.cpp and
gtest_region_reader_plan.cpp, including all field accesses and any helper
signatures that consume these structs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8c1ead0e-45c9-4f35-ae05-a0e3a343870f

📥 Commits

Reviewing files that changed from the base of the PR and between a32eb92 and 5b07925.

📒 Files selected for processing (4)
  • dbms/src/Common/FailPoint.cpp
  • dbms/src/Storages/Columnar/ColumnarReader.cpp
  • dbms/src/Storages/Columnar/ColumnarReader.h
  • dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • dbms/src/Storages/Columnar/tests/gtest_region_reader_plan.cpp
  • dbms/src/Storages/Columnar/ColumnarReader.cpp

@ti-chi-bot

ti-chi-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

@JaySon-Huang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-sanitizer-tsan 5b07925 link false /test pull-sanitizer-tsan
pull-sanitizer-asan 5b07925 link false /test pull-sanitizer-asan
pull-integration-next-gen-columnar 5b07925 link true /test pull-integration-next-gen-columnar
pull-integration-test 5b07925 link true /test pull-integration-test
pull-unit-test 5b07925 link true /test pull-unit-test
pull-unit-next-gen 5b07925 link true /test pull-unit-next-gen
pull-integration-next-gen 5b07925 link true /test pull-integration-next-gen

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant