Cuda Memcpy Async Tail Chunk Overrun by fbusato · Pull Request #9602 · NVIDIA/cccl

fbusato · 2026-06-26T01:17:24Z

Description

memcpy_async preconditions:

The precondition check should reject sizes that do not match the backend chunking rules. Instead, it only checks alignment and overlap, then forwards the byte count unchanged. Device implementations copy fixed 2/4/8/16-byte chunks while offset < size, so a short tail is copied past the requested extent.

This PR also adds a couple of improvements.
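The tail overrun described above can be illustrated with a small host-side model. This is only a sketch: `bytes_touched` is a hypothetical helper, not part of libcu++, and it assumes the device loop has the shape "copy one fixed-size chunk while offset < size" as stated in the description.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical host-side model of a chunked copy loop (not the libcu++ code).
// Returns how many bytes the loop touches when it always moves whole chunks.
inline std::size_t bytes_touched(std::size_t size, std::size_t chunk)
{
  std::size_t offset = 0;
  while (offset < size)
  {
    // Each iteration moves one full chunk, even if fewer bytes remain.
    offset += chunk;
  }
  return offset;
}
```

Under this model, a 20-byte request with 16-byte chunks touches 32 bytes, so the last 12 bytes lie past the requested extent. A size that is a multiple of the chunk touches exactly the requested bytes.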

coderabbitai · 2026-06-26T01:21:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9cc9c2a2-e97a-428a-8273-9275a8bcbd69

📥 Commits

Reviewing files that changed from the base of the PR and between 2ff1f4b and 0a0b1b9.

📒 Files selected for processing (1)

libcudacxx/include/cuda/__memcpy_async/check_preconditions.h

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Updated __memcpy_async_check_pre to better enforce valid copy sizes for backend chunking rules. The check now derives a chunk size from alignment (min(16, align)) and rejects byte counts that are not a multiple of that chunk. It also keeps source/destination alignment validation while switching the overlap test to ::cuda::ranges_overlap over the actual byte ranges.

Also adjusted the void* overload to forward the size into the updated typed implementation and added the new overlap header include.
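The summarized rules can be sketched on the host as follows. This is an illustration under assumptions, not the code in `check_preconditions.h`: `check_pre_sketch` and `byte_ranges_overlap` are hypothetical names, the signature of `::cuda::ranges_overlap` is not shown in this PR page, and the real check may assert rather than return a flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an overlap test on two half-open byte ranges
// [a, a + size) and [b, b + size).
inline bool byte_ranges_overlap(const void* a, const void* b, std::size_t size)
{
  const auto lhs = reinterpret_cast<std::uintptr_t>(a);
  const auto rhs = reinterpret_cast<std::uintptr_t>(b);
  return lhs < rhs + size && rhs < lhs + size;
}

// Hypothetical sketch of the precondition: returns true when the copy is valid.
inline bool check_pre_sketch(const void* dst, const void* src, std::size_t size, std::size_t align)
{
  // Chunk size derived from the alignment, capped at 16 bytes.
  const std::size_t chunk = std::min<std::size_t>(16, align);
  if (chunk == 0 || size % chunk != 0)
  {
    return false; // a short tail would be copied as a full chunk
  }
  const auto d = reinterpret_cast<std::uintptr_t>(dst);
  const auto s = reinterpret_cast<std::uintptr_t>(src);
  if (d % align != 0 || s % align != 0)
  {
    return false; // source or destination is misaligned
  }
  return !byte_ranges_overlap(dst, src, size);
}
```

With 16-byte alignment, a 32-byte copy between disjoint buffers passes, while a 20-byte copy is rejected because 20 is not a multiple of the 16-byte chunk. The same 20 bytes pass with 4-byte alignment, since the chunk is then 4 bytes.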

Walkthrough

__memcpy_async_check_pre adds a copy-chunk divisibility check, keeps the alignment checks, and replaces the overlap test with cuda::ranges_overlap. The void* overload forwards __size_bytes into the updated typed overload.

Changes

Memcpy async precondition checks

| Layer / File(s) | Summary |
|---|---|
| Precondition checks `libcudacxx/include/cuda/__memcpy_async/check_preconditions.h` | Adds the memcpy_async copy-chunk multiple check, keeps alignment validation, switches the non-overlap test to `cuda::ranges_overlap`, and forwards `__size_bytes` through the `void*` overload. |


github-actions · 2026-06-26T03:32:36Z

😬 CI Workflow Results

🟥 Finished in 2h 13m: Pass: 99%/120 | Total: 2d 04h | Max: 1h 06m | Hits: 89%/396554

See results here.

Cuda Memcpy Async Tail Chunk Overrun

0a0b1b9

fbusato self-assigned this Jun 26, 2026

fbusato added the libcu++ For all items related to libcu++ label Jun 26, 2026

fbusato requested a review from a team as a code owner June 26, 2026 01:17

fbusato added the bug label Jun 26, 2026

fbusato added this to CCCL Jun 26, 2026

fbusato requested a review from wmaxey June 26, 2026 01:17

github-project-automation Bot moved this to Todo in CCCL Jun 26, 2026

fbusato moved this from Todo to In Review in CCCL Jun 26, 2026

Cuda Memcpy Async Tail Chunk Overrun #9602
fbusato wants to merge 1 commit into NVIDIA:main from fbusato:fix-memcpy-check-precondition
