Cuda Memcpy Async Tail Chunk Overrun #9602

Open
fbusato wants to merge 1 commit into NVIDIA:main from fbusato:fix-memcpy-check-precondition

@fbusato fbusato commented Jun 26, 2026

Contributor

Description

memcpy_async preconditions:

The precondition check should reject sizes that do not match the backend chunking rules. Instead, it only checks alignment and overlap, then forwards the byte count unchanged. Device implementations copy fixed 2-, 4-, 8-, or 16-byte chunks while offset < size, so when the size is not a multiple of the chunk size, the final chunk is copied in full and runs past the requested extent.

This PR also adds a couple of small improvements.
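To make the overrun concrete, here is a minimal host-side sketch of the loop shape described above. It is not the device implementation: `bytes_touched` is a hypothetical helper that only counts how far a fixed-chunk loop advances, assuming the loop condition is `offset < size` and every iteration moves one full chunk.

```cpp
#include <cstddef>

// Hypothetical helper (not part of libcu++): returns how many bytes a loop of
// the form "while (offset < size) { copy one chunk; offset += chunk; }" touches.
// Assumes chunk is non-zero.
inline std::size_t bytes_touched(std::size_t size, std::size_t chunk)
{
  std::size_t offset = 0;
  while (offset < size)
  {
    // A real backend would copy bytes [offset, offset + chunk) here,
    // even when offset + chunk is larger than size.
    offset += chunk;
  }
  return offset;
}

// Number of bytes copied beyond the requested extent.
inline std::size_t overrun_bytes(std::size_t size, std::size_t chunk)
{
  return bytes_touched(size, chunk) - size;
}
```

With a 16-byte chunk, a 32-byte request touches exactly 32 bytes, while a 20-byte request touches 32 bytes, which is 12 bytes past the end of both buffers. Rejecting sizes that are not a multiple of the chunk size in the precondition check rules this case out before the copy starts.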

@fbusato fbusato self-assigned this Jun 26, 2026
@fbusato fbusato added the libcu++ For all items related to libcu++ label Jun 26, 2026
@fbusato fbusato requested a review from a team as a code owner June 26, 2026 01:17
@fbusato fbusato added the bug label Jun 26, 2026
@fbusato fbusato added this to CCCL Jun 26, 2026
@fbusato fbusato requested a review from wmaxey June 26, 2026 01:17
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 26, 2026
@fbusato fbusato moved this from Todo to In Review in CCCL Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9cc9c2a2-e97a-428a-8273-9275a8bcbd69

📥 Commits

Reviewing files that changed from the base of the PR and between 2ff1f4b and 0a0b1b9.

📒 Files selected for processing (1)
  • libcudacxx/include/cuda/__memcpy_async/check_preconditions.h

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Updated __memcpy_async_check_pre to better enforce valid copy sizes for backend chunking rules. The check now derives a chunk size from alignment (min(16, align)) and rejects byte counts that are not a multiple of that chunk. It also keeps source/destination alignment validation while switching the overlap test to ::cuda::ranges_overlap over the actual byte ranges.

Also adjusted the void* overload to forward the size into the updated typed implementation and added the new overlap header include.
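The checks summarized above can be sketched on the host roughly as follows. This is an illustration under stated assumptions, not the code in check_preconditions.h: `check_pre_sketch` and `ranges_overlap_sketch` are hypothetical names, the real `__memcpy_async_check_pre` signature may differ, and the overlap helper is a plain stand-in for `::cuda::ranges_overlap`. The alignment is assumed to be a non-zero power of two.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the overlap test: true if the byte ranges
// [a, a + size) and [b, b + size) share at least one byte.
inline bool ranges_overlap_sketch(const void* a, const void* b, std::size_t size)
{
  const auto a0 = reinterpret_cast<std::uintptr_t>(a);
  const auto b0 = reinterpret_cast<std::uintptr_t>(b);
  return a0 < b0 + size && b0 < a0 + size;
}

// Hypothetical precondition check mirroring the three rules in the summary.
inline bool check_pre_sketch(const void* dst, const void* src, std::size_t size, std::size_t align)
{
  // 1. Derive the copy chunk from the alignment, capped at 16 bytes,
  //    and require the byte count to be a whole number of chunks.
  const std::size_t chunk = std::min<std::size_t>(16, align);
  if (size % chunk != 0)
  {
    return false;
  }
  // 2. Source and destination must both satisfy the requested alignment.
  if (reinterpret_cast<std::uintptr_t>(dst) % align != 0
      || reinterpret_cast<std::uintptr_t>(src) % align != 0)
  {
    return false;
  }
  // 3. The actual byte ranges must not overlap.
  return !ranges_overlap_sketch(dst, src, size);
}
```

Under these assumptions, a 20-byte copy is rejected at 16-byte alignment (20 is not a multiple of 16) but accepted at 4-byte alignment, where the chunk is 4 bytes and no tail remains.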

Walkthrough

__memcpy_async_check_pre now adds a copy-chunk divisibility check, keeps the alignment checks, and replaces the overlap test with cuda::ranges_overlap. The void* overload forwards __size_bytes into the updated typed overload.

Changes

Memcpy async precondition checks

Layer / File(s): Precondition checks (libcudacxx/include/cuda/__memcpy_async/check_preconditions.h)
Summary: Adds the memcpy_async copy-chunk multiple check, keeps alignment validation, switches the non-overlap test to cuda::ranges_overlap, and forwards __size_bytes through the void* overload.

@github-actions

Contributor

😬 CI Workflow Results

🟥 Finished in 2h 13m: Pass: 99%/120 | Total: 2d 04h | Max: 1h 06m | Hits: 89%/396554

Labels

bug libcu++ For all items related to libcu++

Projects

Status: In Review

1 participant