
[STF] Fix multi-context parallel for for grid places #9604

Open
caugonnet wants to merge 4 commits into NVIDIA:main from caugonnet:cudax_green_context_streams



caugonnet commented Jun 26, 2026

Contributor

Description

This fixes a bug where we did not activate contexts properly when running a parallel_for over a grid of places, which caused a bug on multi-GPU systems (see #9590).

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot (bot) commented Jun 26, 2026

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

caugonnet self-assigned this Jun 26, 2026
cccl-authenticator-app (bot) moved this from Todo to In Progress in CCCL Jun 26, 2026
caugonnet added the stf (Sequential Task Flow programming model) label Jun 26, 2026
caugonnet marked this pull request as ready for review June 26, 2026 07:24
caugonnet requested a review from a team as a code owner June 26, 2026 07:24
caugonnet requested a review from andralex June 26, 2026 07:24
cccl-authenticator-app (bot) moved this from In Progress to In Review in CCCL Jun 26, 2026
caugonnet
Contributor Author

/ok to test 3f97ea6

coderabbitai (bot) commented Jun 26, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c80b2602-b384-43e7-94e4-00ee7fdfc72d

📥 Commits

Reviewing files that changed from the base of the PR and between 8756dfc and 3f97ea6.

📒 Files selected for processing (2)
  • cudax/include/cuda/experimental/__stf/internal/parallel_for_scope.cuh
  • cudax/test/stf/green_context/cuda_graph.cu

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

  • Fixed grid-style parallel_for execution in STF/CUDAX by threading a per-place index through device execution and using t.get_stream(place_index) so each place uses the correct stream.
  • Updated the green-context CUDA graph test to instantiate one helper per GPU device and exercise all detected devices during the test run.
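To make the per-place stream selection concrete, here is a minimal, host-only C++ sketch. It does not use the real CUDASTF API: `mock_stream`, `mock_task`, `dispatch`, and `run_grid` are hypothetical stand-ins, `get_stream(i)` only mirrors the `t.get_stream(place_index)` call named in the summary, and `depth` merely models "some recursive steps before the launch". It also assumes, as a simplification of the bug in #9590, that the pre-fix path always picked the same stream (index 0) regardless of the place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a CUDA stream: it only records its device.
struct mock_stream {
  int device;
};

// Hypothetical stand-in for a task holding one stream per place in the grid.
// get_stream(i) mirrors the t.get_stream(place_index) call described above.
struct mock_task {
  std::vector<mock_stream> streams;
  mock_stream get_stream(std::size_t i) const {
    assert(i < streams.size());
    return streams[i];
  }
};

// Hypothetical recursive dispatcher. 'depth' stands in for any recursive
// steps before the launch; place_index is forwarded unchanged to the leaf.
// forward_index == false models the simplified pre-fix behavior (stream 0).
void dispatch(const mock_task& t, std::size_t place_index, int depth, bool forward_index,
              std::vector<int>& launched_on) {
  if (depth > 0) {
    // Recursive path: forward the index rather than resetting it.
    dispatch(t, place_index, depth - 1, forward_index, launched_on);
    return;
  }
  // Leaf: pick the stream of the current place and record its device.
  launched_on.push_back(t.get_stream(forward_index ? place_index : 0).device);
}

// Multi-place branch: one partition per place, each tagged with its index.
std::vector<int> run_grid(const mock_task& t, int depth, bool forward_index) {
  std::vector<int> launched_on;
  for (std::size_t place_index = 0; place_index < t.streams.size(); ++place_index) {
    dispatch(t, place_index, depth, forward_index, launched_on);
  }
  return launched_on;
}
```

With streams on devices 0, 1 and 2, forwarding the index launches the three partitions on devices 0, 1 and 2, while the simplified pre-fix path puts all three on device 0's stream. Passing the index through the recursion unchanged is what keeps the leaf's stream choice tied to the partition that started it.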

Testing

  • Expanded coverage in cudax/test/stf/green_context/cuda_graph.cu to run across all devices.

Walkthrough

important: parallel_for_scope now threads a place index through device execution and uses it to choose streams for stream_ctx. The green-context CUDA graph test now creates one helper per device and submits axpy tasks with the matching helper and view.

Changes

important: Per-place stream selection

| Layer | File(s) | Summary |
|---|---|---|
| Place index propagation | cudax/include/cuda/experimental/__stf/internal/parallel_for_scope.cuh | The multi-place branch passes the current partition index into do_parallel_for, the recursive path forwards it, and stream_ctx launches use t.get_stream(place_index). |
| Per-device green-context test | cudax/test/stf/green_context/cuda_graph.cu | cuda_graph.cu includes <vector>, stores green_context_helper objects per device, and iterates by iteration and device to select the matching green context view before launching axpy. |
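The reworked test's loop structure (one helper per device, iterations in the outer loop, devices in the inner loop) can be sketched on the host as follows. This is a hypothetical model only: `mock_green_context_helper`, `mock_green_context_view`, `get_view`, and `submit_all` are illustrative names, not the actual `green_context_helper` interface used in cuda_graph.cu, and each recorded pair simply stands in for an `axpy` launch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a green context view: remembers its device.
struct mock_green_context_view {
  int device;
};

// Hypothetical stand-in for green_context_helper: bound to one device.
class mock_green_context_helper {
 public:
  explicit mock_green_context_helper(int device) : device_(device) {}
  // Every view handed out belongs to this helper's device.
  mock_green_context_view get_view(std::size_t /*index*/) const { return {device_}; }
  int device() const { return device_; }

 private:
  int device_;
};

// Records (device the task is submitted for, device of the view it uses).
std::vector<std::pair<int, int>> submit_all(int ndevs, int iterations) {
  std::vector<mock_green_context_helper> helpers;
  helpers.reserve(static_cast<std::size_t>(ndevs));
  for (int dev = 0; dev < ndevs; ++dev) {
    helpers.emplace_back(dev);  // one helper per device
  }

  std::vector<std::pair<int, int>> submissions;
  for (int iter = 0; iter < iterations; ++iter) {
    for (int dev = 0; dev < ndevs; ++dev) {
      // Select the view from the helper that matches this device.
      const mock_green_context_view view = helpers[dev].get_view(static_cast<std::size_t>(iter));
      assert(view.device == dev);
      submissions.emplace_back(dev, view.device);  // stands in for launching axpy
    }
  }
  return submissions;
}
```

Keeping one helper per device, rather than a single shared helper, is what lets the test exercise every detected device instead of only the first one.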

Comment @coderabbitai help to get the list of available commands.

github-actions
Contributor

🥳 CI Workflow Results

🟩 Finished in 1h 29m: Pass: 100%/55 | Total: 1d 13h | Max: 1h 29m | Hits: 5%/313114

See results here.


Labels

stf Sequential Task Flow programming model

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant