fix(warp_reduce): add explicit add_op overload to resolve CUB template ambiguity on CUDA 13.2+ #3050

Open
zbrad wants to merge 2 commits into NVIDIA:main from zbrad:fix/warp-reduce-cub-ambiguity

zbrad commented Jun 8, 2026

Summary

On CUDA 13.2 (SM 121, DGX Spark), IVF-PQ builds fail with an ambiguous template instantiation error. When CUB scan kernels call warpReduce(val, raft::add_op{}), both raft::warpReduce<T, ReduceLambda> and cub::detail::scan::warpReduce<Tp, ScanOpT&> match, producing a compile error.
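The collision can be sketched in plain host C++, without CUDA. The namespaces `ops_lib` and `scan_lib`, the function `warp_reduce`, and the functor `local_add` are hypothetical stand-ins, not the real raft or CUB declarations. The sketch also assumes the mechanism is argument-dependent lookup (ADL) on an unqualified call, which the description suggests but does not state.

```cpp
#include <cassert>

namespace ops_lib {  // hypothetical stand-in for the library that owns the functor
struct add_op {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};

// Generic helper declared next to add_op, so ADL can find it.
template <typename T, typename Op>
T warp_reduce(T val, Op op) { return op(val, val); }
}  // namespace ops_lib

namespace scan_lib {  // hypothetical stand-in for a second library's internals
template <typename T, typename Op>
T warp_reduce(T val, Op op) { return op(val, T{1}); }

// Unqualified call: candidates are scan_lib::warp_reduce plus anything ADL
// finds in the namespaces associated with T and Op.
template <typename T, typename Op>
T run_unqualified(T val, Op op) { return warp_reduce(val, op); }

// Qualified call: only scan_lib::warp_reduce is considered.
template <typename T, typename Op>
T run_qualified(T val, Op op) { return scan_lib::warp_reduce(val, op); }
}  // namespace scan_lib

// A functor outside ops_lib does not pull ops_lib::warp_reduce into the call.
struct local_add {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};

inline void demo_lookup() {
  assert(scan_lib::run_unqualified(3, local_add{}) == 4);
  assert(scan_lib::run_qualified(3, ops_lib::add_op{}) == 4);
  // scan_lib::run_unqualified(3, ops_lib::add_op{});  // does not compile: ambiguous
}
```

In this sketch the two `warp_reduce` templates have identical signatures, so neither is more specialized and the commented-out call is rejected as ambiguous. Qualifying the call at the call site avoids the problem without touching either library.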

Fix: Add a more specialized overload for raft::add_op in cpp/include/raft/util/reduction.cuh (still templated on T, with the operator type fixed):

template <typename T>
DI T warpReduce(T val, raft::add_op reduce_op)

Because its second parameter is the concrete type raft::add_op, the compiler prefers this overload over the generic ReduceLambda overload, which resolves the ambiguity without changing any existing behavior.
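A minimal host-only sketch of that preference rule, assuming nothing about raft beyond the shape of the two signatures. The names `ops_lib`, `mul_op`, and `which_overload` are hypothetical and exist only to show which overload the compiler selects.

```cpp
#include <cassert>

namespace ops_lib {  // hypothetical names throughout, not raft's declarations
struct add_op {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};
struct mul_op {
  template <typename T>
  T operator()(T a, T b) const { return a * b; }
};

// Generic overload: accepts any operator type.
template <typename T, typename Op>
int which_overload(T, Op) { return 0; }

// More specialized overload: still a template on T, but the operator type is fixed.
template <typename T>
int which_overload(T, add_op) { return 1; }
}  // namespace ops_lib

inline void demo_partial_ordering() {
  // Both overloads are viable for add_op; partial ordering picks the second.
  assert(ops_lib::which_overload(1.5f, ops_lib::add_op{}) == 1);
  // Only the generic overload is viable for any other operator.
  assert(ops_lib::which_overload(1.5f, ops_lib::mul_op{}) == 0);
}
```

Every argument list accepted by the second overload is also accepted by the first, but not the other way round, which is what makes it the more specialized of the two.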

  • cpp/include/raft/util/reduction.cuh — explicit raft::add_op overload for warpReduce
  • cpp/tests/util/reduction.cu — regression test (WARP_REDUCE_WITH_ADD_OP) to prevent future regressions

Repro / context

Observed on DGX Spark (SM 121) with CUDA 13.2. The upstream CI does not test CUDA 13.2 / SM 121; the new test compiles and passes on all CUDA versions but will catch regressions if CUDA 13.2 support is added to CI.

Error seen without fix:

error: more than one instance of overloaded function "warpReduce" matches the argument list

Test plan

  • WARP_REDUCE_WITH_ADD_OP regression test added in cpp/tests/util/reduction.cu
  • Verified fix compiles and tests pass on CUDA 13.2 (SM 121, DGX Spark)
  • No changes to existing warpReduce behavior — explicit overload only activates for raft::add_op

🤖 Generated with Claude Code

…e ambiguity on CUDA 13.2

On CUDA 13.2 (SM 121, DGX Spark), IVF-PQ builds fail because both
raft::warpReduce<T, ReduceLambda> and cub::detail::scan::warpReduce<Tp, ScanOpT&>
match when called with raft::add_op{}, causing an ambiguous template instantiation.

Add an explicit non-template overload DI T warpReduce(T val, raft::add_op reduce_op)
in reduction.cuh. The explicit overload is preferred by the compiler, resolving ambiguity.

Also added a regression test WARP_REDUCE_WITH_ADD_OP to prevent future regressions.

copy-pr-bot Bot commented Jun 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

achirkin left a comment

Contributor

Thank you for reporting the issue! Could you please provide the reference/example where the bug is triggered? Normally, I'd assume the fix should be on the user side - just call the function with explicit namespace.

{
assert(gridDim.x == 1);
int th_val = input[threadIdx.x];
th_val = raft::warpReduce(th_val, raft::add_op{});

Contributor

Does this really cause an ambiguity without the extra overload in util/reduction.cuh? It's called here with raft namespace, so I doubt CUB overload is ever picked up here.

Author

It showed up for me when trying to do a full source rebuild of cuvs on the DGX Spark. When I finally debugged the build failure, I traced it down to raft, but it only showed up when building for arm64.

zbrad commented Jun 8, 2026

Author

Oh, I also have a regression test for it that I'm submitting to cuvs; it's at cpp/tests/regression/warp_reduce_add_op.cu

@divyegala

Contributor

@zbrad can you provide steps to reproduce this bug? I have a DGX Spark, and I have been building CUDA 13.2 without any failures.

zbrad commented Jun 8, 2026

Author

Some other folks had asked for more background, so I went back and re-created the original failure from compiling cuvs. I've attached the doc and the repro.
warp reduce ambiguity doc
repro

achirkin commented Jun 9, 2026

Contributor

Thanks for the reproducer documentation! I feel uneasy about both suggested workarounds:

  • raft extra overload: it doesn't protect us from someone else failing exactly the same way on another device operation.
  • cuvs swap to thrust::plus: because we have a best practice guideline to use the raft operations where possible.

Maybe we could make raft's warpReduce template itself a bit more restrictive so it wouldn't get in the way of thrust+cub primitives?
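One possible reading of "more restrictive" is to constrain the template so it drops out of overload resolution when the second argument cannot be called as a binary operator. The host-only sketch below uses expression SFINAE in a trailing return type; `ops_lib`, `scan_lib`, `tile_tag`, and `warp_reduce` are hypothetical stand-ins, and nothing here is taken from raft or CUB. Note the limit of this approach: a functor such as `add_op` still satisfies the constraint, so on its own this would not remove the collision reported in this PR.

```cpp
#include <cassert>

namespace ops_lib {  // hypothetical stand-in, not raft's actual code
struct add_op {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};
struct tile_tag {};  // a non-callable type that also lives in ops_lib

// Constrained helper: the trailing return type is only well-formed when
// op(val, val) is a valid expression that can be cast to T.
template <typename T, typename Op>
auto warp_reduce(T val, Op op) -> decltype(static_cast<T>(op(val, val))) {
  return op(val, val);
}
}  // namespace ops_lib

namespace scan_lib {  // hypothetical stand-in for another library's internals
template <typename T, typename State>
T warp_reduce(T val, State) { return val; }

// Unqualified call: ADL still finds ops_lib::warp_reduce for ops_lib types,
// but substitution fails for non-callable arguments and the candidate is dropped.
template <typename T, typename State>
T run(T val, State s) { return warp_reduce(val, s); }
}  // namespace scan_lib

inline void demo_constrained() {
  // Would be ambiguous if ops_lib::warp_reduce were unconstrained.
  assert(scan_lib::run(7, ops_lib::tile_tag{}) == 7);
  // The constrained helper still works for real binary operators.
  assert(ops_lib::warp_reduce(2, ops_lib::add_op{}) == 4);
}
```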
