fix(warp_reduce): add explicit add_op overload to resolve CUB template ambiguity on CUDA 13.2+ #3050
zbrad wants to merge 2 commits into
…e ambiguity on CUDA 13.2
On CUDA 13.2 (SM 121, DGX Spark), IVF-PQ builds fail because both
raft::warpReduce<T, ReduceLambda> and cub::detail::scan::warpReduce<Tp, ScanOpT&>
match when called with raft::add_op{}, causing an ambiguous template instantiation.
Add an explicit non-template overload DI T warpReduce(T val, raft::add_op reduce_op)
in reduction.cuh. The explicit overload is preferred by the compiler, resolving ambiguity.
Also added a regression test WARP_REDUCE_WITH_ADD_OP to prevent future regressions.
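The mechanism described above can be sketched in plain host C++. This is a simplified analog, not the actual raft or CUB code: the namespaces `mylib` and `vendor`, the function names `warp_reduce` and `scan_step`, and the function bodies are all hypothetical stand-ins. It assumes the collision arises because an unqualified call inside the vendor namespace also finds `mylib::warp_reduce` through argument-dependent lookup on `mylib::add_op`.

```cpp
#include <cassert>

namespace mylib {  // hypothetical stand-in for raft
struct add_op {
  template <class T>
  T operator()(T a, T b) const { return a + b; }
};

// Generic reduction template: accepts any callable.
template <class T, class Op>
T warp_reduce(T val, Op op) { return op(val, val); }

// Overload fixed to add_op. It is more specialized than either generic
// template, so partial ordering selects it. Without this overload, the
// unqualified call in vendor::scan_step below is ambiguous.
template <class T>
T warp_reduce(T val, add_op) { return val + val; }
}  // namespace mylib

namespace vendor {  // hypothetical stand-in for CUB scan internals
template <class T, class Op>
T warp_reduce(T val, Op op) { return op(val, T{}); }

template <class T, class Op>
T scan_step(T val, Op op) {
  // Unqualified call: ordinary lookup finds vendor::warp_reduce, and ADL on
  // Op = mylib::add_op additionally finds the mylib::warp_reduce overloads.
  return warp_reduce(val, op);
}
}  // namespace vendor
```

In this sketch, `vendor::scan_step(21, mylib::add_op{})` returns 42, which shows that the call now resolves to the `mylib` overload rather than to `vendor::warp_reduce`. Whether the same redirection happens in the real CUB call path depends on the actual signatures, which are not shown in this PR.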
{
  assert(gridDim.x == 1);
  int th_val = input[threadIdx.x];
  th_val = raft::warpReduce(th_val, raft::add_op{});
Does this really cause an ambiguity without the extra overload in util/reduction.cuh? It's called here with the raft namespace qualifier, so I doubt the CUB overload is ever picked up here.
It showed up for me when trying to do a full source rebuild of cuvs on the DGX Spark. When I finally debugged my build failure, it traced down to raft, but only showed up when building for arm64.
Oh, I have the cuvs regression test for it that I'm submitting to cuvs, it's at cpp/tests/regression/warp_reduce_add_op.cu
@zbrad can you provide steps to reproduce this bug? I have a DGX Spark, and I have been building with CUDA 13.2 without any failures.
Some other folks had asked for more background, so I went back and re-created the original failure from compiling cuvs. I've attached the doc and the repro.
Thanks for the reproducer documentation! I feel uneasy about both suggested workarounds:
Maybe we could make raft's warpReduce template itself a bit more restrictive so it wouldn't get in the way of thrust+cub primitives?
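One possible reading of "more restrictive" is to constrain the generic template so it only participates in overload resolution for callables that really are binary reductions over `T`. The sketch below is a host-only illustration under that assumption, using hypothetical names (`mylib`, `warp_reduce`, `can_warp_reduce`) rather than raft's actual code. Note that `add_op` itself satisfies this particular constraint, so it is not established here that such a constraint alone would keep the template out of the CUB call site.

```cpp
#include <type_traits>
#include <utility>

namespace mylib {  // hypothetical stand-in for raft
struct add_op {
  template <class T>
  constexpr T operator()(T a, T b) const { return a + b; }
};

// Generic reduction, constrained via SFINAE: it is only a candidate when
// Op can be invoked as op(T, T) with a result convertible to T.
template <class T, class Op,
          class = std::enable_if_t<std::is_invocable_r_v<T, Op&, T, T>>>
constexpr T warp_reduce(T val, Op op) { return op(val, val); }
}  // namespace mylib

// Detection helper: true when mylib::warp_reduce(T, Op) is a viable call.
template <class T, class Op, class = void>
struct can_warp_reduce : std::false_type {};

template <class T, class Op>
struct can_warp_reduce<
    T, Op,
    std::void_t<decltype(mylib::warp_reduce(std::declval<T>(),
                                            std::declval<Op>()))>>
    : std::true_type {};
```

With this constraint, `can_warp_reduce<int, mylib::add_op>::value` is true, while a non-callable second argument such as `int` removes the template from the candidate set instead of producing a hard error inside its body.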
Summary
On CUDA 13.2 (SM 121, DGX Spark), IVF-PQ builds fail with an ambiguous template instantiation error. When CUB scan kernels call warpReduce(val, raft::add_op{}), both raft::warpReduce<T, ReduceLambda> and cub::detail::scan::warpReduce<Tp, ScanOpT&> match, producing a compile error.
Fix: Add an explicit non-template overload, DI T warpReduce(T val, raft::add_op reduce_op), in cpp/include/raft/util/reduction.cuh.
The explicit overload is preferred by the compiler over the generic ReduceLambda overload, resolving the ambiguity without changing any existing behavior.
- cpp/include/raft/util/reduction.cuh: explicit raft::add_op overload for warpReduce
- cpp/tests/util/reduction.cu: regression test (WARP_REDUCE_WITH_ADD_OP) to prevent future regressions
Repro / context
Observed on DGX Spark (SM 121) with CUDA 13.2. The upstream CI does not test CUDA 13.2 / SM 121; the new test compiles and passes on all CUDA versions but will catch regressions if CUDA 13.2 support is added to CI.
Error seen without fix:
Test plan
- WARP_REDUCE_WITH_ADD_OP regression test added in cpp/tests/util/reduction.cu
- warpReduce behavior is otherwise unchanged: the explicit overload only activates for raft::add_op
🤖 Generated with Claude Code