Add cuda::bit_msb to <cuda/bit>#9624
Conversation
bit_msb returns the zero-based index of the most significant set bit (floor(log2(value))), or -1 when value is zero. It forwards to cuda::std::__bit_log2 so it reuses the optimal bfind/clz lowering, and exposes that internal primitive type safely on all unsigned integers. Addresses the msb request in NVIDIA#6108. Signed-off-by: temujinkz <ttalkenov@gmail.com>
Added Also included:
WalkthroughAdds cuda::bit_msb
suggestion: suggestion: The doc says the return is equivalent to Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 172f9c7a-18ac-49a7-bd73-6af7368b9912
📒 Files selected for processing (5)
docs/libcudacxx/extended_api/bit.rstdocs/libcudacxx/extended_api/bit/bit_msb.rstlibcudacxx/include/cuda/__bit/bit_msb.hlibcudacxx/include/cuda/bitlibcudacxx/test/libcudacxx/cuda/bit/bit_msb.pass.cpp
| #include <cuda/bit> | ||
| #include <cuda/std/cassert> | ||
| #include <cuda/std/cstdint> | ||
|
|
||
| __global__ void bit_msb_kernel() { | ||
| assert(cuda::bit_msb(uint32_t{0}) == -1); | ||
| assert(cuda::bit_msb(uint32_t{1}) == 0); | ||
| assert(cuda::bit_msb(uint32_t{0b10101000}) == 7); | ||
| assert(cuda::bit_msb(~uint32_t{0}) == 31); | ||
| } | ||
|
|
||
| int main() { | ||
| bit_msb_kernel<<<1, 1>>>(); | ||
| cudaDeviceSynchronize(); | ||
| return 0; |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
important: Make the example self-contained.
The snippet calls cudaDeviceSynchronize() but never includes a CUDA runtime header, so it is not buildable as written. Add the appropriate runtime include (for example, the project-preferred CUDA runtime API header) or drop the synchronization call from the example. As per path instructions, documentation changes should prioritize buildable examples and technical accuracy.
Source: Path instructions
| [[nodiscard]] _CCCL_API constexpr int bit_msb(_Tp __value) noexcept | ||
| { | ||
| return static_cast<int>(::cuda::std::__bit_log2(__value)); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '%s\n' "== file outline =="
ast-grep outline libcudacxx/include/cuda/__bit/bit_msb.h --view expanded || true
printf '\n%s\n' "== relevant file excerpt =="
sed -n '1,120p' libcudacxx/include/cuda/__bit/bit_msb.h | cat -n
printf '\n%s\n' "== related integral helper excerpt =="
sed -n '1,140p' libcudacxx/include/cuda/std/__bit/integral.h | cat -nRepository: NVIDIA/cccl
Length of output: 8310
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
# Read-only semantic probe for the reviewed cast.
# This does not execute repository code; it just models the relevant C++ rule.
import ctypes
u32 = 0xFFFFFFFF
# On common platforms, casting to a signed 32-bit int yields -1.
i32 = ctypes.c_int(u32).value
# On a hypothetical 64-bit int, the same source value would remain positive.
i64 = ctypes.c_longlong(u32).value
print("c_int(0xFFFFFFFF) =", i32)
print("c_longlong(0xFFFFFFFF) =", i64)
print("portable_negative_one_via_cast =", i32 == -1 and i64 == -1)
PYRepository: NVIDIA/cccl
Length of output: 247
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
# Demonstrate the portability issue in standard C++ terms.
# If an unsigned value is not representable in the target signed type,
# the result of the conversion is implementation-defined.
print("static_cast<int>(0xFFFFFFFFu) is not guaranteed to produce -1 on every C++ implementation.")
PYRepository: NVIDIA/cccl
Length of output: 240
important: avoid relying on the implementation-defined cast for the zero case. __bit_log2(0) returns 0xFFFFFFFF, so static_cast<int>(...) only produces -1 when that unsigned-to-signed conversion happens to line up with the target int width. A direct __value == 0 ? -1 : ... path makes the contract portable.
Source: Path instructions
Addresses the msb request from #6108 (sibling of
bit_ffsin #9609).bit_msbreturns the zero-based index of the most significant set bit,floor(log2(value)), and -1 for zero. It forwards tocuda::std::__bit_log2so it reuses the existing bfind/clz lowering rather than adding new codegen.I went with the zero-based convention to match
__bit_log2, since a 1-based version would just bebit_width. Easy to flip to 1-based if you'd prefer. Tests and a docs page included.