Skip to content

Add cuda::bit_msb to <cuda/bit>#9624

Open
temujinkz wants to merge 1 commit into
NVIDIA:mainfrom
temujinkz:cuda-bit-msb
Open

Add cuda::bit_msb to <cuda/bit>#9624
temujinkz wants to merge 1 commit into
NVIDIA:mainfrom
temujinkz:cuda-bit-msb

Conversation

@temujinkz

Copy link
Copy Markdown

Addresses the msb request from #6108 (sibling of bit_ffs in #9609).

bit_msb returns the zero-based index of the most significant set bit, floor(log2(value)), and -1 for zero. It forwards to cuda::std::__bit_log2 so it reuses the existing bfind/clz lowering rather than adding new codegen.

I went with the zero-based convention to match __bit_log2, since a 1-based version would just be bit_width. Easy to flip to 1-based if you'd prefer. Tests and a docs page included.

bit_msb returns the zero-based index of the most significant set bit
(floor(log2(value))), or -1 when value is zero. It forwards to
cuda::std::__bit_log2 so it reuses the optimal bfind/clz lowering, and
exposes that internal primitive type safely on all unsigned integers.

Addresses the msb request in NVIDIA#6108.

Signed-off-by: temujinkz <ttalkenov@gmail.com>
@temujinkz temujinkz requested review from a team as code owners June 29, 2026 09:20
@temujinkz temujinkz requested a review from alliepiper June 29, 2026 09:20
@temujinkz temujinkz requested a review from davebayer June 29, 2026 09:20
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 29, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 29, 2026
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Added cuda::bit_msb to <cuda/bit> as a thin, constexpr/noexcept wrapper over cuda::std::__bit_log2, returning the zero-based most-significant-set-bit index and -1 for zero.

Also included:

  • new public header and umbrella include wiring
  • documentation updates plus a dedicated API page
  • tests covering unsigned-type acceptance, invalid-type rejection, zero/single-bit/multi-bit behavior, and wide integer support

Walkthrough

Adds cuda::bit_msb, a constexpr function returning the zero-based index of the most significant set bit (or -1 for zero input). Implementation is a thin wrapper over ::cuda::std::__bit_log2 in a new internal header, wired into cuda/bit, with tests and docs.

cuda::bit_msb

Layer / File(s) Summary
Header implementation and wiring
libcudacxx/include/cuda/__bit/bit_msb.h, libcudacxx/include/cuda/bit
bit_msb template constrained to unsigned integers, forwarding to __bit_log2; included in the cuda/bit umbrella header.
Tests
libcudacxx/test/libcudacxx/cuda/bit/bit_msb.pass.cpp
Concept-based type-rejection checks, constexpr and runtime value checks for zero, single-bit, and multi-bit inputs across all unsigned types.
Docs
docs/libcudacxx/extended_api/bit.rst, docs/libcudacxx/extended_api/bit/bit_msb.rst
New bit_msb.rst page with signature, constraints, behavior, and example; toctree and capability table updated in bit.rst.

suggestion: bit_msb returning int for an unsigned-input function is a signed/unsigned mismatch at the call site for callers using the result in bit-shift or array-index contexts. Consider unsigned int or a named type alias so the -1 sentinel is explicit (e.g., an optional<unsigned> or a named sentinel constant), matching what bit_width returns.

suggestion: The doc says the return is equivalent to bit_width(value) - 1 for non-zero inputs, but bit_width returns unsigned, while bit_msb returns int. The docs should explicitly note this type difference to avoid confusion.


Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 172f9c7a-18ac-49a7-bd73-6af7368b9912

📥 Commits

Reviewing files that changed from the base of the PR and between 0d8f162 and 6d37a1f.

📒 Files selected for processing (5)
  • docs/libcudacxx/extended_api/bit.rst
  • docs/libcudacxx/extended_api/bit/bit_msb.rst
  • libcudacxx/include/cuda/__bit/bit_msb.h
  • libcudacxx/include/cuda/bit
  • libcudacxx/test/libcudacxx/cuda/bit/bit_msb.pass.cpp

Comment on lines +41 to +55
#include <cuda/bit>
#include <cuda/std/cassert>
#include <cuda/std/cstdint>

__global__ void bit_msb_kernel() {
assert(cuda::bit_msb(uint32_t{0}) == -1);
assert(cuda::bit_msb(uint32_t{1}) == 0);
assert(cuda::bit_msb(uint32_t{0b10101000}) == 7);
assert(cuda::bit_msb(~uint32_t{0}) == 31);
}

int main() {
bit_msb_kernel<<<1, 1>>>();
cudaDeviceSynchronize();
return 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

important: Make the example self-contained.

The snippet calls cudaDeviceSynchronize() but never includes a CUDA runtime header, so it is not buildable as written. Add the appropriate runtime include (for example, the project-preferred CUDA runtime API header) or drop the synchronization call from the example. As per path instructions, documentation changes should prioritize buildable examples and technical accuracy.

Source: Path instructions

Comment on lines +37 to +39
[[nodiscard]] _CCCL_API constexpr int bit_msb(_Tp __value) noexcept
{
return static_cast<int>(::cuda::std::__bit_log2(__value));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '%s\n' "== file outline =="
ast-grep outline libcudacxx/include/cuda/__bit/bit_msb.h --view expanded || true

printf '\n%s\n' "== relevant file excerpt =="
sed -n '1,120p' libcudacxx/include/cuda/__bit/bit_msb.h | cat -n

printf '\n%s\n' "== related integral helper excerpt =="
sed -n '1,140p' libcudacxx/include/cuda/std/__bit/integral.h | cat -n

Repository: NVIDIA/cccl

Length of output: 8310


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
# Read-only semantic probe for the reviewed cast.
# This does not execute repository code; it just models the relevant C++ rule.
import ctypes

u32 = 0xFFFFFFFF
# On common platforms, casting to a signed 32-bit int yields -1.
i32 = ctypes.c_int(u32).value
# On a hypothetical 64-bit int, the same source value would remain positive.
i64 = ctypes.c_longlong(u32).value

print("c_int(0xFFFFFFFF) =", i32)
print("c_longlong(0xFFFFFFFF) =", i64)
print("portable_negative_one_via_cast =", i32 == -1 and i64 == -1)
PY

Repository: NVIDIA/cccl

Length of output: 247


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
# Demonstrate the portability issue in standard C++ terms.
# If an unsigned value is not representable in the target signed type,
# the result of the conversion is implementation-defined.
print("static_cast<int>(0xFFFFFFFFu) is not guaranteed to produce -1 on every C++ implementation.")
PY

Repository: NVIDIA/cccl

Length of output: 240


important: avoid relying on the implementation-defined cast for the zero case. __bit_log2(0) returns 0xFFFFFFFF, so static_cast<int>(...) only produces -1 when that unsigned-to-signed conversion happens to line up with the target int width. A direct __value == 0 ? -1 : ... path makes the contract portable.

Source: Path instructions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

1 participant