Skip to content

Add cuda::bit_ffs to <cuda/bit>#9609

Open
temujinkz wants to merge 3 commits into
NVIDIA:mainfrom
temujinkz:cuda-bit-ffs
Open

Add cuda::bit_ffs to <cuda/bit>#9609
temujinkz wants to merge 3 commits into
NVIDIA:mainfrom
temujinkz:cuda-bit-ffs

Conversation

@temujinkz

Copy link
Copy Markdown

cuda::bit_ffs(x) returns the 1-based position of the lowest set bit, or 0 if x is zero. The body is x == 0 ? 0 : countr_zero(x) + 1. countr_zero counts trailing zeros, so adding 1 turns it into a 1-based index. The x == 0 guard is the one case countr_zero doesn't define cleanly, and handling it is the whole reason this wrapper exists. It matches __builtin_ffs and CUDA's __ffs. It's constrained to unsigned integers, works on all widths including 128-bit, and is constexpr plus host/device because it just forwards to countr_zero.

Implements a type-safe find-first-set that returns 1 + the index of the
least significant set bit, or 0 for a zero input, matching __builtin_ffs
and CUDA's __ffs (unlike countr_zero). Adds tests and docs.

Closes NVIDIA#6108

Signed-off-by: temujinkz <ttalkenov@gmail.com>
@temujinkz temujinkz requested review from a team as code owners June 26, 2026 11:06
@temujinkz temujinkz requested a review from gonidelis June 26, 2026 11:06
@temujinkz temujinkz requested a review from griwes June 26, 2026 11:06
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 26, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 26, 2026
constexpr T all_ones = static_cast<T>(~T{0});

// a zero input is well defined and returns 0
static_assert(cuda::bit_ffs(T{0}) == 0);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: All those should only be

Suggested change
static_assert(cuda::bit_ffs(T{0}) == 0);
assert(cuda::bit_ffs(T{0}) == 0);

We are invoking the function in a constexpr context, so otherwise we would never test the runtime path

TEST_FUNC constexpr bool test()
{
using nl = cuda::std::numeric_limits<T>;
constexpr T all_ones = static_cast<T>(~T{0});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
constexpr T all_ones = static_cast<T>(~T{0});
[[maybe_unused]] constexpr T all_ones = static_cast<T>(~T{0});

Comment on lines +34 to +35

unused(all_ones);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unused(all_ones);

static_assert(cuda::bit_ffs(static_cast<T>(0b10101000)) == 4);

unused(all_ones);
return true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: This function is only called from the test() function, so it does not need to return anything

#if _CCCL_HAS_INT128()
test<__uint128_t>();
#endif // _CCCL_HAS_INT128()
return true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test or two that we cannot invoke the function with bool, a signed integer or float and an enumeration

something like:

template<typename T>
_CCCL_CONCEPT can_bit_ffs = _CCCL_REQUIRES_EXPR((T), T val)((cuda::bit_ffs(val));

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 26, 2026
@miscco

miscco commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Thanks a lot for the PR, This already looks great, I have some nits

__global__ void bit_ffs_kernel() {
assert(cuda::bit_ffs(uint32_t{0}) == 0);
assert(cuda::bit_ffs(uint32_t{1}) == 1);
assert(cuda::bit_ffs(uint32_t{0b10101000}) == 4);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add an example for all bit set

Comment thread libcudacxx/include/cuda/__bit/bit_ffs.h Outdated
_CCCL_REQUIRES(::cuda::std::__cccl_is_unsigned_integer_v<_Tp>)
[[nodiscard]] _CCCL_API constexpr int bit_ffs(_Tp __value) noexcept
{
return (__value == _Tp{0}) ? 0 : ::cuda::std::countr_zero(__value) + 1;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation can be optimized. please make sure that the implementation produces the same code of __ffs and __ffsll https://godbolt.org/z/z4717fqGx

Review feedback from @miscco and @fbusato on NVIDIA#9609:
- route through __builtin_ffs / __builtin_ffsll on host and __ffs / __ffsll on
  device (constexpr fallback) so codegen matches __ffs / __ffsll
- test: use assert instead of static_assert so the runtime path is exercised,
  mark the per-type helper void, [[maybe_unused]] the constant
- test: add a concept check that bit_ffs rejects bool, signed, float, and enums
- docs: add an all-bits-set example

Signed-off-by: temujinkz <ttalkenov@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d5c33304-15a0-4d92-abdf-3b781df4d761

📥 Commits

Reviewing files that changed from the base of the PR and between bfb8bd1 and 9833415.

📒 Files selected for processing (1)
  • libcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • libcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Added cuda::bit_ffs to <cuda/bit>: a constexpr __host__ __device__ API that returns the 1-based position of the least significant set bit, or 0 when the input is zero, constrained to unsigned integer types (including 128-bit). The implementation uses __builtin_ffs/__builtin_ffsll on host and __ffs/__ffsll on device, with a countr_zero(x) + 1 constexpr fallback and explicit zero handling; device dispatch avoids unsupported asm in tile compilation.

Also:

  • Added the new public header (<cuda/__bit/bit_ffs.h>) and wired it through the <cuda/bit> umbrella include.
  • Added/updated documentation for cuda::bit_ffs and the extended Bit API index (including an all-bits-set example).
  • Added a pass test (bit_ffs.pass.cpp) with a concept/type constraint check (rejects bool, signed integers, floating point, and enums) and correctness assertions across unsigned types, using assert for runtime checks and direct includes (<cuda/std/limits> and <cuda/std/cstddef>) for include-hygiene compliance.

Walkthrough

suggestion: Adds cuda::bit_ffs implementation, transitive inclusion from cuda/bit, extended API documentation, and tests covering unsigned-only inputs plus zero, single-bit, and multi-bit results.

Changes

Bit FFS API

Layer / File(s) Summary
Documentation and index
docs/libcudacxx/extended_api/bit.rst, docs/libcudacxx/extended_api/bit/bit_ffs.rst
Adds the cuda::bit_ffs topic page and registers it in the extended API bit index and availability table.
API surface
libcudacxx/include/cuda/__bit/bit_ffs.h, libcudacxx/include/cuda/bit
Defines cuda::bit_ffs with constexpr, host, and device dispatch for unsigned types, and adds the transitive <cuda/__bit/bit_ffs.h> include from cuda/bit.
Unsigned-input coverage
libcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp
Checks unsigned-only constraints and bit_ffs behavior for zero, single-bit, and multi-bit inputs across several unsigned integer types.

Suggested labels

libcu++

Suggested reviewers

  • griwes
  • davebayer

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8617fc92-34c6-45a3-b206-feac28bfbaf5

📥 Commits

Reviewing files that changed from the base of the PR and between 8756dfc and bfb8bd1.

📒 Files selected for processing (5)
  • docs/libcudacxx/extended_api/bit.rst
  • docs/libcudacxx/extended_api/bit/bit_ffs.rst
  • libcudacxx/include/cuda/__bit/bit_ffs.h
  • libcudacxx/include/cuda/bit
  • libcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp

Comment thread libcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp
@temujinkz

Copy link
Copy Markdown
Author

Thanks for the fast review, really appreciate it. Pushed an update that should cover everything:

  • @fbusato: bit_ffs now goes through __builtin_ffs / __builtin_ffsll on host and __ffs / __ffsll on device, with a constexpr fallback. I mirrored the host/device/constexpr dispatch from countr_zero so it stays consistent and matches the ffs/ffsll codegen.
  • @miscco: swapped the in-test static_asserts for assert so the runtime path actually runs, made the per-type helper return void, and marked the constant [[maybe_unused]].
  • @miscco: added a concept check so bit_ffs rejects bool, signed integers, float, and enums.
  • @miscco: added the all-bits-set example to the docs.

Let me know if the codegen looks right on your side, happy to tweak anything.

temujinkz added a commit to temujinkz/cccl that referenced this pull request Jun 27, 2026
Pre-empt the same review feedback as bit_ffs (NVIDIA#9609):
- implement bit_msb by forwarding to cuda::std::bit_width, which already lowers to
  the optimal clz based code
- test: use assert instead of static_assert, per-type helper returns void,
  [[maybe_unused]] the constant, and add a concept check rejecting bool, signed,
  float, and enums
- docs: add an all-bits-set example

Signed-off-by: temujinkz <ttalkenov@gmail.com>
Per the IWYU coding guideline (and CodeRabbit), the test used numeric_limits and
size_t through transitive includes. Add the direct headers.

Signed-off-by: temujinkz <ttalkenov@gmail.com>
@temujinkz

Copy link
Copy Markdown
Author

Good catch, fixed. Added <cuda/std/limits> and <cuda/std/cstddef> directly to the test instead of leaning on transitive includes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

3 participants