Skip to content

Add support for unsigned bit-fields in structs#154

Open
KimayaBedarkar wants to merge 5 commits into
mainfrom
kimaya/bitfields
Open

Add support for unsigned bit-fields in structs#154
KimayaBedarkar wants to merge 5 commits into
mainfrom
kimaya/bitfields

Conversation

@KimayaBedarkar

@KimayaBedarkar KimayaBedarkar commented Jun 30, 2026

Copy link
Copy Markdown
Collaborator

Add support for unsigned bit-fields in structs

What this adds

Support for unsigned (and _Bool) bit-fields in C structs, e.g.

struct flags {
    unsigned int a : 3;      // value in [0, 8)
    unsigned int nibble : 4; // value in [0, 16)
    _Bool  ready : 1;        // 0 or 1
    int    normal;           // ordinary field, unchanged
};

Before this change PAL silently mistranslated these: the : N width was
dropped and each bit-field became a full-width, separately-addressable field.
That is wrong — C bit-fields are not addressable (&s.a is illegal) and
share storage units, so they do not fit PAL's per-field ref T + pts_to
model.

Encoding: refined machine cell + mask-on-write

Each unsigned bit-field is modeled as an ordinary scalar record slot whose
backing cell is the declared underlying machine type refined to the width:

unsigned int a : N   ==>   (v:UInt32.t{UInt32.v v < pow2 N})

(UInt8/16/64.t for the other underlying widths.)

  • Reads are conversion-free: the stored value already has the bit-field's
    logical unsigned type and is used directly (via subtyping) in returns, specs,
    and arithmetic.
  • Writes mask the RHS to the width — s->a = e stores
    Pulse.Lib.C.BitField.mask_uW N e — so the stored value satisfies the cell's
    < pow2 N refinement. This is exactly C unsigned modular truncation
    (e % 2^N), and the masked value's postcondition (mask == e % pow2 N) makes
    the truncation provable in user specs (_ensures(s->a == v % 8)).
  • Defaults are 0 (in range for every N >= 1).

Physical bit packing / storage sharing is intentionally not modeled: each
bit-field is an independent value. This is sound because a bit-field's backing
cell is never C-address-exposed (&s->a is illegal C and never appears in valid
input).

Why this design over FStar.UInt.uint_t n

The natural alternative is to store each bit-field as the parametric,
exactly-n-bit integer FStar.UInt.uint_t n (= x:int{0 <= x < pow2 n}).
We chose the refined machine cell instead, for three reasons:

  1. It is the faithful model of C. A C bit-field genuinely is an
    unsigned int (or unsigned char, ...) whose value happens to be
    range-limited. (v:UInt32.t{UInt32.v v < pow2 N}) says exactly that.
    uint_t n is a mathematical nat, a value type C has no notion of.

  2. Reads are conversion-free. Because the cell already is a UInt32.t, a
    read is directly usable wherever the field's unsigned int value is expected
    — returns, specifications, and integer-promoted arithmetic — with no cast.
    With uint_t n every read site would need a uint_to_t conversion back to a
    machine type, adding proof obligations and noise to generated code.

  3. Consistency with how PAL already handles unsigned overflow. PAL lowers
    unsigned +/-/* to width-selected, total library helpers
    (add_wrap/sub_wrap/mul_wrap) that keep values as machine ints and encode
    modular ("wrapping") semantics in their postconditions. The mask-on-write
    design mirrors this one-to-one: a width-selected total helper (mask_uW) that
    keeps the value a machine int and states its modular result (v % pow2 N) in
    the postcondition. uint_t n would introduce a value representation that
    nothing else in PAL uses.

Note: an earlier justification appealed to C extraction ergonomics. That
reasoning was dropped — PAL is a one-way C→Pulse transpiler with no
KaRaMeL/C-extraction path, so extraction is not a factor in this decision.

Why a refinement is still needed

There is no machine type for arbitrary widths (1, 3, 7, 31, ...); the smallest
is UInt8. A bare UInt32.t only bounds the value < pow2 32. The refinement
is what narrows a machine word down to the actual N bits, so it carries the
real invariant regardless of which storage width backs the field.

Why the mask uses rem, not logand

mask_uW truncates by modular arithmetic (UW.rem v (pow2 n)) rather than a
bitwise logand + logand_mask lemma. This is both simpler and more
consistent: the < pow2 n bound falls out of "mod by a positive" and the exact
value v % pow2 n is the rem postcondition, with no bitwise reasoning — and it
uses the same modular primitives as the add_wrap/sub_wrap/mul_wrap
overflow helpers.

Scope and rejections

  • Unsigned and _Bool bit-fields are supported.
  • Signed bit-fields are rejected with a diagnostic: their reads need
    sign-extension and their out-of-range writes are implementation-defined
    (C11 6.3.1.3p3), so they are not portable to bake into a verification model.
  • Union bit-fields are rejected (would need the value encoding layered onto
    the union variant model).
  • Anonymous / zero-width bit-fields (int : 3;, int : 0;) are padding and
    are skipped entirely — no record slot, no accessors.

What it gives us

  • Correct translation of a common C idiom that was previously mistranslated.
  • Reads for free (subtyping, no conversions).
  • Writes that truncate with C unsigned modular semantics, provable in user-facing
    specifications (_ensures(s->nibble == v % 16)).
  • An encoding that stays uniform across widths and consistent with PAL's existing
    unsigned-overflow handling, keeping the generated Pulse idiomatic.

Commits

  1. ir: model unsigned bit-fields as refined machine cells (IR variant, passes,
    emit field/representation helpers).
  2. frontend: parse unsigned bit-fields; reject signed/union; skip padding.
  3. pulse: add Pulse.Lib.C.BitField truncation helpers (mask_u8/16/32/64).
  4. emit: mask unsigned bit-field writes (C truncation on store).
  5. test+docs: bitfields test + internals documentation.

@KimayaBedarkar KimayaBedarkar changed the title Add support for unsigned bit-fields in structs Draft: Add support for unsigned bit-fields in structs Jun 30, 2026
@KimayaBedarkar KimayaBedarkar force-pushed the kimaya/bitfields branch 2 times, most recently from c39f0e0 to 6cdaa3d Compare July 1, 2026 17:49
@KimayaBedarkar KimayaBedarkar changed the title Draft: Add support for unsigned bit-fields in structs Add support for unsigned bit-fields in structs Jul 1, 2026
@gebner

gebner commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

This does not seem to work with typedefs:

#include "pal.h"
#include <stdint.h>

struct mine {
  uint16_t a : 2;
};

void should_work(struct mine *s) {
  _assert(s->a <= 4);
}

KimayaBedarkar and others added 5 commits July 2, 2026 16:50
Add a `FieldT::BitField { name, ty, width }` variant to the struct-field
IR and thread it through the passes. A C bit-field is not separately
addressable (`&s.f` is illegal) and shares storage units, so it does not
fit PAL's `ref T` + `pts_to` per-field model.

Instead, each unsigned bit-field is modeled as an ordinary scalar record
slot whose backing cell is the declared underlying machine type refined
to the width, e.g. `unsigned int a : N` becomes `(v:UInt32.t{UInt32.v v <
pow2 N})`. Reads are then conversion-free: the stored value already has
the bit-field's logical unsigned type.

This commit covers the read/representation side only:
  - the IR variant and its `name`/`is_array`/`fixed_array_info`/
    `logical_type` accessors;
  - pretty-printing;
  - `check`/`elab`/`prune` match arms;
  - the emit field helpers (`emit_field_default`,
    `emit_field_record_type`, `emit_field_projection_type`, and the new
    `emit_bitfield_value_type`) that produce the refined cell type.

No frontend produces bit-fields yet and writes are not masked; those
follow in subsequent commits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect `f->isBitField()` while walking record declarations and build the
new `BitField` IR node. A shared `addRecordField` helper now handles both
structs and unions:

  - anonymous / zero-width padding bit-fields are skipped (no accessible
    value);
  - signed bit-fields are rejected (sign-extension on read and
    implementation-defined out-of-range writes, C11 6.3.1.3p3);
  - union bit-fields are rejected (would need the value encoding layered
    on the union variant model);
  - unsigned (and `_Bool`) bit-fields become `field_bitfield(name, ty,
    width)`, with the width read via `getBitWidthValue()`.

The plumbing is a `DeclBuilder::field_bitfield` FFI entry in `iface.zng`
and its `clang.rs` implementation. `getBitWidthValue()` is used in its
LLVM 20+ no-argument form (the build already requires clang >= 20).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add `Pulse.Lib.C.BitField` providing `mask_u8/16/32/64`. Assigning `v` to
an `n`-bit unsigned bit-field stores its low `n` bits (`v % pow2 n`, C
unsigned modular truncation); PAL backs each bit-field by a range-refined
cell `(x:UW.t{UW.v x < pow2 n})`, so the masked result must carry that
bound.

Each helper truncates by modular arithmetic (`rem` by `pow2 n`),
mirroring how `add_wrap`/`sub_wrap`/`mul_wrap` model unsigned overflow via
the modular primitives rather than bitwise ops. The `< pow2 n` bound then
holds by "mod by a positive" and the exact value `UW.v r == UW.v v %
pow2 n` comes straight from the `rem` postcondition; the `n = W` case
(field as wide as its storage) is the identity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route an assignment to a struct member whose field is an unsigned
bit-field through `Pulse.Lib.C.BitField.mask_uW width rhs` before the
store, so the value stored in the range-refined cell satisfies its
`< pow2 width` refinement. This is C unsigned modular truncation on
write, and mirrors how unsigned arithmetic overflow is handled via the
width-selected `add_wrap`/`sub_wrap`/`mul_wrap` helpers.

Adds `bitfield_member_mask` (recognizes a bit-field LHS and picks the
width + helper), `get_bitfield_mask_fn` (machine width -> `mask_uW`), and
the `FieldT::bit_width` accessor they use.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a `bitfields` test exercising reads (via pointer, by value, and an
ordinary member alongside bit-fields), masked writes with truncation
postconditions (`s->a == v % 8`, `s->nibble == v % 16`), an in-range
constant write, and `_Bool` / `unsigned char` bit-fields. Document the
encoding in the internals guide and list `Pulse.Lib.C.BitField` among the
support libraries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@KimayaBedarkar

Copy link
Copy Markdown
Collaborator Author

This does not seem to work with typedefs:

#include "pal.h"
#include <stdint.h>

struct mine {
  uint16_t a : 2;
};

void should_work(struct mine *s) {
  _assert(s->a <= 4);
}

Should work now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants