fix(utf32): reassemble split codepoint from overflow buffer, not source index by spokodev · Pull Request #393 · pillarjs/iconv-lite

spokodev · 2026-06-23T00:42:44Z

Problem

When decoding UTF-32 from a stream, any 4-byte unit that straddles a chunk boundary is corrupted:

const iconv = require('iconv-lite');
const d = iconv.getDecoder('utf-32le');
d.write(Buffer.from([0x41, 0x00, 0x00])); // 'A' (U+0041), first 3 bytes
d.write(Buffer.from([0x00]));             // last byte
// → ""   (expected "A")

In big-endian the same split yields a byte-shifted character instead of A. Whole-buffer iconv.decode(buf, 'utf-32le') is unaffected — only the streaming / decodeStream path, where chunk boundaries are arbitrary (sockets, files), hits this.

Cause

encodings/utf32.js fills this.overflow to four bytes, then reassembles the codepoint using the source index i:

codepoint = overflow[i] | (overflow[i + 1] << 8) | (overflow[i + 2] << 16) | (overflow[i + 3] << 24)

overflow only holds indices 0–3, but after the fill loop i is the offset into src, so overflow[i] reads out of range (→ undefined → 0) whenever i > 0. The code comment notes this block was copied from the main loop (which correctly uses src[i]); the index was just never adjusted for the overflow buffer.
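The indexing mistake can be reproduced in isolation. The following is a minimal sketch, not the code in encodings/utf32.js: the arrays and the value of i are hypothetical stand-ins for the decoder state after the 3 + 1 byte split of U+0041 shown above.

```javascript
// Hypothetical stand-ins for the decoder state after a 3 + 1 byte split of 'A'.
const overflowLE = [0x41, 0x00, 0x00, 0x00]; // U+0041, little-endian
const overflowBE = [0x00, 0x00, 0x00, 0x41]; // U+0041, big-endian

// The fill loop consumed one byte from the second chunk, so i is 1.
const i = 1;

// Buggy reads use indices 1..4. Index 4 is undefined, which bitwise operators coerce to 0.
const buggyLE = overflowLE[i] | (overflowLE[i + 1] << 8) | (overflowLE[i + 2] << 16) | (overflowLE[i + 3] << 24);
const buggyBE = overflowBE[i + 3] | (overflowBE[i + 2] << 8) | (overflowBE[i + 1] << 16) | (overflowBE[i] << 24);

// A read from indices 0..3 recovers the original codepoint.
const fixedLE = overflowLE[0] | (overflowLE[1] << 8) | (overflowLE[2] << 16) | (overflowLE[3] << 24);

console.log(buggyLE.toString(16)); // 0
console.log(buggyBE.toString(16)); // 4100
console.log(fixedLE.toString(16)); // 41
```

The big-endian result, U+4100, is the byte-shifted character mentioned in the Problem section: the 0x41 byte is read one position too high.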

Fix

Read the reassembled bytes from overflow[0..3]:

if (isLE) {
  codepoint = overflow[0] | (overflow[1] << 8) | (overflow[2] << 16) | (overflow[3] << 24)
} else {
  codepoint = overflow[3] | (overflow[2] << 8) | (overflow[1] << 16) | (overflow[0] << 24)
}
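As a rough illustration of the corrected read, the same expressions can be wrapped in a standalone helper. The name readCodepoint is an assumption made for this sketch and does not exist in the library; the sample bytes encode U+1F600.

```javascript
// Hypothetical helper: reassemble one UTF-32 unit from a 4-element byte array.
function readCodepoint(overflow, isLE) {
  // `<< 24` yields a signed 32-bit value, so a high byte >= 0x80 gives a negative result.
  // `>>> 0` converts it to an unsigned number for easier range checks.
  const signed = isLE
    ? overflow[0] | (overflow[1] << 8) | (overflow[2] << 16) | (overflow[3] << 24)
    : overflow[3] | (overflow[2] << 8) | (overflow[1] << 16) | (overflow[0] << 24);
  return signed >>> 0;
}

const le = readCodepoint([0x00, 0xf6, 0x01, 0x00], true);  // U+1F600, little-endian bytes
const be = readCodepoint([0x00, 0x01, 0xf6, 0x00], false); // U+1F600, big-endian bytes
console.log(le.toString(16), be.toString(16)); // 1f600 1f600
```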

Verification

New tests in test/utf32-test.js decode utf32leBuf / utf32beBuf split at every byte offset; they fail before, pass after.
Full suite: 320 passing, 0 failing.
Fuzz: streaming decode at random split points vs whole-buffer decode over 120,000 random strings (LE + BE) — 0 mismatches.
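The split-at-every-offset check can be sketched without the library. Everything below is an assumption made for illustration: makeDecoder is a simplified stand-in for a streaming UTF-32 decoder (no BOM handling, no invalid-codepoint replacement), and encodeUtf32 and countSplitMismatches are hypothetical helpers, not the tests in test/utf32-test.js.

```javascript
// Simplified stand-in for a streaming UTF-32 decoder (hypothetical; valid input only).
function makeDecoder(isLE) {
  const overflow = []; // bytes of a unit left incomplete by the previous chunk
  function read(b, k) {
    const signed = isLE
      ? b[k] | (b[k + 1] << 8) | (b[k + 2] << 16) | (b[k + 3] << 24)
      : b[k + 3] | (b[k + 2] << 8) | (b[k + 1] << 16) | (b[k] << 24);
    return signed >>> 0;
  }
  return {
    write(src) {
      let out = '';
      let i = 0;
      if (overflow.length > 0) {
        // Top up the pending unit from the start of the new chunk.
        while (i < src.length && overflow.length < 4) overflow.push(src[i++]);
        if (overflow.length === 4) {
          out += String.fromCodePoint(read(overflow, 0)); // index 0, not i
          overflow.length = 0;
        }
      }
      // Decode whole units, then stash any incomplete tail for the next call.
      for (; i + 4 <= src.length; i += 4) out += String.fromCodePoint(read(src, i));
      for (; i < src.length; i++) overflow.push(src[i]);
      return out;
    },
  };
}

// Encode a string as UTF-32 with the requested byte order.
function encodeUtf32(str, isLE) {
  const cps = Array.from(str, (ch) => ch.codePointAt(0));
  const buf = Buffer.alloc(cps.length * 4);
  cps.forEach((cp, n) => (isLE ? buf.writeUInt32LE(cp, n * 4) : buf.writeUInt32BE(cp, n * 4)));
  return buf;
}

// Split the encoded buffer at every byte offset and count decodes that differ from the input.
function countSplitMismatches(str, isLE) {
  const buf = encodeUtf32(str, isLE);
  let mismatches = 0;
  for (let cut = 0; cut <= buf.length; cut++) {
    const d = makeDecoder(isLE);
    const got = d.write(buf.subarray(0, cut)) + d.write(buf.subarray(cut));
    if (got !== str) mismatches++;
  }
  return mismatches;
}

console.log(countSplitMismatches('A\u{1F600}\u00e9', true));  // 0
console.log(countSplitMismatches('A\u{1F600}\u00e9', false)); // 0
```

Changing read(overflow, 0) to read(overflow, i) in this sketch reintroduces the failure at every cut that is not a multiple of four.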

When a 4-byte UTF-32 unit was split across two stream chunks, the decoder filled `this.overflow` to four bytes and then read it back with the source index `i` (`overflow[i]` ... `overflow[i + 3]`) instead of `overflow[0]` ... `overflow[3]`. Since `overflow` only holds indices 0-3, the read landed out of range whenever `i > 0`, so every codepoint straddling a chunk boundary decoded to U+0000 (LE) or a byte-shifted character (BE). This block was copied from the main loop (which correctly uses `src[i]`); the index was never adjusted for the overflow buffer. Whole-buffer decode was unaffected, which is why existing tests passed.

spokodev wants to merge 1 commit into pillarjs:master from spokodev:fix-utf32-streaming-overflow
