KAFKA-20500: Add isolation-level reads to versioned stores #22682

Open
nicktelford wants to merge 1 commit into apache:trunk from nicktelford:KIP-892/iq-isolation-versioned

@nicktelford

Contributor

Part of the KIP-892 interactive-query isolation-level series. Versioned stores are queryable through IQv2 (VersionedKeyQuery, MultiVersionedKeyQuery) and IQv1, but had no way to honour the configured isolation level. When the underlying RocksDBStore is transactional, its accessor consults the staged-write buffer, so a READ_COMMITTED query would incorrectly observe writes that exist only in the current, not-yet-committed transaction.
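
To make the failure mode concrete, here is a minimal sketch under stated assumptions: a toy store keeps committed data in one map and the open transaction's staged writes in another. `transactionalGet`, `directGet`, and `queryWithoutIsolationHook` are hypothetical stand-ins for the transactional accessor, the direct accessor, and the pre-change read path; none of them are Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug; every name here is a hypothetical stand-in, not Kafka code.
final class StagedBufferLeakSketch {

    // Committed data, plus the staged-write buffer of the open transaction.
    static final Map<String, String> committed = new HashMap<>();
    static final Map<String, String> staged = new HashMap<>();

    // Stand-in for the transactional accessor: staged writes shadow committed data.
    static String transactionalGet(String key) {
        return staged.containsKey(key) ? staged.get(key) : committed.get(key);
    }

    // Stand-in for the direct accessor: committed data only.
    static String directGet(String key) {
        return committed.get(key);
    }

    // Before the change: the versioned read path has no isolation hook, so every
    // query, including a READ_COMMITTED one, goes through the transactional accessor.
    static String queryWithoutIsolationHook(String key) {
        return transactionalGet(key);
    }

    public static void main(String[] args) {
        committed.put("k", "committed");
        staged.put("k", "staged"); // written in the open transaction, not yet committed

        System.out.println(queryWithoutIsolationHook("k")); // staged (leaks to READ_COMMITTED callers)
        System.out.println(directGet("k"));                 // committed (what READ_COMMITTED should see)
    }
}
```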

This extends to versioned stores the readOnly(IsolationLevel) hook that the other store families already have. Because versioned stores have no dedicated ReadOnly* parent interface, the default is added directly on VersionedKeyValueStore, and the return type of VersionedBytesStore.readOnly is covariantly narrowed so wrapper layers keep the versioned read methods. Reads in RocksDBVersionedStore (single-key latest, point-in-time, and timestamp-range) flow through LogicalKeyValueSegment views bound to a specific DBAccessor, so a READ_COMMITTED read uses the direct accessor and bypasses the transaction buffer. The metered and change-logging versioned wrappers gain matching overrides, and StoreQueryUtils dispatches versioned key queries through readOnly(isolationLevel).
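
The type-level part of this design can be sketched with hypothetical stand-in types. `ToyReadOnlyKeyValueStore`, `ToyVersionedBytesStore`, `ToyMeteredStore`, and `runVersionedKeyQuery` are illustrative only, not the real Kafka Streams interfaces or signatures. The sketch assumes a parent interface whose default `readOnly` returns `this`; the versioned sub-interface narrows the return type, which is what lets a wrapper re-wrap `inner.readOnly(level)` without losing `get(key, asOfTimestamp)`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in types; these are NOT the real Kafka Streams interfaces.
final class CovariantReadOnlySketch {

    enum Isolation { READ_COMMITTED, READ_UNCOMMITTED }

    // Stand-in for a read interface of a store family that already has the hook.
    interface ToyReadOnlyKeyValueStore<K, V> {
        V get(K key);

        // Default: a store without isolation support simply returns itself.
        default ToyReadOnlyKeyValueStore<K, V> readOnly(Isolation level) {
            return this;
        }
    }

    // Stand-in for VersionedBytesStore: readOnly's return type is narrowed so that
    // callers keep access to the versioned get(key, asOfTimestamp) read.
    interface ToyVersionedBytesStore extends ToyReadOnlyKeyValueStore<String, String> {
        String get(String key, long asOfTimestamp);

        @Override
        ToyVersionedBytesStore readOnly(Isolation level);
    }

    // Innermost store; "visible" is either committed data only, or committed data
    // plus the open transaction's staged writes.
    static final class ToyInnerStore implements ToyVersionedBytesStore {
        private final Map<String, TreeMap<Long, String>> committed;
        private final Map<String, TreeMap<Long, String>> visible;

        ToyInnerStore(Map<String, TreeMap<Long, String>> committed,
                      Map<String, TreeMap<Long, String>> visible) {
            this.committed = committed;
            this.visible = visible;
        }

        @Override
        public String get(String key) {
            TreeMap<Long, String> versions = visible.get(key);
            return versions == null || versions.isEmpty() ? null : versions.lastEntry().getValue();
        }

        @Override
        public String get(String key, long asOfTimestamp) {
            TreeMap<Long, String> versions = visible.get(key);
            Map.Entry<Long, String> entry = versions == null ? null : versions.floorEntry(asOfTimestamp);
            return entry == null ? null : entry.getValue();
        }

        @Override
        public ToyVersionedBytesStore readOnly(Isolation level) {
            // READ_COMMITTED gets a view that reads committed data only.
            return level == Isolation.READ_COMMITTED ? new ToyInnerStore(committed, committed) : this;
        }
    }

    // Stand-in for a metered or change-logging wrapper (metrics omitted). Re-wrapping
    // inner.readOnly(level) only type-checks because of the narrowed return type.
    static final class ToyMeteredStore implements ToyVersionedBytesStore {
        private final ToyVersionedBytesStore inner;

        ToyMeteredStore(ToyVersionedBytesStore inner) { this.inner = inner; }
        @Override public String get(String key) { return inner.get(key); }
        @Override public String get(String key, long asOfTimestamp) { return inner.get(key, asOfTimestamp); }

        @Override
        public ToyVersionedBytesStore readOnly(Isolation level) {
            return new ToyMeteredStore(inner.readOnly(level));
        }
    }

    // Stand-in for StoreQueryUtils dispatching a versioned key query.
    static String runVersionedKeyQuery(ToyVersionedBytesStore store, String key,
                                       long asOfTimestamp, Isolation level) {
        return store.readOnly(level).get(key, asOfTimestamp);
    }

    // Key "k": version 10 is committed, version 20 is only staged.
    static ToyVersionedBytesStore sampleStore() {
        Map<String, TreeMap<Long, String>> committed = new HashMap<>();
        committed.put("k", new TreeMap<>(Map.of(10L, "v10")));
        Map<String, TreeMap<Long, String>> withStaged = new HashMap<>();
        withStaged.put("k", new TreeMap<>(Map.of(10L, "v10", 20L, "v20-staged")));
        return new ToyMeteredStore(new ToyInnerStore(committed, withStaged));
    }

    public static void main(String[] args) {
        ToyVersionedBytesStore store = sampleStore();
        System.out.println(runVersionedKeyQuery(store, "k", 25L, Isolation.READ_COMMITTED));   // v10
        System.out.println(runVersionedKeyQuery(store, "k", 25L, Isolation.READ_UNCOMMITTED)); // v20-staged
    }
}
```

If the narrowing override in `ToyVersionedBytesStore` is removed, `new ToyMeteredStore(inner.readOnly(level))` no longer compiles, because `inner.readOnly(level)` would then only be known as a `ToyReadOnlyKeyValueStore<String, String>`.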

Semantic tests assert that READ_COMMITTED hides staged writes while READ_UNCOMMITTED exposes them across the single-key, point-in-time, and timestamp-range read paths.
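
A compact model of what those tests check, as a sketch only: a toy store holds committed versions plus a staged-write buffer, and `readOnly` returns a view bound either to a committed-only accessor or to one that overlays the buffer. `ToyAccessor`, `ToyView`, and `ToyVersionedStore` are hypothetical stand-ins for DBAccessor, the LogicalKeyValueSegment views, and RocksDBVersionedStore, and the range read is simplified to timestamps inside `[from, to]`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only; every name is a hypothetical stand-in, not RocksDBVersionedStore code.
final class VersionedIsolationSketch {

    enum Isolation { READ_COMMITTED, READ_UNCOMMITTED }

    // Stand-in for a DBAccessor: returns every stored version of a key, by timestamp.
    interface ToyAccessor {
        NavigableMap<Long, String> versions(String key);
    }

    // Stand-in for a segment view bound to one accessor; all three read paths use it.
    static final class ToyView {
        private final ToyAccessor accessor;

        ToyView(ToyAccessor accessor) { this.accessor = accessor; }

        // Single-key latest read.
        String latest(String key) {
            NavigableMap<Long, String> versions = accessor.versions(key);
            return versions.isEmpty() ? null : versions.lastEntry().getValue();
        }

        // Point-in-time read: newest version with timestamp <= asOfTimestamp.
        String asOf(String key, long asOfTimestamp) {
            Map.Entry<Long, String> entry = accessor.versions(key).floorEntry(asOfTimestamp);
            return entry == null ? null : entry.getValue();
        }

        // Timestamp-range read, simplified to versions with timestamp in [from, to];
        // real versioned stores reason about each version's validity interval.
        List<String> range(String key, long from, long to) {
            return new ArrayList<>(accessor.versions(key).subMap(from, true, to, true).values());
        }
    }

    static final class ToyVersionedStore {
        private final Map<String, TreeMap<Long, String>> committed = new HashMap<>();
        private final Map<String, TreeMap<Long, String>> staged = new HashMap<>();

        // Writes land in the staged-write buffer until commit().
        void put(String key, long timestamp, String value) {
            staged.computeIfAbsent(key, k -> new TreeMap<>()).put(timestamp, value);
        }

        void commit() {
            staged.forEach((key, versions) ->
                committed.computeIfAbsent(key, k -> new TreeMap<>()).putAll(versions));
            staged.clear();
        }

        // READ_COMMITTED binds the view to a committed-only accessor; otherwise the
        // accessor overlays the staged-write buffer on top of committed data.
        ToyView readOnly(Isolation level) {
            ToyAccessor direct = key -> new TreeMap<>(committed.getOrDefault(key, new TreeMap<>()));
            if (level == Isolation.READ_COMMITTED) {
                return new ToyView(direct);
            }
            return new ToyView(key -> {
                NavigableMap<Long, String> merged = direct.versions(key); // fresh copy
                merged.putAll(staged.getOrDefault(key, new TreeMap<>()));
                return merged;
            });
        }
    }

    // Key "k": version 10 is committed, version 20 is staged in the open transaction.
    static ToyVersionedStore sampleStore() {
        ToyVersionedStore store = new ToyVersionedStore();
        store.put("k", 10L, "v10");
        store.commit();
        store.put("k", 20L, "v20");
        return store;
    }

    public static void main(String[] args) {
        ToyVersionedStore store = sampleStore();
        ToyView rc = store.readOnly(Isolation.READ_COMMITTED);
        ToyView ru = store.readOnly(Isolation.READ_UNCOMMITTED);
        System.out.println(rc.latest("k") + " / " + ru.latest("k"));                 // v10 / v20
        System.out.println(rc.asOf("k", 25L) + " / " + ru.asOf("k", 25L));           // v10 / v20
        System.out.println(rc.range("k", 0L, 30L) + " / " + ru.range("k", 0L, 30L)); // [v10] / [v10, v20]
    }
}
```

Because each view calls its accessor on every read, committing the staged write makes `v20` visible to READ_COMMITTED views as well.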

This was branched off the now-merged RocksDB isolation-level read work (KAFKA-20498) and now applies directly to trunk.

🤖 Generated with Claude Code

Versioned stores are queryable through IQv2 (VersionedKeyQuery and
MultiVersionedKeyQuery) and IQv1, but had no way to honour the
configured interactive-query isolation level. When the underlying
RocksDBStore is transactional, its accessor consults the staged-write
buffer, so a READ_COMMITTED query would incorrectly observe writes that
are still only in the current transaction.

Extend to versioned stores the readOnly(IsolationLevel) hook that the
other store families already have. Because versioned stores have no
dedicated ReadOnly* parent interface, the default is added directly on
VersionedKeyValueStore, and the return type of
VersionedBytesStore.readOnly is covariantly narrowed so wrapper layers
retain the versioned read methods. Reads in RocksDBVersionedStore
(single-key latest, point-in-time, and timestamp-range) flow through
LogicalKeyValueSegment views bound to a specific DBAccessor, so
READ_COMMITTED reads use the direct accessor and bypass the transaction
buffer. The metered and change-logging versioned wrappers gain matching
readOnly overrides, and StoreQueryUtils dispatches versioned key
queries through readOnly(isolationLevel).

Add semantic tests asserting READ_COMMITTED hides staged writes while
READ_UNCOMMITTED exposes them across the single-key, point-in-time, and
timestamp-range read paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the triage (PRs from the community) and streams labels on Jun 26, 2026
@nicktelford

Contributor Author

@bbejeck

