KAFKA-20514: kraft observers should use previous fetch response to decide where to send the next fetch by kevin-wu24 · Pull Request #22111 · apache/kafka

kevin-wu24 · 2026-04-21T19:17:07Z

Background

Currently, there is a timing issue where a KRaft observer can be stuck
fetching from the leader if the next poll occurs after the previous
fetch's backoff has completed, and the previous request did not time
out. This can happen if the leader's advertised endpoints are not
routable or there is a network partition. The bootstrap server endpoints
could contain routable endpoints for the leader, but the observer would
be stuck fetching from the unroutable endpoints.

Previously, there was an issue where observers could be stuck fetching
from the bootstrap servers even if they discovered leader endpoints from
the bootstrap fetch. This is because the fetch timeout is not reset on
the observer.

What changed

Observer fetching logic should ensure that within the same epoch, all
the bootstrap server endpoints and the leader have a chance to serve
fetch requests. This logic should be independent of the request manager's
state. The key observation is that, just because an observer did not
successfully fetch from node X within its fetch timeout, does not mean
that node X was not actually the leader. Therefore, if the bootstrap
servers say node X is indeed the leader, an observer should resume
trying to fetch from it.

If the fetch timeout has not expired, the observer fetches from the
leader.
If the fetch timeout is expired, the observer transitions to
Unattached within the same epoch, and will then fetch from the
bootstrap servers.
If the Unattached observer receives a fetch response from bootstrap
servers with leader endpoints, the Unattached observer transitions
back to Follower in the same epoch.

A voter has similar functionality where a fetch timeout expiration and a
failed pre-vote election result in a reset of the fetch timer to the
same leader in the same epoch. The state transition Follower
-> Prospective -> Follower allows a voter to refresh the fetch
timer for the leader within the same epoch, but observers do not have
this behavior currently. This PR proposes adding a similar state
transition for observers: Follower -> Unattached -> Follower.
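
The observer transitions described above can be sketched as a small state model. This is only an illustration under simplifying assumptions, not the KafkaRaftClient implementation: `ObserverFetchModel`, its methods, and the `hasLeaderEndpoints` flag are hypothetical names, the epoch and leader are fixed, and the timer is treated as expired once the current time reaches its deadline.

```java
import java.util.OptionalInt;

/** Toy model of Follower -> Unattached -> Follower for an observer (hypothetical, not Kafka code). */
final class ObserverFetchModel {
    enum State { FOLLOWER, UNATTACHED }
    enum Destination { LEADER, BOOTSTRAP }

    private final int epoch;
    private final int leaderId;
    private final long fetchTimeoutMs;
    private State state = State.FOLLOWER;
    private long fetchDeadlineMs;

    ObserverFetchModel(int epoch, int leaderId, long fetchTimeoutMs, long nowMs) {
        this.epoch = epoch;
        this.leaderId = leaderId;
        this.fetchTimeoutMs = fetchTimeoutMs;
        this.fetchDeadlineMs = nowMs + fetchTimeoutMs;
    }

    /** Chooses the next fetch destination, going Unattached if the fetch timeout has expired. */
    Destination poll(long nowMs) {
        if (state == State.FOLLOWER && nowMs >= fetchDeadlineMs) {
            // Same epoch and same remembered leader; only the fetch source changes.
            state = State.UNATTACHED;
        }
        return state == State.FOLLOWER ? Destination.LEADER : Destination.BOOTSTRAP;
    }

    /** Handles a successful bootstrap fetch response that may advertise the leader. */
    void onBootstrapResponse(
        int responseEpoch,
        OptionalInt responseLeaderId,
        boolean hasLeaderEndpoints,
        long nowMs
    ) {
        boolean sameLeader = responseLeaderId.isPresent() && responseLeaderId.getAsInt() == leaderId;
        if (state == State.UNATTACHED && responseEpoch == epoch && sameLeader && hasLeaderEndpoints) {
            // The bootstrap server confirms the leader: resume fetching from it with a fresh timer.
            state = State.FOLLOWER;
            fetchDeadlineMs = nowMs + fetchTimeoutMs;
        }
    }

    State state() {
        return state;
    }
}
```

In this model, a bootstrap response that does not name the remembered leader leaves the observer Unattached, so it keeps fetching from the bootstrap servers.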

Testing

Added a unit test to KafkaRaftClientFetchTest to show that fetches oscillate
between the leader and bootstrap endpoints based on the fetch timer.

Reviewers: José Armando García Sancio jsancio@apache.org, Jonah Hooper
jhooper@confluent.io, Alyssa Huang ahuang@confluent.io

github-actions · 2026-04-29T04:18:13Z

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

josefk31 · 2026-05-07T14:27:32Z

+     * and a non-leader source's fetch response advertises the leader's endpoints, switch
+     * the observer back to fetching from the leader.
+     */
+    private void maybeSwitchObserverFetchToLeader(


Nit: slightly more descriptive name might be a tad better - for example maybeSwitchObserverToFetchFromLeader.

josefk31 · 2026-05-07T14:45:11Z

+        long currentTimeMs
+    ) {
+        if (!hasConsistentLeader(responseEpoch, responseLeaderId)) {
+            throw new IllegalStateException("Received request or response with leader " + responseLeaderId +


Nit: we might also wish to include quorum.localIdOrSentinel in the exception message for convenience.

I followed the other usage of hasConsistentLeader, which uses quorum.localId(). If the local leader is empty, hasConsistentLeader returns true.

josefk31

Thanks for changes @kevin-wu24 :)

josefk31 · 2026-05-25T21:39:30Z

+            .withRaftProtocol(RaftClientTestContext.RaftProtocol.KIP_1166_PROTOCOL)
+            .build();
+
+        for (int i = 0; i < 10; ++i) {


Is there a specific reason why this loop must run 10 times? It would be nice to add a comment clarifying why there must be 10 iterations.

Not really, 10 is just a magic number. I just want to make sure we alternate between bootstrap endpoint + leader endpoint continuously.

jsancio

@kevin-wu24 thanks for the fix and detailed description. The description helped me understand the issue. Here are some high-level comments before I look into the details of the implementation.

jsancio · 2026-05-28T10:40:51Z

    }
+
+    @Test
+    void testObserverFetchesBetweenLeaderAndBootstrapServers() throws Exception {


Does this fail without your change? If so, can you tell me exactly what fails?

Yes, trunk fails on L800 by sending a fetch request to the leader endpoint instead of the bootstrap endpoint on the second iteration of the for loop. At this point, the fetch timeout is expired (simulating being unable to reach the leader), and the local node will continue to fetch from that endpoint for the remainder of the epoch, instead of trying to fetch from bootstrap servers.

Looking at this again after a while, the for loop actually made this harder for me to read. I'm going to remove it.

jsancio · 2026-05-28T10:42:49Z

+        maybeSwitchObserverFetchToLeader(
+            responseEpoch,
+            responseLeaderId,
+            leaderEndpoints,
+            currentTimeMs
+        );


Can you document the exact RPC trace for this issue? Can you document trace in trunk vs the trace in this branch? I am interested in the RPCs and kraft state transitions.

Sure, the trace is below. It is a very esoteric edge case, so let me know if you have any questions.

Assume that the observer is unable to establish a TCP connection with the leader endpoint X. This means that requestManager.hasRequestTimedOut in maybeSendFetchToBestNode will return false, since the raft client gets an error response from its network client when it cannot connect to the request destination node. Also, assume that on the subsequent poll(), the requestManager.isBackingOff is also false. This is what I am referring to when I say the fetching destination logic needs to be independent of the request manager state.

The observer starts up either in the Unattached state or the FollowerState with a known leader endpoint X at the current epoch.

On trunk:

If the observer is in the first case, we enter pollUnattachedCommon to fetch from bootstrap servers, and if there is a leader, we receive its endpoint X and transition to FollowerState, so we are in the second case going forward.

Observer tries to fetch from the leader going forward. Under our assumptions above, maybeSendFetchToBestNode will always execute code in the else if (!requestManager.hasAnyInflightRequest) branch, which sends a fetch to the unreachable endpoint X.

The observer is now unable to make progress so long as requestManager.isBackingOff keeps returning false and endpoint X is partitioned from the observer.

On this branch:

If the observer is in the first case, we pollUnattachedCommon to fetch from bootstrap servers, and if there is a leader, we receive its endpoint X and transition to FollowerState, so we are in the second case going forward.

If the observer is unable to fetch from the leader within the fetch timeout, it transitions from FollowerState(leader X) -> Unattached(leader X) in the same epoch, which allows it to fetch from the bootstrap servers.

If the bootstrap servers are still in the same epoch with the same leader X, their completed fetch response to the observer will transition the observer from Unattached(leader X) -> FollowerState(leader X). The observer will then fetch from endpoint X.

If there is a leader election and an epoch bump, the observer can discover that from the bootstrap servers. The observer cannot discover this new state if it is stuck fetching from X to which it cannot connect.

The motivation for the proposed behavior is that on trunk, the observer can be "stuck" (depending on the request manager state) fetching from a leader's endpoint which is unreachable. This "starves" the bootstrap endpoints and prevents the observer from discovering a new leader or fetching until X becomes reachable.
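
The difference between the two traces can be illustrated with a toy simulation of fetch destinations. It assumes that endpoint X stays unreachable for the whole run, that every bootstrap response names X as the leader in the same epoch, and that each fetch completes before the next poll. `FetchDestinationTrace` and its parameters are hypothetical and do not correspond to KafkaRaftClient code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of fetch destinations with and without the timeout fallback (hypothetical). */
final class FetchDestinationTrace {
    static List<String> destinations(
        boolean fallBackOnFetchTimeout,
        int polls,
        long pollIntervalMs,
        long fetchTimeoutMs
    ) {
        List<String> trace = new ArrayList<>();
        boolean follower = true;              // start as FollowerState(leader X)
        long fetchDeadlineMs = fetchTimeoutMs;
        for (int i = 0; i < polls; i++) {
            long nowMs = i * pollIntervalMs;
            if (follower && fallBackOnFetchTimeout && nowMs >= fetchDeadlineMs) {
                follower = false;             // FollowerState(X) -> Unattached(X), same epoch
            }
            if (follower) {
                trace.add("leader");          // fails because X is unreachable; no state change
            } else {
                trace.add("bootstrap");       // response confirms leader X in the same epoch
                follower = true;              // Unattached(X) -> FollowerState(X)
                fetchDeadlineMs = nowMs + fetchTimeoutMs;
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println("no fallback:   " + destinations(false, 6, 10L, 20L));
        System.out.println("with fallback: " + destinations(true, 6, 10L, 20L));
    }
}
```

Without the fallback every fetch goes to the unreachable leader, which is the starvation described above. With the fallback, the bootstrap servers are contacted each time the fetch timeout expires, so an epoch bump would become visible to the observer.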

Why do you need this special handler for FETCH? How about the other RPCs that follower send like FETCH_SNAPSHOT and UPDATE_VOTER?

I am checking:

else if (responseEpoch == quorum.epoch() && quorum.isUnattached() && responseLeaderId.isPresent() && !leaderEndpoints.isEmpty()) {

in this method. This handler and logic only applies to nodes in the Unattached state, which can only fetch from the bootstrap servers. A follower whose fetch timeout has expired transitions to Unattached and then can only send Fetch.

josefk31 · 2026-06-04T21:17:47Z

+
+            // The fetch timeout is much greater than the request manager's configured backoff, so the
+            // current unreachable connection will no longer be backing off when the next fetch is sent.
+            context.deliverResponse(


IIUC in the actual failure mode which causes this bug, we never get a response from the leader at all. It's completely disconnected. Is that the same as receiving an Errors.BROKER_NOT_AVAILABLE? In this test case RaftClient will actually receive a response, while IRL it will not.

is that the same as receiving a Errors.BROKER_NOT_AVAILABLE

Yes, it is the same. The specific case is when the local node cannot establish a TCP connection with the leader endpoint. This causes the destination node to be considered "disconnected" and the RPC is never sent over the wire. If you trace through the KafkaNetworkChannel + network client code this ends up "returning" a dummy RPC error response to the local raft client with Errors.BROKER_NOT_AVAILABLE. This is what clears up the raft request manager's AWAITING_RESPONSE state for the "connection."

In my opinion, there is less of a clear motivation to change the above behavior compared to allowing the observer to behave more like a voter. What do you think?
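
The interaction between the synthesized error response and the backoff can be sketched with a toy per-connection model. It assumes, as described elsewhere in this thread, that an error response moves the connection into a backoff for `retryBackoffMs`. `ConnectionModel` and its methods are hypothetical and are not the real RequestManager API.

```java
/** Toy model of one connection's request state (hypothetical, not Kafka's RequestManager). */
final class ConnectionModel {
    enum State { READY, AWAITING_RESPONSE, BACKING_OFF }

    private final long retryBackoffMs;
    private State state = State.READY;
    private long backoffUntilMs;

    ConnectionModel(long retryBackoffMs) {
        this.retryBackoffMs = retryBackoffMs;
    }

    void onRequestSent() {
        state = State.AWAITING_RESPONSE;
    }

    /** An error response, real or synthesized locally, ends the in-flight request. */
    void onErrorResponse(long nowMs) {
        state = State.BACKING_OFF;
        backoffUntilMs = nowMs + retryBackoffMs;
    }

    boolean isBackingOff(long nowMs) {
        if (state == State.BACKING_OFF && nowMs >= backoffUntilMs) {
            state = State.READY;   // backoff elapsed, the connection may be used again
        }
        return state == State.BACKING_OFF;
    }

    boolean hasInflightRequest() {
        return state == State.AWAITING_RESPONSE;
    }
}
```

Because the fetch timeout is much larger than the backoff, a poll that happens after the fetch timeout sees a connection that is neither backing off nor awaiting a response, so nothing in this state alone prevents another fetch to the unreachable endpoint.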

josefk31

Thanks for changes! A few more comments :)

josefk31 · 2026-06-04T21:20:33Z

+            context.pollUntilRequest();
+            final var bootstrapFetch = context.assertSentFetchRequest();
+            assertEquals(-2, bootstrapFetch.destination().id());
+            assertEquals(RaftClientTestContext.mockAddress(otherVoter.id()).getHostName(), bootstrapFetch.destination().host());


Nit: test would be clearer if we changed the variable name from otherVoter to bootstrapVoter

josefk31 · 2026-06-04T21:23:15Z

        context.time.sleep(context.fetchTimeoutMs);
        context.pollUntilRequest();
-        assertTrue(context.client.quorum().isFollower());
+        assertFalse(context.client.quorum().isProspective());


Nit: do we wish to add an assert for whether it becomes Unattached?

josefk31 · 2026-06-04T21:25:38Z

            }
            return sendResult.timeToWaitMs();
+        } else if (state.hasFetchTimeoutExpired(currentTimeMs)) {
+            transitionToUnattached(state.epoch(), OptionalInt.of(state.leaderId()));


We should add a test which explicitly checks that the observer can transition to unattached if there is a timeout.

I added this to the existing unit test.

josefk31

Thanks for changes! Looks good to me :)

ahuang98 · 2026-06-15T22:38:45Z

        return fetchTimer.remainingMs();
    }

+    public long remainingUpdateVoterSetTimeMs(long currentTimeMs) {


what's this change for? it's only used in the testContext?

It's to fix some logic in RaftClientTestContext#advanceTimeAndCompleteFetch, so that both timers won't be expired on a subsequent invocation of the helper method. I was originally using this method to write my test case, but ended up changing the test to not use this helper. I can remove it from the PR if it is confusing things.

ahuang98 · 2026-06-15T22:43:55Z

            leaderEndpoints = Endpoints.empty();
        }

+        maybeSwitchObserverFetchToLeader(


is this transition not already handled by maybeHandleCommonResponse?

If we store the leader information when becoming Unattached in the same epoch after the fetch timeout expires, receiving a response from a bootstrap fetch that contains the same leader from the same epoch will fall through to maybeTransition and not match any of the cases in that method.

ahuang98

Partial review, getting up to speed on the issue again.

I do wonder if there's a separate problem regarding timing that is worth fixing.

We should have the following two distinct paths:

request times out (no response) → RequestManager moves state to READY → retry immediately
request errors → RequestManager moves state to BACKING_OFF for retryBackoffMs

When a fetch can't be sent (e.g. unreachable endpoint), the network channel returns a BROKER_NOT_AVAILABLE response, which is arguably meant to push RequestManager into BACKING_OFF state. But that response is always discarded, and I think that's the separate issue we might want to fix (in a separate PR).

RequestManager's response timeout fires at >= lastSendTimeMs + requestTimeoutMs while the network channel's synthesized BROKER_NOT_AVAILABLE response will only appear at > createdTimeMs + requestTimeoutMs. lastSendTimeMs equals createdTimeMs and both RequestManager and the network channel use the same requestTimeoutMs so the RequestManager timeout will always fire first and we'll never actually see the BROKER_NOT_AVAILABLE response.
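
The ordering argument above can be checked with a small worked example. The two predicates only restate the comparisons given in the comment (`>=` on the request manager side, `>` on the network channel side); `TimeoutOrdering` and its method names are hypothetical.

```java
/** Worked example of the two timeout comparisons (hypothetical helper, not Kafka code). */
final class TimeoutOrdering {
    /** Request manager side: timed out at or after lastSendTimeMs + requestTimeoutMs. */
    static boolean requestManagerTimedOut(long nowMs, long lastSendTimeMs, long requestTimeoutMs) {
        return nowMs >= lastSendTimeMs + requestTimeoutMs;
    }

    /** Channel side: the error is synthesized strictly after createdTimeMs + requestTimeoutMs. */
    static boolean channelExpired(long nowMs, long createdTimeMs, long requestTimeoutMs) {
        return nowMs > createdTimeMs + requestTimeoutMs;
    }
}
```

With a send time of 1000 ms and a request timeout of 2000 ms, the first predicate becomes true at 3000 ms while the second only becomes true at 3001 ms, so whenever the channel side has expired the request manager side has already timed out.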

jsancio

Partial review.

jsancio · 2026-06-16T17:04:10Z

+        RaftRequest.Outbound fetchRequest = context.assertSentFetchRequest();
+        if (isBootstrapFetch) {
+            assertTrue(context.client.quorum().isUnattached());
+            assertEquals(-2, fetchRequest.destination().id());


This is an implementation detail that I am not sure that we should check in the protocol. Maybe it is enough to check that id is less than -1 (network client hack to capture bootstrap node) and check the destination endpoint is the correct endpoint (leader endpoint vs bootstrap endpoint).

jsancio · 2026-06-16T17:07:09Z

            .withStaticVoters(voters)
-            .withBootstrapServers(Optional.of(List.of(RaftClientTestContext.mockAddress(otherVoter.id()))))
+            .withBootstrapServers(
+                Optional.of(List.of(RaftClientTestContext.mockAddress(bootstrapVoter.id())))


Let's document that you are doing this to reliably check fetches to the leader (known node) vs fetches to the bootstrap server (unknown nodes). Another way to check this is that "bootstrap nodes" have an unknown id. We represent this in the network client by giving those nodes an id less than -1. RPCs to known kafka nodes have an id greater than or equal to 0.
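
The id convention described here could be captured in a small test helper so that tests do not depend on the exact value -2. `DestinationIds` and its methods are hypothetical names, and the thresholds are taken only from the comment above (bootstrap ids less than -1, known node ids greater than or equal to 0).

```java
/** Hypothetical test helper for classifying fetch destinations by node id. */
final class DestinationIds {
    /** Bootstrap servers have an unknown id, represented as an id less than -1. */
    static boolean isBootstrapDestination(int nodeId) {
        return nodeId < -1;
    }

    /** Known Kafka nodes have an id greater than or equal to 0. */
    static boolean isKnownNode(int nodeId) {
        return nodeId >= 0;
    }
}
```

A test would then combine `isBootstrapDestination` with a check of the destination host and port, rather than asserting a specific negative id.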

jsancio · 2026-06-16T17:12:31Z

        int leaderId,
        boolean expireUpdateVoterSetTimer
    ) throws Exception {
+        final var state = client.quorum().followerStateOrThrow();


Can we implement this without using internal kraft state? These are protocol tests and ideally should not know anything about the internal implementation. In the future we should be able to change the kraft implementation and not have to update any of the KafkaRaftClient*Test tests.

Hmmm, it looks like #22111 (comment) was because some auto-join tests would fail without this change. It's been a while since I changed this.

Subsequent invocations to this method can expire the fetch timeout. Prior to this change, the local node did not change state, but now you go to Unattached, so completing the fetch transitions the node back to Follower, but with a new updateVoterSetTimer that is not expired. This causes the test to fail on the second assertion of sending add/remove voter.

I updated this method to not expire the fetch timeout on subsequent invocations without leaking kraft state. I documented how callers are expected to invoke this method.

jsancio · 2026-06-16T17:13:26Z

+    public long remainingUpdateVoterSetTimeMs(long currentTimeMs) {
+        updateVoterSetPeriodTimer.update(currentTimeMs);
+        return updateVoterSetPeriodTimer.remainingMs();
+    }


See my other comment, but ideally we should not have this method if it is not used by the kraft implementation.

Sounds good, I can remove this.

jsancio · 2026-06-16T19:23:35Z

+        maybeSwitchObserverFetchToLeader(
+            responseEpoch,
+            responseLeaderId,
+            leaderEndpoints,
+            currentTimeMs
+        );


Why do you need this special handler for FETCH? How about the other RPCs that follower send like FETCH_SNAPSHOT and UPDATE_VOTER?

kevin-wu24 · 2026-06-16T21:43:00Z

Thanks for the review @ahuang98. WRT your comment:

The case this PR addresses is one where a TCP handshake between the local and destination node cannot be established. This is evidenced by repeated logs like the one below from the cluster where this behavior was observed:

[WARN] 2026-04-06 21:18:28,964 [kafka-0-raft-outbound-request-thread] org.apache.kafka.clients.NetworkClient initiateConnect - [RaftManager id=0] Error connecting to node ...

This means the KRaft fetch request is never sent over the wire by the local node. Instead, the connection failure will result in the connection state being DISCONNECTED, and InterBrokerSendThread#checkDisconnects will complete the request. The specific completion handler being invoked, KafkaNetworkChannel#sendOnComplete, is what adds the BROKER_NOT_AVAILABLE error response to the local node's message queue.

I think you're referring to a case where we fall through to InterBrokerSendThread#failExpiredRequests, but I believe our request has already been removed from unsentRequests because the connection state to the destination is disconnected. Is my understanding correct?

ahuang98 · 2026-06-18T20:42:35Z

Synced offline, it seems we're referring to different incidents which can result in the same broken behavior. My comments on a potential RequestManager & KafkaNetworkChannel timing issue stem from https://gist.github.com/justin-chen/a7deade5b0ab17b33d64ec07cd2542ab

jsancio

Thanks for the changes. I took a look at the problem in more detail.

jsancio · 2026-06-25T19:03:27Z

    }
+
+    @Test
+    void testObserverFetchesBetweenLeaderAndBootstrapServers() throws Exception {


I ran this test against this PR and I got this trace:

[2026-06-25 15:02:25,329] INFO Starting request manager with bootstrap servers: [localhost:10634 (id: -2 rack: null isFenced: false)] (org.apache.kafka.raft.KafkaRaftClient:331)
[2026-06-25 15:02:25,561] INFO Reading KRaft snapshot and log as part of the initialization (org.apache.kafka.raft.KafkaRaftClient:509)
[2026-06-25 15:02:25,563] INFO Starting voters are VoterSet(voters={643=VoterNode(voterKey=ReplicaKey(id=643, directoryId=<undefined>), listeners=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), supportedKRaftVersion=SupportedVersionRange[min_version:0, max_version:0]), 644=VoterNode(voterKey=ReplicaKey(id=644, directoryId=<undefined>), listeners=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10634}), supportedKRaftVersion=SupportedVersionRange[min_version:0, max_version:0])}) (org.apache.kafka.raft.KafkaRaftClient:511)
[2026-06-25 15:02:25,565] INFO Attempting durable transition to UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=18985, highWatermark=Optional.empty) from null (org.apache.kafka.raft.QuorumState:732)
[2026-06-25 15:02:25,568] INFO Completed transition to UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=18985, highWatermark=Optional.empty) from null (org.apache.kafka.raft.QuorumState:744)
[2026-06-25 15:02:25,586] TRACE Sent outbound request: OutboundRequest(correlationId=0, data=FetchRequestData(clusterId='Xs7d_i8LRIuAcKg9hc0dhw', replicaId=-1, replicaState=ReplicaState(replicaId=642, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=0, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=ezHminGtTAmIQQ3i5JFLUQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782414145309, destination=localhost:10634 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-25 15:02:25,587] INFO Registered the listener org.apache.kafka.raft.RaftClientTestContext$MockListener@107632469 (org.apache.kafka.raft.KafkaRaftClient:3590)
[2026-06-25 15:02:25,725] TRACE Received inbound message InboundResponse(correlationId=0, data=FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=0, responses=[FetchableTopicResponse(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[PartitionData(partitionIndex=0, errorCode=6, highWatermark=0, lastStableOffset=-1, logStartOffset=-1, divergingEpoch=EpochEndOffset(epoch=-1, endOffset=-1), currentLeader=LeaderIdAndEpoch(leaderId=643, leaderEpoch=2), snapshotId=SnapshotId(endOffset=-1, epoch=-1), abortedTransactions=[], preferredReadReplica=-1, records=MemoryRecords(size=0, buffer=java.nio.HeapByteBuffer[pos=0 lim=0 cap=37]))])], nodeEndpoints=[NodeEndpoint(nodeId=643, host='localhost', port=10633, rack=null)]), source=localhost:10634 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-25 15:02:25,726] INFO Attempting durable transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=18985, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-25 15:02:25,727] INFO Completed transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=18985, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-25 15:02:25,728] DEBUG Notifying listener org.apache.kafka.raft.RaftClientTestContext$MockListener@107632469 of leader change LeaderAndEpoch[leaderId=OptionalInt[643], epoch=2] (org.apache.kafka.raft.KafkaRaftClient:4121)
[2026-06-25 15:02:25,836] TRACE Sent outbound request: OutboundRequest(correlationId=1, data=FetchRequestData(clusterId='Xs7d_i8LRIuAcKg9hc0dhw', replicaId=-1, replicaState=ReplicaState(replicaId=642, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=ezHminGtTAmIQQ3i5JFLUQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782414145309, destination=localhost:10633 (id: 643 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-25 15:02:25,837] INFO Attempting durable transition to UnattachedState(epoch=2, leaderId=OptionalInt[643], votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) from FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-25 15:02:25,837] INFO Completed transition to UnattachedState(epoch=2, leaderId=OptionalInt[643], votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) from FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-25 15:02:25,837] TRACE Received inbound message InboundResponse(correlationId=1, data=FetchResponseData(throttleTimeMs=0, errorCode=8, sessionId=0, responses=[], nodeEndpoints=[]), source=localhost:10633 (id: 643 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-25 15:02:25,838] DEBUG Ignoring response InboundResponse(correlationId=1, data=FetchResponseData(throttleTimeMs=0, errorCode=8, sessionId=0, responses=[], nodeEndpoints=[]), source=localhost:10633 (id: 643 rack: null isFenced: false)) since it is no longer needed (org.apache.kafka.raft.KafkaRaftClient:2856)
[2026-06-25 15:02:25,942] TRACE Sent outbound request: OutboundRequest(correlationId=2, data=FetchRequestData(clusterId='Xs7d_i8LRIuAcKg9hc0dhw', replicaId=-1, replicaState=ReplicaState(replicaId=642, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=ezHminGtTAmIQQ3i5JFLUQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782414195310, destination=localhost:10634 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-25 15:02:25,943] TRACE Received inbound message InboundResponse(correlationId=2, data=FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=0, responses=[FetchableTopicResponse(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[PartitionData(partitionIndex=0, errorCode=6, highWatermark=0, lastStableOffset=-1, logStartOffset=-1, divergingEpoch=EpochEndOffset(epoch=-1, endOffset=-1), currentLeader=LeaderIdAndEpoch(leaderId=643, leaderEpoch=2), snapshotId=SnapshotId(endOffset=-1, epoch=-1), abortedTransactions=[], preferredReadReplica=-1, records=MemoryRecords(size=0, buffer=java.nio.HeapByteBuffer[pos=0 lim=0 cap=37]))])], nodeEndpoints=[NodeEndpoint(nodeId=643, host='localhost', port=10633, rack=null)]), source=localhost:10634 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-25 15:02:25,943] INFO Attempting durable transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=2, leaderId=OptionalInt[643], votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-25 15:02:25,944] INFO Completed transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=643, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10633}), votedKey=Optional.empty, voters=[643, 644], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=2, leaderId=OptionalInt[643], votedKey=Optional.empty, voters=[643, 644], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-25 15:02:26,047] TRACE Sent outbound request: OutboundRequest(correlationId=3, data=FetchRequestData(clusterId='Xs7d_i8LRIuAcKg9hc0dhw', replicaId=-1, replicaState=ReplicaState(replicaId=642, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=ezHminGtTAmIQQ3i5JFLUQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782414195310, destination=localhost:10633 (id: 643 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)

Can we get a TRACE of the actual issue to make sure we are solving the correct problem? I am having a hard time understanding the actual problem, and I want to be sure that this change solves it.

Here is an updated trace from my most recent local changes:

[2026-06-26 09:50:15,417] INFO Starting request manager with bootstrap servers: [localhost:10139 (id: -2 rack: null isFenced: false)] (org.apache.kafka.raft.KafkaRaftClient:331)
[2026-06-26 09:50:15,600] INFO Reading KRaft snapshot and log as part of the initialization (org.apache.kafka.raft.KafkaRaftClient:509)
[2026-06-26 09:50:15,601] INFO Starting voters are VoterSet(voters={148=VoterNode(voterKey=ReplicaKey(id=148, directoryId=<undefined>), listeners=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), supportedKRaftVersion=SupportedVersionRange[min_version:0, max_version:0]), 149=VoterNode(voterKey=ReplicaKey(id=149, directoryId=<undefined>), listeners=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10139}), supportedKRaftVersion=SupportedVersionRange[min_version:0, max_version:0])}) (org.apache.kafka.raft.KafkaRaftClient:511)
[2026-06-26 09:50:15,603] INFO Attempting durable transition to UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=18985, highWatermark=Optional.empty) from null (org.apache.kafka.raft.QuorumState:732)
[2026-06-26 09:50:15,605] INFO Completed transition to UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=18985, highWatermark=Optional.empty) from null (org.apache.kafka.raft.QuorumState:744)
[2026-06-26 09:50:15,614] TRACE Sent outbound request: OutboundRequest(correlationId=0, data=FetchRequestData(clusterId='sSoE9smGSQqjfEuTnlMPsA', replicaId=-1, replicaState=ReplicaState(replicaId=147, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=0, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=XiEwxtuzSGuh5WQsWw8VnQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782485415405, destination=localhost:10139 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-26 09:50:15,615] INFO Registered the listener org.apache.kafka.raft.RaftClientTestContext$MockListener@220558713 (org.apache.kafka.raft.KafkaRaftClient:3590)
[2026-06-26 09:50:15,707] TRACE Received inbound message InboundResponse(correlationId=0, data=FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=0, responses=[FetchableTopicResponse(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[PartitionData(partitionIndex=0, errorCode=6, highWatermark=0, lastStableOffset=-1, logStartOffset=-1, divergingEpoch=EpochEndOffset(epoch=-1, endOffset=-1), currentLeader=LeaderIdAndEpoch(leaderId=148, leaderEpoch=2), snapshotId=SnapshotId(endOffset=-1, epoch=-1), abortedTransactions=[], preferredReadReplica=-1, records=MemoryRecords(size=0, buffer=java.nio.HeapByteBuffer[pos=0 lim=0 cap=37]))])], nodeEndpoints=[NodeEndpoint(nodeId=148, host='localhost', port=10138, rack=null)]), source=localhost:10139 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-26 09:50:15,708] INFO Attempting durable transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=18985, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-26 09:50:15,709] INFO Completed transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=0, leaderId=OptionalInt.empty, votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=18985, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-26 09:50:15,709] DEBUG Notifying listener org.apache.kafka.raft.RaftClientTestContext$MockListener@220558713 of leader change LeaderAndEpoch[leaderId=OptionalInt[148], epoch=2] (org.apache.kafka.raft.KafkaRaftClient:4121)
[2026-06-26 09:50:15,812] TRACE Sent outbound request: OutboundRequest(correlationId=1, data=FetchRequestData(clusterId='sSoE9smGSQqjfEuTnlMPsA', replicaId=-1, replicaState=ReplicaState(replicaId=147, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=XiEwxtuzSGuh5WQsWw8VnQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782485415405, destination=localhost:10138 (id: 148 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-26 09:50:15,813] TRACE Received inbound message InboundResponse(correlationId=1, data=FetchResponseData(throttleTimeMs=0, errorCode=8, sessionId=0, responses=[], nodeEndpoints=[]), source=localhost:10138 (id: 148 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-26 09:50:15,814] INFO Attempting durable transition to UnattachedState(epoch=2, leaderId=OptionalInt[148], votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) from FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-26 09:50:15,814] INFO Completed transition to UnattachedState(epoch=2, leaderId=OptionalInt[148], votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) from FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-26 09:50:15,918] TRACE Sent outbound request: OutboundRequest(correlationId=2, data=FetchRequestData(clusterId='sSoE9smGSQqjfEuTnlMPsA', replicaId=-1, replicaState=ReplicaState(replicaId=147, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=XiEwxtuzSGuh5WQsWw8VnQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782485465406, destination=localhost:10139 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)
[2026-06-26 09:50:15,919] TRACE Received inbound message InboundResponse(correlationId=2, data=FetchResponseData(throttleTimeMs=0, errorCode=0, sessionId=0, responses=[FetchableTopicResponse(topic='', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[PartitionData(partitionIndex=0, errorCode=6, highWatermark=0, lastStableOffset=-1, logStartOffset=-1, divergingEpoch=EpochEndOffset(epoch=-1, endOffset=-1), currentLeader=LeaderIdAndEpoch(leaderId=148, leaderEpoch=2), snapshotId=SnapshotId(endOffset=-1, epoch=-1), abortedTransactions=[], preferredReadReplica=-1, records=MemoryRecords(size=0, buffer=java.nio.HeapByteBuffer[pos=0 lim=0 cap=37]))])], nodeEndpoints=[NodeEndpoint(nodeId=148, host='localhost', port=10138, rack=null)]), source=localhost:10139 (id: -2 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2848)
[2026-06-26 09:50:15,920] INFO Attempting durable transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=2, leaderId=OptionalInt[148], votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:732)
[2026-06-26 09:50:15,920] INFO Completed transition to FollowerState(fetchTimeoutMs=50000, epoch=2, leader=148, leaderEndpoints=Endpoints(endpoints={ListenerName(LISTENER)=localhost/<unresolved>:10138}), votedKey=Optional.empty, voters=[148, 149], highWatermark=Optional.empty, fetchingSnapshot=Optional.empty) from UnattachedState(epoch=2, leaderId=OptionalInt[148], votedKey=Optional.empty, voters=[148, 149], electionTimeoutMs=9223372036854775807, highWatermark=Optional.empty) (org.apache.kafka.raft.QuorumState:744)
[2026-06-26 09:50:16,024] TRACE Sent outbound request: OutboundRequest(correlationId=3, data=FetchRequestData(clusterId='sSoE9smGSQqjfEuTnlMPsA', replicaId=-1, replicaState=ReplicaState(replicaId=147, replicaEpoch=-1), maxWaitMs=0, minBytes=0, maxBytes=1048576, isolationLevel=0, sessionId=0, sessionEpoch=-1, topics=[FetchTopic(topic='metadata', topicId=AAAAAAAAAAAAAAAAAAAAAQ, partitions=[FetchPartition(partition=0, currentLeaderEpoch=2, fetchOffset=0, lastFetchedEpoch=0, logStartOffset=-1, partitionMaxBytes=0, replicaDirectoryId=XiEwxtuzSGuh5WQsWw8VnQ, highWatermark=-1)])], forgottenTopicsData=[], rackId=''), createdTimeMs=1782485465406, destination=localhost:10138 (id: 148 rack: null isFenced: false)) (org.apache.kafka.raft.KafkaRaftClient:2908)

The scenario is: after the local node becomes Follower, it cannot fetch successfully from the leader for the duration of its fetch timeout and instead receives BROKER_NOT_AVAILABLE (errorCode=8 in the trace). The trace shows the local node sending a fetch to the leader, getting the BROKER_NOT_AVAILABLE response, and only then transitioning to Unattached.
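To make the sequence in the trace easier to follow, here is a minimal toy model of how an observer might pick its next fetch destination. It is a sketch only: `ObserverFetchModel`, `nextFetchDestination`, `onLeaderDiscovered`, and `onSuccessfulLeaderFetch` are hypothetical names, not part of `KafkaRaftClient`, and the model assumes the fetch-timeout check happens when the next destination is chosen. Error responses such as BROKER_NOT_AVAILABLE deliberately do not reset the timer.

```java
import java.util.Optional;

/** Toy model of an observer's fetch-destination choice. Hypothetical; not KafkaRaftClient. */
final class ObserverFetchModel {
    enum State { UNATTACHED, FOLLOWER }

    private final String bootstrapEndpoint;
    private final long fetchTimeoutMs;
    private State state = State.UNATTACHED;
    private Optional<String> leaderEndpoint = Optional.empty();
    private long fetchDeadlineMs = 0L;

    ObserverFetchModel(String bootstrapEndpoint, long fetchTimeoutMs) {
        this.bootstrapEndpoint = bootstrapEndpoint;
        this.fetchTimeoutMs = fetchTimeoutMs;
    }

    State state() {
        return state;
    }

    /** Chooses where the next fetch goes at time nowMs. */
    String nextFetchDestination(long nowMs) {
        if (state == State.FOLLOWER && nowMs >= fetchDeadlineMs) {
            // Fetch timeout expired: fall back to the bootstrap servers.
            // The remembered leader endpoint is kept, mirroring "same epoch" in the trace.
            state = State.UNATTACHED;
        }
        return state == State.FOLLOWER ? leaderEndpoint.get() : bootstrapEndpoint;
    }

    /** A bootstrap response carrying leader endpoints moves the observer back to Follower. */
    void onLeaderDiscovered(String discoveredLeaderEndpoint, long nowMs) {
        leaderEndpoint = Optional.of(discoveredLeaderEndpoint);
        state = State.FOLLOWER;
        fetchDeadlineMs = nowMs + fetchTimeoutMs;
    }

    /** Only a successful fetch from the leader extends the deadline; errors do not. */
    void onSuccessfulLeaderFetch(long nowMs) {
        fetchDeadlineMs = nowMs + fetchTimeoutMs;
    }
}
```

With a 50000 ms fetch timeout, as in the trace, the model sends to the bootstrap endpoint first, to the leader after discovery, back to the bootstrap endpoint once the timeout lapses without a successful fetch, and to the leader again after rediscovery.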

jsancio · 2026-06-25T19:14:19Z

+        context.pollUntilRequest();
+        RaftRequest.Outbound fetchRequest = context.assertSentFetchRequest();
+        if (isBootstrapFetch) {
+            assertTrue(context.client.quorum().isUnattached());


Let's avoid testing internal kraft state. We should try to test and check kraft's externalities: RPCs, writes to the log, etc.
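As an illustration of checking externalities only, a helper can assert on the destination of the outbound fetch and nothing else. The sketch below is self-contained and uses hypothetical stand-ins (`Destination`, `FetchDestinationChecks`) rather than the real `RaftRequest.Outbound` and test context; the only detail taken from this thread is that bootstrap destinations carry an id below -1 (the trace shows `id: -2`).

```java
/** Hypothetical stand-in for the destination of an outbound request. */
final class Destination {
    final int id;
    final int port;

    Destination(int id, int port) {
        this.id = id;
        this.port = port;
    }
}

/** Checks that look only at what was sent on the wire, not at internal quorum state. */
final class FetchDestinationChecks {
    private FetchDestinationChecks() {}

    /** Bootstrap servers have no real node id; ids below -1 mark them. */
    static boolean isBootstrap(Destination destination) {
        return destination.id < -1;
    }

    static void checkFetchDestination(
        Destination actual,
        boolean expectBootstrap,
        int expectedLeaderId,
        int expectedPort
    ) {
        if (expectBootstrap) {
            if (!isBootstrap(actual)) {
                throw new AssertionError("expected a bootstrap destination, got id " + actual.id);
            }
        } else if (actual.id != expectedLeaderId) {
            throw new AssertionError("expected leader " + expectedLeaderId + ", got id " + actual.id);
        }
        // The host is the same for every mock address, so the port identifies the node.
        if (actual.port != expectedPort) {
            throw new AssertionError("expected port " + expectedPort + ", got " + actual.port);
        }
    }
}
```

A check like `checkFetchDestination(new Destination(-2, 10139), true, 148, 10139)` passes without ever asking whether the client is Unattached or Follower.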

jsancio · 2026-06-25T19:15:39Z

+            local.id(),
+            local.directoryId().get()
+        )
+            .withStaticVoters(voters)


Why use static voters? Why not enable and use all of the latest features?

jsancio · 2026-06-25T19:19:04Z

+        context.deliverResponse(
+            leaderFetch.correlationId(),
+            leaderFetch.destination(),
+            RaftUtil.errorResponse(
+                ApiKeys.FETCH,
+                Errors.BROKER_NOT_AVAILABLE
+            )
+        );


According to the trace I pasted, this response is not delivered before the fetch timeout. This is misleading when reading the test.

jsancio · 2026-06-25T19:37:05Z

+        // The fetch timeout is much greater than the request manager's configured backoff, so the
+        // current unreachable connection will no longer be backing off when the next fetch is sent.
+        // Expire the fetch timeout and check that the next fetch is sent to the bootstrap server again.
+        context.time.sleep(context.fetchTimeoutMs + 1);


I made these changes and the test passes. It looks like the issue is that the leader is in the backoff state because kraft got an error from the leader:

diff --git a/raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientFetchTest.java b/raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientFetchTest.java
index 8d762d6c96..3d873add30 100644
--- a/raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientFetchTest.java
+++ b/raft/src/test/java/org/apache/kafka/raft/KafkaRaftClientFetchTest.java
@@ -836,7 +836,6 @@ public final class KafkaRaftClientFetchTest {
         // The fetch timeout is much greater than the request manager's configured backoff, so the
         // current unreachable connection will no longer be backing off when the next fetch is sent.
         // Expire the fetch timeout and check that the next fetch is sent to the bootstrap server again.
-        context.time.sleep(context.fetchTimeoutMs + 1);
         final var nextBootstrapFetch = pollAndCheckObserverFetchRequest(
             context,
             true,
@@ -854,6 +853,8 @@ public final class KafkaRaftClientFetchTest {
             )
         );
 
+        context.time.sleep(context.retryBackoffMs);
+
         // Discovering the leader from a bootstrap fetch means the observer resumes fetching from the leader
         pollAndCheckObserverFetchRequest(
             context,
@@ -871,10 +872,8 @@ public final class KafkaRaftClientFetchTest {
         context.pollUntilRequest();
         RaftRequest.Outbound fetchRequest = context.assertSentFetchRequest();
         if (isBootstrapFetch) {
-            assertTrue(context.client.quorum().isUnattached());
             assertTrue(fetchRequest.destination().id() < -1);
         } else {
-            assertTrue(context.client.quorum().isFollower());
             assertEquals(expectedDestinationId, fetchRequest.destination().id());
         }
         // only need to check port since the host is always "localhost" for the mock addresses

My test is written incorrectly. I agree with your above comments that the response from the leader fetch is not delivered until after the fetch timeout expires.

I need to handle the BROKER_NOT_AVAILABLE response first via poll() to accurately simulate this scenario.

kevin-wu24 · 2026-06-26T18:55:14Z

It looks like, with the current implementation, a certain combination of state and message delivery can result in:

Caused by: java.lang.IllegalStateException: Expected to be Follower, but the current state is UnattachedState(epoch=1, leaderId=OptionalInt[3], votedKey=Optional.empty, voters=[0, 1, 2, 3, 4], electionTimeoutMs=9223372036854775807, highWatermark=Optional[LogOffsetMetadata(offset=12, metadata=Optional.empty)])
	at org.apache.kafka.raft.QuorumState.followerStateOrThrow(QuorumState.java:760)
	at org.apache.kafka.raft.KafkaRaftClient.handleFetchResponse(KafkaRaftClient.java:1742)
	at org.apache.kafka.raft.KafkaRaftClient.handleResponse(KafkaRaftClient.java:2741)
	at org.apache.kafka.raft.KafkaRaftClient.handleInboundMessage(KafkaRaftClient.java:2854)
	at org.apache.kafka.raft.KafkaRaftClient.lambda$poll$38(KafkaRaftClient.java:3701)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at org.apache.kafka.raft.KafkaRaftClient.poll(KafkaRaftClient.java:3701)
	at org.apache.kafka.raft.RaftEventSimulationTest$RaftNode.poll(RaftEventSimulationTest.java:971)

Where we aren't calling transitionToFollower in maybeTransition.
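The failure mode in the stack trace can be pictured with a small toy model: follower-only handling runs while the state is still Unattached because the transition step was skipped. The class below is a hypothetical sketch, not `QuorumState`; only the method names `followerStateOrThrow` and `transitionToFollower` are borrowed from the trace and the comment above, and which branch of `maybeTransition` skips the transition is not modeled.

```java
/** Toy model of the "Expected to be Follower" failure. Hypothetical; not QuorumState. */
final class QuorumStateModel {
    enum State { UNATTACHED, FOLLOWER }

    private State state = State.UNATTACHED;

    State state() {
        return state;
    }

    void transitionToFollower() {
        state = State.FOLLOWER;
    }

    /** Mirrors the accessor in the stack trace: it throws unless the state is Follower. */
    State followerStateOrThrow() {
        if (state != State.FOLLOWER) {
            throw new IllegalStateException(
                "Expected to be Follower, but the current state is " + state);
        }
        return state;
    }

    /**
     * Handles a fetch response. The transitionFirst flag models whether the transition
     * step ran before the follower-only logic; it is an assumption of this sketch.
     */
    void handleFetchResponse(boolean responseNamesLeader, boolean transitionFirst) {
        if (transitionFirst && responseNamesLeader) {
            transitionToFollower();
        }
        // Follower-only handling: fails if the transition above was skipped.
        followerStateOrThrow();
    }
}
```

Calling `handleFetchResponse(true, false)` on a fresh model throws `IllegalStateException`, while `handleFetchResponse(true, true)` completes and leaves the model in the Follower state.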

add unit test

8d99a03

github-actions Bot added triage PRs from the community kraft labels Apr 21, 2026

kevin-wu24 added 4 commits April 22, 2026 13:10

store state of previous fetch in FollowerState for observeres

a4ee904

observer should switch between bootstrap servers and leader when fetching

48bb14b

remove dead code

325e11b

observer transitions to unattached after fetch timeout expires

9d0e220

github-actions Bot added the needs-attention label Apr 29, 2026

josefk31 reviewed May 11, 2026

github-actions Bot removed needs-attention triage PRs from the community labels May 12, 2026

josefk31 reviewed May 26, 2026

Merge branch 'trunk' into KAFKA-20514

0bce334

jsancio reviewed May 28, 2026

jsancio added the ci-approved label May 28, 2026

jsancio self-assigned this May 28, 2026

josefk31 reviewed Jun 4, 2026

clean up unit test

d20007b

josefk31 approved these changes Jun 15, 2026

ahuang98 reviewed Jun 15, 2026

jsancio reviewed Jun 16, 2026

kevin-wu24 added 4 commits June 16, 2026 17:05

code review

c5319ef

fix advanceTimeAndCompleteFetch

2bc3f54

remove special helper for transitioning back to follower

7ab818a

update QuorumState documentation

493e2e0

merge trunk

4c72c98

jsancio reviewed Jun 25, 2026

code review

4e91f4a
