-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add support for custom election tick #9725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 4 commits
377228d
40ac0d3
ee4d6ff
f26e195
6542e18
7071d3b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -79,7 +79,20 @@ type Node struct { | |
| } | ||
|
|
||
| // NewNode returns a new Node instance. | ||
| func NewNode(rc *pb.RaftContext, store *raftwal.DiskStorage, tlsConfig *tls.Config) *Node { | ||
| // electionTick controls how many ticks (each 100ms) before an election is triggered. | ||
| // If electionTick <= 0, defaults to 20 (i.e., 2s election timeout). | ||
| func NewNode(rc *pb.RaftContext, store *raftwal.DiskStorage, tlsConfig *tls.Config, | ||
| electionTick int) *Node { | ||
|
|
||
| const heartbeatTick = 1 // 100ms per tick | ||
| if electionTick <= 0 { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The current guard only handles electionTick <= 0. But HeartbeatTick is hardcoded to 1 just below, and etcd raft requires ElectionTick > HeartbeatTick. If an operator sets election-tick=1, Config.validate() Since this is the only place a raft.Config is built (every Alpha node and Zero flow through NewNode), the heartbeat floor of 1 applies everywhere, so election-tick can never legally be 1. Better to fail fast with
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch, this has been addressed in the latest commit. Added validation at lines 91–94: const heartbeatTick = 1 // 100ms per tick
if electionTick <= 0 {
electionTick = 20
}
if electionTick <= heartbeatTick {
glog.Fatalf(
"election-tick=%d is invalid: must be greater than heartbeat-tick (%d). "+
"Recommended minimum is 10 (1s election timeout).",
electionTick,
heartbeatTick,
)
}Now, if a user sets: --raft "election-tick=1"the process fails fast during startup with a clear validation error instead of encountering a cryptic Raft panic later during initialization. Thanks for reviewing, let me know if any changes/explaination req.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On second thought, do you think it would make sense to add a stricter fail-safe and reject election tick values below 10 altogether? Since the recommendation is already 10 (1s election timeout), allowing smaller values may not be particularly useful and could lead to unstable configurations. Similarly, would it be worth introducing an upper bound as well (for example, around 24 hours) to prevent accidentally misconfigured values resulting in extremely long election timeouts? I'd be interested in your thoughts on both of these. Do you prefer keeping the validation minimal (only ensuring it's greater than the heartbeat tick), or would you favor enforcing a practical operating range for election tick values?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi @LakshimiRam-073 , Thanks for adding the validation. The fail-fast check on For the two follow-up questions, I'd personally keep the validation fairly minimal and avoid enforcing a practical range. For the lower bound of 10, I don't think we should reject smaller values. One of the main reasons for exposing this flag is to give operators control over election timing, and there are valid use cases for running below 10. For example, integration/CI tests often benefit from much faster failover, single-node development setups don't have the same stability concerns, and some low-latency deployments may intentionally optimize for sub-second failover. etcd itself only requires I'd also avoid adding an upper bound. Any limit we pick—24 hours or otherwise is ultimately arbitrary. If someone configures an extremely large election timeout, they've effectively decided they want leader changes to be very rare or even handled manually. That's unusual, but it could still be intentional. Hard limits in cases like this often end up causing more frustration than value. That said, I don't love silently accepting very small values either. Since raft randomizes the timeout in Instead of rejecting those values, I'd lean toward logging a warning: if electionTick < 10 {
glog.Warningf("election-tick=%d gives a %dms election timeout. Values below 10 (1s) "+
"may cause spurious leader elections under GC pauses or network jitter.",
electionTick, electionTick*100)
}That way we preserve flexibility while still making operators aware of the tradeoffs.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated in commit I kept the hard correctness check on |
||
| electionTick = 20 | ||
| } | ||
| if electionTick <= heartbeatTick { | ||
| glog.Fatalf("election-tick=%d is invalid: must be greater than heartbeat-tick (%d). "+ | ||
| "Recommended minimum is 10 (1s election timeout).", electionTick, heartbeatTick) | ||
| } | ||
|
|
||
| snap, err := store.Snapshot() | ||
| x.Check(err) | ||
|
|
||
|
|
@@ -90,8 +103,8 @@ func NewNode(rc *pb.RaftContext, store *raftwal.DiskStorage, tlsConfig *tls.Conf | |
| Store: store, | ||
| Cfg: &raft.Config{ | ||
| ID: rc.Id, | ||
| ElectionTick: 20, // 2s if we call Tick() every 100 ms. | ||
| HeartbeatTick: 1, // 100ms if we call Tick() every 100 ms. | ||
| ElectionTick: electionTick, // Default 2s if tick is 100ms. | ||
| HeartbeatTick: heartbeatTick, // 100ms if we call Tick() every 100 ms. | ||
| Storage: store, | ||
| MaxInflightMsgs: 256, | ||
| MaxSizePerMsg: 256 << 10, // 256 KB should allow more batching. | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A change to this file is scope-creep. Please revert, open a different PR with rationale for this change if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted. Will open a separate PR with rationale if we want to pursue that change.