Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle by dnovitski · Pull Request #2 · dnovitski/gh-ost

dnovitski · 2026-04-29T08:32:28Z

Performance Optimizations for gh-ost

Note: This PR incorporates and supersedes the changes from #1 (multithreaded replication data inconsistency fix).

This PR adds several performance optimizations to gh-ost that significantly speed up row-copy under high write load while keeping binlog lag bounded.

Features

1. Parallel Row-Copy (inspired by feat concurrent chunk data #1398)

--copy-concurrency=N — parallel row-copy workers (default 1)
Bounded drain budget gives row-copy more execution turns instead of blocking indefinitely on DML drain
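
The bounded drain described above can be pictured as a loop that applies queued DML events until either the queue is empty or a time budget runs out, then hands control back to row-copy. The following is a minimal sketch under that reading, not the PR's code: `drainBounded` and the event-as-closure queue are hypothetical, and the 50ms figure is the budget quoted in the HeartbeatLag Analysis section.

```go
package main

import (
	"fmt"
	"time"
)

// drainBounded applies queued events until the queue is empty or the budget
// has elapsed, whichever comes first, and returns how many were applied.
// Hypothetical helper for illustration; not a gh-ost function.
// The deadline is checked between events, so one slow event can overrun it.
func drainBounded(events <-chan func(), budget time.Duration) int {
	deadline := time.Now().Add(budget)
	applied := 0
	for time.Now().Before(deadline) {
		select {
		case apply := <-events:
			apply()
			applied++
		default:
			return applied // queue empty: row-copy gets the next turn
		}
	}
	return applied // budget spent: row-copy gets a turn even if events remain
}

func main() {
	events := make(chan func(), 8)
	for i := 0; i < 3; i++ {
		events <- func() {}
	}
	fmt.Println("applied:", drainBounded(events, 50*time.Millisecond))
}
```

The trade-off is the one the PR describes: a smaller budget gives row-copy more turns but lets binlog lag grow, which is why the lag throttle exists.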

2. DML Event Merging (inspired by feat binlog apply optimization #1378)

Merges redundant DML events for the same row before applying
Under high write load, reduces applied statements by ~36%
Example: INSERT + UPDATE + UPDATE → single INSERT with final values
Disable with --skip-dml-merge
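
To make the merge rule concrete, here is a minimal sketch of folding events per row within one batch. It is an illustration, not the PR's implementation: the `Event` type and `mergeEvents` are hypothetical, and it assumes full row images, updates that never change the key, and inserts applied idempotently. It also conservatively keeps a trailing DELETE rather than cancelling it against an earlier INSERT, so the PR's actual rules may differ.

```go
package main

import "fmt"

// Op is the kind of DML statement an event represents.
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Event is a simplified, hypothetical stand-in for a binlog DML event.
type Event struct {
	Op     Op
	Key    string            // unique-key value identifying the row
	Values map[string]string // full row image after the change (nil for Delete)
}

// mergeEvents keeps one event per key, in first-seen key order.
// An UPDATE after an INSERT stays an INSERT carrying the newest values;
// in every other case the later event replaces the earlier one.
func mergeEvents(batch []Event) []Event {
	position := make(map[string]int, len(batch)) // key -> index in merged
	merged := make([]Event, 0, len(batch))
	for _, ev := range batch {
		i, seen := position[ev.Key]
		switch {
		case !seen:
			position[ev.Key] = len(merged)
			merged = append(merged, ev)
		case merged[i].Op == Insert && ev.Op == Update:
			merged[i].Values = ev.Values
		default:
			merged[i] = ev
		}
	}
	return merged
}

func main() {
	batch := []Event{
		{Op: Insert, Key: "42", Values: map[string]string{"name": "a"}},
		{Op: Update, Key: "42", Values: map[string]string{"name": "b"}},
		{Op: Update, Key: "42", Values: map[string]string{"name": "c"}},
	}
	for _, ev := range mergeEvents(batch) {
		fmt.Println(ev.Op == Insert, ev.Key, ev.Values["name"])
	}
}
```

With the batch in `main`, three events on key 42 collapse into a single INSERT carrying the last values, matching the example above.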

3. Frontier Filter (inspired by feat binlog apply optimization #1378)

Skips DML events for rows not yet copied (row-copy will capture latest value)
Reduces redundant work during the copy phase
Only active when --copy-concurrency=1 (single-copy). With parallel copy, multiple chunks are in flight at once and may not have committed yet, so the frontier position is not a reliable boundary and skipping events beyond it would be unsafe
Automatically disabled in replica modes (TestOnReplica/MigrateOnReplica). There, binlog events are read from the replica's relay log and may be ahead of the SQL thread's apply position, so row-copy SELECT queries may not yet see the data from skipped events, which would cause silent data loss
Disable with --skip-dml-frontier-filter
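
The conditions above reduce to one predicate: skip an event only when the filter is active and the row lies beyond what row-copy has already committed. A minimal sketch, assuming a single ascending integer key; `filterConfig` and `shouldSkip` are hypothetical names, not identifiers from the PR:

```go
package main

import "fmt"

// filterConfig holds the settings that decide whether the filter may run.
// Hypothetical type for illustration.
type filterConfig struct {
	CopyConcurrency int  // --copy-concurrency
	ReplicaMode     bool // TestOnReplica or MigrateOnReplica
	Disabled        bool // --skip-dml-frontier-filter
}

// active reports whether skipping is safe at all: single-copy only,
// never in replica modes, and not explicitly disabled.
func (c filterConfig) active() bool {
	return !c.Disabled && !c.ReplicaMode && c.CopyConcurrency == 1
}

// shouldSkip reports whether a DML event for rowKey can be dropped.
// frontier is the highest key whose chunk has already been committed;
// rows at or below it are already copied, so their events must be applied.
func shouldSkip(c filterConfig, rowKey, frontier int64) bool {
	return c.active() && rowKey > frontier
}

func main() {
	single := filterConfig{CopyConcurrency: 1}
	parallel := filterConfig{CopyConcurrency: 4}
	fmt.Println(shouldSkip(single, 5000, 1000))   // beyond frontier, filter active
	fmt.Println(shouldSkip(single, 500, 1000))    // already copied
	fmt.Println(shouldSkip(parallel, 5000, 1000)) // parallel copy: filter inactive
}
```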

4. Heartbeat Lag Throttle

--copy-max-lag-millis (default 60000) prevents unbounded binlog lag growth during parallel row-copy
When HeartbeatLag exceeds the threshold, row-copy pauses and only DML events are drained
Row-copy resumes once lag falls to threshold/2 (the hysteresis prevents oscillation around the threshold)
Set to 0 to disable (maximum copy speed, unbounded lag)
See documentation for detailed comparison with --max-lag-millis
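
The pause/resume behaviour is a two-threshold (hysteresis) state machine. The sketch below shows one way to express it, assuming lag is sampled in milliseconds; `lagThrottle` and `observe` are hypothetical names, and the PR's handling of the pre-heartbeat sentinel value is not modelled here.

```go
package main

import "fmt"

// lagThrottle pauses row-copy above a threshold and resumes at half of it.
// Hypothetical type for illustration; a threshold of 0 disables throttling.
type lagThrottle struct {
	thresholdMillis int64
	paused          bool
}

// observe records a lag sample and reports whether row-copy should be paused.
func (lt *lagThrottle) observe(lagMillis int64) bool {
	if lt.thresholdMillis <= 0 {
		lt.paused = false
		return false
	}
	switch {
	case !lt.paused && lagMillis > lt.thresholdMillis:
		lt.paused = true // lag exceeded the threshold: drain exclusively
	case lt.paused && lagMillis <= lt.thresholdMillis/2:
		lt.paused = false // lag recovered to threshold/2: resume copying
	}
	return lt.paused
}

func main() {
	throttle := &lagThrottle{thresholdMillis: 60000}
	for _, lag := range []int64{20000, 61000, 45000, 30000, 45000} {
		fmt.Printf("lag=%dms paused=%v\n", lag, throttle.observe(lag))
	}
}
```

Note that a 45s sample yields different answers depending on history: paused while recovering from 61s, not paused once lag has already dropped to 30s. That gap between the two thresholds is what prevents rapid toggling.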

Runtime-Changeable Flags

copy-concurrency=<N> — change parallel copy workers at runtime (range 1-32)
copy-max-lag-millis=<N> — change heartbeat lag threshold at runtime (0 = disable)
See interactive commands documentation for usage
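
As an illustration of the range check stated above, a handler for the copy-concurrency command might validate its argument as follows. This is a sketch only: `parseCopyConcurrency` is a hypothetical function, and how the PR actually wires the command into gh-ost's interactive server is not shown on this page.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCopyConcurrency parses a command of the form "copy-concurrency=<N>"
// and enforces the 1-32 range. Hypothetical helper for illustration.
func parseCopyConcurrency(cmd string) (int, error) {
	name, arg, ok := strings.Cut(strings.TrimSpace(cmd), "=")
	if !ok || name != "copy-concurrency" {
		return 0, fmt.Errorf("unrecognized command: %q", cmd)
	}
	n, err := strconv.Atoi(arg)
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %w", arg, err)
	}
	if n < 1 || n > 32 {
		return 0, fmt.Errorf("copy-concurrency must be in range 1-32, got %d", n)
	}
	return n, nil
}

func main() {
	for _, cmd := range []string{"copy-concurrency=8", "copy-concurrency=64"} {
		n, err := parseCopyConcurrency(cmd)
		fmt.Println(n, err)
	}
}
```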

Bug Fixes

Fixed buildDMLEventQuery DML mutation: UPDATE operations on unique-key tables no longer corrupt the shared DMLEvent object
Fixed frontier filter race in replica mode: Events read from binlog could be ahead of replica SQL thread position, causing missed changes
Fixed copy starvation with parallel row-copy: Unbuffered copyRowsQueue channel combined with HeartbeatLag sentinel value (before first heartbeat) caused copy to never get execution turns. Fixed with buffered channel and sentinel filtering

Benchmark Results (4-thread sysbench, 100K rows, 15-min runs)

| Configuration | Copy Time | Max HeartbeatLag | DML Events/sec | Result |
|---|---|---|---|---|
| All features (no throttle) | 23s | 207s ⚠️ | 983 | PASS |
| All features + lag throttle (60s) | 41s | ~55s ✅ | ~950 | PASS |
| No DML merge | 71s | 262s | 722 | PASS |
| No frontier filter | 28s | 200s | 970 | PASS |
| Single-copy baseline | 10m47s | 6.6s | 905 | PASS |

Key takeaways:

Parallel copy with throttle: roughly 16x faster than the single-copy baseline (41s vs 10m47s)
HeartbeatLag stays bounded at ~55s (vs 207s without throttle)
DML merge provides ~36% more events/sec throughput
All configurations pass data consistency checks (row counts, NULL PKs, duplicate PKs, checksums)
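
The headline ratios can be re-derived from the table. The short check below uses only figures quoted in the benchmark results (10m47s vs 41s copy time, 983 vs 722 events/sec); the helper names are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times shorter fast is than slow.
func speedup(slow, fast time.Duration) float64 {
	return slow.Seconds() / fast.Seconds()
}

// percentGain returns the relative increase of with over without, in percent.
func percentGain(with, without float64) float64 {
	return (with/without - 1) * 100
}

func main() {
	baseline := 10*time.Minute + 47*time.Second // single-copy baseline
	throttled := 41 * time.Second               // all features + lag throttle
	fmt.Printf("copy speedup: %.1fx\n", speedup(baseline, throttled))
	// 983 events/sec with DML merge vs 722 without
	fmt.Printf("DML merge throughput gain: %.1f%%\n", percentGain(983, 722))
}
```

Both results round to the figures in the takeaways: 647s / 41s is just under 16, and 983 / 722 is about 1.36.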

HeartbeatLag Analysis

Without the throttle, binlog lag grows unboundedly because the bounded drain (50ms budget) gives row-copy more turns at the expense of DML processing. The lag throttle resolves this:

During copy phase: lag may briefly reach threshold (~55-60s), then copy pauses
During throttle pause: exclusive DML drain brings lag back to ~30s (threshold/2)
After copy completes: DML catch-up drains remaining lag to 0 within minutes
At cutover: lag is always near 0 (normal gh-ost cutover behavior)

New CLI Flags

| Flag | Default | Description |
|---|---|---|
| `--copy-max-lag-millis` | 60000 | Max heartbeat lag before throttling row-copy (0 = disabled) |
| `--skip-dml-merge` | false | Disable DML event merging (for benchmarking) |
| `--skip-dml-frontier-filter` | false | Disable frontier filter optimization (for benchmarking) |

Testing

All existing integration tests pass (MySQL 5.7, 8.0, 8.4, Percona 8.0)
New integration test for DML event merging (merge-dml-events)
New integration test for parallel row-copy with lag throttle (parallel-rowcopy-lag-throttle)
Unit tests for runtime-changeable flag commands (12 test cases)
15-minute sysbench consistency tests under 4-thread concurrent write load
Data consistency validated: row counts, NULL PKs, unique PKs, checksums

…ttle (#2)

Performance optimizations for gh-ost that significantly speed up row-copy under high write load while keeping binlog lag bounded:

- Parallel row-copy with dedicated connection pool and time-bounded drain
- DML event merging within batches (INSERT/DELETE cancellation, UPDATE folding)
- Frontier filter to skip DML events beyond copy frontier
- Heartbeat lag throttle (--copy-max-lag-millis) for row-copy pacing
- Adaptive drain budget and auto-tuning chunk size
- Runtime-changeable --copy-concurrency and --copy-max-lag-millis
- Fix multithreaded replication data inconsistency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r, cut per-chunk round-trips

The --chunk-concurrent-size parallel row-copy only ran the INSERTs in parallel; the boundary calculation and the per-chunk transaction overhead serialized work and capped the achievable speedup well below the hardware's parallel-insert ceiling. This addresses three of those caps.

Prefetch range producer (overlap serialized boundary calc with INSERTs):
- A single dedicated producer goroutine is the sole caller of CalculateNextIterationRangeEndValues and streams pre-computed ranges into a buffered channel, so boundary scans now overlap the parallel INSERTs of earlier work instead of stalling between batches.
- Split iterateChunks into iterateChunksSingle (unchanged single-threaded semantics) and iterateChunksConcurrent.
- Size the applier pool for concurrentSize + producer + headroom.

#1 Per-chunk round-trips (applier.go):
- ApplyIterationInsertQuery sent BEGIN / SET SESSION / INSERT / COMMIT as four round-trips per chunk. It now sends "SET SESSION ...; INSERT ..." as a single autocommit, multi-statement round-trip on one pinned connection. The applier pool already enables multiStatements + interpolateParams + autocommit; RowsAffected() reports the INSERT (last statement), and the optional SHOW WARNINGS runs on the same pinned connection. 4 round-trips -> 1.

#2 Persistent worker pool (migrator.go):
- Replace the per-batch errgroup+g.Wait barrier (which stalled N workers on the slowest chunk every N chunks) with continuous dispatch to an errgroup bounded by SetLimit(concurrentSize) for a 200ms time quantum. Workers stay saturated; the only barrier is at the quantum boundary. The time bound keeps executeWriteFuncs returning to apply binlog events and re-check throttling, preserving row-copy/event mutual exclusion. Checkpoints record the last contiguous completed range (not the producer's prefetched cursor), so resume restarts from fully-copied data.

Benchmarked on MySQL 8.0.46 (innodb_autoinc_lock_mode=2), 2.1M rows: copy time vs the prior parallel impl improved up to 32% (chunk=200, conc=4: 22s->15s; chunk=1000, conc=8: 8s->6s). Data integrity verified by row count + checksum.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
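
The checkpoint rule in the commit message above (record the last contiguous completed range rather than the prefetched cursor) matters because parallel workers finish chunks out of order. One way to track it is sketched below, assuming chunks are numbered sequentially from 0 by the producer; `contiguousTracker` is a hypothetical type, not code from this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// contiguousTracker records chunk completions that may arrive out of order
// and exposes the highest sequence number n such that chunks 0..n are done.
// Hypothetical type for illustration.
type contiguousTracker struct {
	mu   sync.Mutex
	done map[int]bool // completed chunks above the contiguous prefix
	next int          // lowest sequence number not yet known to be complete
}

func newContiguousTracker() *contiguousTracker {
	return &contiguousTracker{done: make(map[int]bool)}
}

// complete marks chunk seq as finished and advances the contiguous prefix.
func (c *contiguousTracker) complete(seq int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if seq < c.next {
		return // already inside the contiguous prefix
	}
	c.done[seq] = true
	for c.done[c.next] {
		delete(c.done, c.next)
		c.next++
	}
}

// checkpoint returns the last contiguous completed chunk, or -1 if none.
func (c *contiguousTracker) checkpoint() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.next - 1
}

func main() {
	tracker := newContiguousTracker()
	tracker.complete(2) // finished early; chunks 0 and 1 still in flight
	fmt.Println(tracker.checkpoint())
	tracker.complete(0)
	fmt.Println(tracker.checkpoint())
	tracker.complete(1) // gap closed: prefix now covers 0..2
	fmt.Println(tracker.checkpoint())
}
```

Resuming from this value may redo chunks that finished beyond a gap, but it never skips one that did not finish, which is the property the commit message calls out.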

dnovitski changed the title ~~perf: parallel row-copy with dedicated connection pool and time-bounded drain~~ perf: parallel row-copy, DML event merging, and adaptive drain Apr 29, 2026

dnovitski force-pushed the perf/parallel-rowcopy branch from 8b0acb3 to 9ab008d Compare April 29, 2026 09:25

dnovitski changed the title ~~perf: parallel row-copy, DML event merging, and adaptive drain~~ perf: Parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle Apr 29, 2026

dnovitski force-pushed the perf/parallel-rowcopy branch 3 times, most recently from dd5dfd9 to 8a5b648 Compare April 29, 2026 20:12

dnovitski closed this Apr 29, 2026

dnovitski reopened this Apr 29, 2026

dnovitski force-pushed the perf/parallel-rowcopy branch from 8a5b648 to 3110d30 Compare April 29, 2026 20:22

dnovitski changed the base branch from mtr-squashed to master April 29, 2026 20:23

dnovitski force-pushed the perf/parallel-rowcopy branch from 3110d30 to ca7577a Compare April 29, 2026 20:35

dnovitski changed the title ~~perf: Parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle~~ Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle Apr 29, 2026

dnovitski mentioned this pull request Apr 29, 2026

Multithreaded replication, parallel row-copy with DML merge, frontier filter, and heartbeat lag throttle github/gh-ost#1665

Open

dnovitski force-pushed the perf/parallel-rowcopy branch from ca7577a to a9ac404 Compare May 23, 2026 21:17

dnovitski force-pushed the perf/parallel-rowcopy branch from a9ac404 to 79e3ff8 Compare May 23, 2026 21:21

dnovitski force-pushed the perf/parallel-rowcopy branch from 79e3ff8 to 908d561 Compare May 23, 2026 21:25

dnovitski wants to merge 1 commit into master from perf/parallel-rowcopy
