Fix experimental verify with norandommap#2110

Open
malikoyv wants to merge 3 commits into axboe:master from malikoyv:pr-track-experimental-verify-null-blocks

@malikoyv

This PR fixes false verify failures when experimental_verify is used with norandommap.

We need this path to keep RAM usage low on very large devices. Standard verify stores per-write metadata in memory, which can become huge with large random-write workloads. We tried other approaches, including memory-backed storage, but performance was poor and fio runs took too long.

With norandommap, fio can overwrite the same offset multiple times and leave other offsets untouched. experimental_verify replays the workload instead of storing the full write history, so it could replay and verify offsets that were never written. That caused false "bad magic header 0" errors.
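
To illustrate why untouched offsets are common rather than a corner case, here is a small simulation sketch. It is not fio code: `splitmix64` stands in for fio's random generator and `count_untouched` is a hypothetical helper. It issues uniformly random single-block writes with no random map and counts the blocks that were never hit. With as many writes as blocks, the expected untouched fraction is about 1/e, roughly 37%.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* splitmix64: a small, well-known PRNG. It only stands in for fio's generator. */
static uint64_t splitmix64(uint64_t *state)
{
	uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));

	z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
	return z ^ (z >> 31);
}

/*
 * Hypothetical helper: issue nr_writes uniformly random single-block
 * "writes" over nr_blocks blocks (no random map, so repeats are allowed)
 * and return how many blocks were never written.
 * Returns UINT64_MAX if the tracking array cannot be allocated.
 */
static uint64_t count_untouched(uint64_t nr_blocks, uint64_t nr_writes,
				uint64_t seed)
{
	unsigned char *hit;
	uint64_t i, untouched = 0;

	assert(nr_blocks > 0);

	hit = calloc(nr_blocks, 1);
	if (!hit)
		return UINT64_MAX;

	for (i = 0; i < nr_writes; i++)
		hit[splitmix64(&seed) % nr_blocks] = 1;

	for (i = 0; i < nr_blocks; i++)
		if (!hit[i])
			untouched++;

	free(hit);
	return untouched;
}
```

Calling `count_untouched(100000, 100000, 1)` should report a bit more than a third of the blocks as never written, and every one of them would read back without a valid verify header.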

This patch adds a lightweight per-file bitmap to track blocks that were actually written. During replay, fio skips blocks that were never written.
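
A minimal sketch of what such a per-file bitmap could look like is below. The names (`struct written_map`, `written_map_mark`, `written_map_test`) are made up for illustration and are not the identifiers used in the patch or in fio. The idea is one bit per block: set it when a write covers the block, test it before verifying during replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure; not fio's actual data layout. */
struct written_map {
	uint64_t *bits;		/* one bit per block */
	uint64_t nr_blocks;
	uint64_t block_size;
};

/* Allocate a zeroed bitmap covering file_size bytes. Returns 0 on success. */
static int written_map_init(struct written_map *m, uint64_t file_size,
			    uint64_t block_size)
{
	assert(block_size > 0);

	if (file_size == 0)
		return -1;

	m->block_size = block_size;
	m->nr_blocks = (file_size + block_size - 1) / block_size;
	m->bits = calloc((size_t)((m->nr_blocks + 63) / 64), sizeof(uint64_t));
	return m->bits ? 0 : -1;
}

/* Mark every block touched by the byte range [offset, offset + len). */
static void written_map_mark(struct written_map *m, uint64_t offset,
			     uint64_t len)
{
	uint64_t b, last;

	if (len == 0)
		return;

	last = (offset + len - 1) / m->block_size;
	for (b = offset / m->block_size; b <= last && b < m->nr_blocks; b++)
		m->bits[b / 64] |= UINT64_C(1) << (b % 64);
}

/* During replay: true if the block containing offset was ever written. */
static bool written_map_test(const struct written_map *m, uint64_t offset)
{
	uint64_t b = offset / m->block_size;

	if (b >= m->nr_blocks)
		return false;
	return (m->bits[b / 64] >> (b % 64)) & 1;
}

static void written_map_free(struct written_map *m)
{
	free(m->bits);
	m->bits = NULL;
}
```

At one bit per block, a 16 TiB file with 4 KiB blocks needs 2^32 bits, which is 512 MiB, and the cost does not grow with the number of writes issued.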

Yehor Malikov added 3 commits June 17, 2026 22:39
experimental_verify replays the workload instead of keeping fio's normal per-write io_piece history. With norandommap, random writes may overwrite the same offset multiple times and leave other offsets untouched. During verify replay, fio can then read an offset that was never written and report a false "bad magic header 0" failure.

Add a per-file bitmap for experimental_verify + norandommap to record which blocks were actually written. During replay, skip offsets that were never written.

Mark bits only after writes are accepted for queueing or completion so serialize_overlap/FIO_Q_BUSY requeues do not create false written entries.
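
The ordering described above can be sketched as follows. This is an illustration under assumptions, not the patch itself: `enum submit_result` is a hypothetical stand-in loosely modelled on fio's FIO_Q_* return codes, and the 64-block bitmap is deliberately tiny. The point is only that a busy result leaves the bitmap unchanged, so a write that was bounced for requeueing is not recorded until it is actually accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical submit results, loosely modelled on fio's FIO_Q_* codes. */
enum submit_result { SUBMIT_COMPLETED, SUBMIT_QUEUED, SUBMIT_BUSY };

#define NR_BLOCKS 64

/* Toy bitmap: one bit per block, 64 blocks in total. */
static uint64_t written_bits;

/*
 * Record a write only once the engine has accepted it.
 * Returns true if the block was marked, false if the write must be requeued.
 */
static bool record_write(enum submit_result res, unsigned int block)
{
	assert(block < NR_BLOCKS);

	if (res == SUBMIT_BUSY)
		return false;	/* not accepted: do not claim it was written */

	written_bits |= UINT64_C(1) << block;
	return true;
}

static bool block_written(unsigned int block)
{
	assert(block < NR_BLOCKS);
	return (written_bits >> block) & 1;
}
```

If the bit were set before submission instead, a busy result would leave a block marked as written even though no data had reached it yet.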

Signed-off-by: Yehor Malikov <Yehor.Malikov@solidigm.com>
A verify_only run starts in a fresh fio process, so it cannot reuse the bitmap built during the original write run. For experimental_verify with norandommap, rebuild the bitmap during the dry-run pass by replaying the write workload without issuing writes.

Reset replay-sensitive counters and file state before the verify read pass so experimental_verify replays the same offset sequence again and checks only blocks that were actually written.
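
The two passes can be sketched as below, as a toy model rather than fio's implementation. `struct replay_state`, `rebuild_map`, and `verify_pass` are hypothetical names, and `splitmix64` stands in for fio's offset generator. The dry-run pass regenerates the write offsets from the seed and marks them without doing any I/O. The verify pass has to reset the generator and the issue counter first, otherwise it does not walk the same sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_BLOCKS 256
#define NR_WRITES 128

/* Hypothetical replay state: the "replay-sensitive" bits of a job. */
struct replay_state {
	uint64_t rng;		/* PRNG state driving offset generation */
	uint64_t issued;	/* I/Os generated so far */
};

static uint64_t splitmix64(uint64_t *state)
{
	uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));

	z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
	z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
	return z ^ (z >> 31);
}

static void replay_reset(struct replay_state *s, uint64_t seed)
{
	s->rng = seed;
	s->issued = 0;
}

/* Produce the next block of the workload, or false once it is exhausted. */
static bool replay_next(struct replay_state *s, uint64_t *block)
{
	if (s->issued >= NR_WRITES)
		return false;
	s->issued++;
	*block = splitmix64(&s->rng) % NR_BLOCKS;
	return true;
}

/* Dry-run pass: regenerate the write offsets and mark them, without I/O. */
static void rebuild_map(struct replay_state *s, uint64_t seed,
			unsigned char *written)
{
	uint64_t b;

	memset(written, 0, NR_BLOCKS);
	replay_reset(s, seed);
	while (replay_next(s, &b))
		written[b] = 1;
}

/*
 * Verify pass: replay the sequence and count the offsets that would be
 * checked (those marked as written). With reset == false the state left
 * over from the dry run is reused.
 */
static uint64_t verify_pass(struct replay_state *s, uint64_t seed,
			    const unsigned char *written, bool reset)
{
	uint64_t b, checked = 0;

	if (reset)
		replay_reset(s, seed);
	while (replay_next(s, &b))
		if (written[b])
			checked++;
	return checked;
}
```

In this model, running `verify_pass` without the reset checks nothing, because the dry run has already consumed the whole workload. With the reset it revisits all `NR_WRITES` offsets, and each of them is marked in the rebuilt bitmap.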

Signed-off-by: Yehor Malikov <Yehor.Malikov@solidigm.com>
Enable verify_only coverage for experimental_verify now that replay state is reset correctly, and add an experimental_verify case to the verify header test matrix.

Also disable write sequence checking for experimental_verify overlap-risk workloads unless explicitly requested, since duplicate overwrites cannot be sequence-verified without the full write history.

Signed-off-by: Yehor Malikov <Yehor.Malikov@solidigm.com>
@malikoyv malikoyv force-pushed the pr-track-experimental-verify-null-blocks branch from 1b4d507 to 4b593bb Compare June 18, 2026 06:48
@malikoyv

Author

Maybe a bit off-topic, but does anyone have experience with, or a solution for, the high RAM usage of fio verification when norandommap is enabled? Has anyone found an approach that performs close to the default in-memory behavior while significantly reducing memory consumption?

We also have an additional disk that could be used as a swap device, but we found the resulting performance to be quite poor. I would be very grateful for any suggestions or ideas!

@sitsofe

sitsofe commented Jun 25, 2026

Collaborator

(

> but does anyone have experience with or a solution for the high RAM usage of FIO verification when norandommap is enabled?

Does experimental_verify help (but note this comes with its own drawbacks and limitations)?

Something tells me you've already tried :-)
)

@malikoyv

Author

> (
>
> > but does anyone have experience with or a solution for the high RAM usage of FIO verification when norandommap is enabled?
>
> Does experimental_verify help (but note this comes with its own drawbacks and limitations)?
>
> Something tells me you've already tried :-) )

Exactly :-)
To be sure experimental_verify works, we wrote a simple test that corrupted a 4 KiB region by overwriting it with zeroes. Unfortunately, experimental_verify did not fail: the corruption went undetected. That's what my patch fixes.
