Enforce validDocIds consensus in upsert task generators and add includeBitmaps to validDocIdsMetadata API #18853
Codecov Report
❌ Patch coverage is …
@@ Coverage Diff @@
## master #18853 +/- ##
============================================
- Coverage 64.81% 64.79% -0.02%
- Complexity 1322 1346 +24
============================================
Files 3393 3393
Lines 211246 211401 +155
Branches 33208 33250 +42
============================================
+ Hits 136917 136980 +63
- Misses 63284 63362 +78
- Partials 11045 11059 +14
Flags with carried forward coverage won't be shown.
Force-pushed from ef91178 to 8571ed6.
Mirror the executor's validDocIds enforcement at generation time so inconsistent segments are never scheduled. UpsertCompactionTaskGenerator and UpsertCompactMergeTaskGenerator now validate each segment's replicas (CRC match, server health, and EQUAL/UNSAFE/MOST_VALID_DOCS consensus) via the shared MinionTaskUtils.selectValidDocIdsMetadataForConsensus, requesting includeBitmaps only for EQUAL and requiring all assigned replicas to respond for the strict modes. A new validDocIdsValidationMode config (STRICT default, EXECUTOR_ONLY) gates the generator-side checks: STRICT runs them in both generator and executor; EXECUTOR_ONLY downgrades the generator to a lenient pick and leaves the executor as the sole gate.
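The selection described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code of MinionTaskUtils.selectValidDocIdsMetadataForConsensus: ConsensusSketch, ReplicaInfo, and select are hypothetical names, java.util.BitSet stands in for RoaringBitmap, and EQUAL and MOST_VALID_DOCS are assumed to be the "strict" modes that need every assigned replica.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Optional;

// Hypothetical, self-contained sketch; not Pinot's MinionTaskUtils code.
public class ConsensusSketch {

  enum ConsensusMode { UNSAFE, EQUAL, MOST_VALID_DOCS }

  // One replica's answer for a segment (hypothetical shape). BitSet stands in for
  // RoaringBitmap; bitmap is null when the server did not return one.
  record ReplicaInfo(String server, String crc, boolean healthy, long validDocs, BitSet bitmap) {}

  // Returns the replica to use, or Optional.empty() if the segment should be skipped.
  static Optional<ReplicaInfo> select(List<ReplicaInfo> responses, String expectedCrc,
      int assignedReplicas, ConsensusMode mode) {
    // Keep only replicas on a healthy server whose CRC matches the segment's CRC.
    List<ReplicaInfo> usable = new ArrayList<>();
    for (ReplicaInfo r : responses) {
      if (r.healthy() && expectedCrc.equals(r.crc())) {
        usable.add(r);
      }
    }
    if (usable.isEmpty()) {
      return Optional.empty();
    }
    // Assumption: EQUAL and MOST_VALID_DOCS are the strict modes, so every assigned
    // replica must have answered and passed the checks above.
    if (mode != ConsensusMode.UNSAFE && usable.size() < assignedReplicas) {
      return Optional.empty();
    }
    switch (mode) {
      case UNSAFE:
        // Any usable replica is accepted.
        return Optional.of(usable.get(0));
      case MOST_VALID_DOCS: {
        // Prefer the most complete replica; ties keep the first one seen.
        ReplicaInfo best = usable.get(0);
        for (ReplicaInfo r : usable) {
          if (r.validDocs() > best.validDocs()) {
            best = r;
          }
        }
        return Optional.of(best);
      }
      case EQUAL: {
        // All replicas must report identical bitmaps; a missing bitmap counts as disagreement.
        BitSet first = usable.get(0).bitmap();
        if (first == null) {
          return Optional.empty();
        }
        for (ReplicaInfo r : usable) {
          if (!first.equals(r.bitmap())) {
            return Optional.empty();
          }
        }
        return Optional.of(usable.get(0));
      }
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }
}
```

Filtering on CRC and health first means a single stale or unhealthy replica is enough to skip the segment in the strict modes, while UNSAFE still proceeds with whichever usable replica answered.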
Force-pushed from 8571ed6 to cac3f5e.
xiangfu0 left a comment:
Found one high-signal issue; see inline comment.
if (consensusMode == MinionConstants.ValidDocIdsConsensusMode.EQUAL) {
  ValidDocIdsMetadataInfo first = usableReplicas.get(0);
  RoaringBitmap consensusBitmap = deserializeBitmapOrNull(first);
  if (consensusBitmap == null) {
This makes controller-first rolling upgrades incompatible under the new default STRICT/EQUAL path. Older servers still answer this endpoint but omit the bitmap, and ValidDocIdsMetadataInfo explicitly treats that as expected for old servers; here we convert that mixed-version response into a hard skip, so upsert compaction/compact-merge task generation stops until every server is upgraded. Please keep the generator default executor-only, or add an old-server fallback before making bitmap-based prescheduling the default.
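One possible shape for the old-server fallback the review asks for, sketched here under assumptions and not taken from the PR: when EQUAL mode sees a replica that returned no bitmap, the generator defers to the executor's own consensus check instead of skipping the segment. MixedVersionFallbackSketch, Decision, and decideEqual are hypothetical names, and java.util.BitSet stands in for RoaringBitmap.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of an old-server fallback for EQUAL mode; not the PR's code.
public class MixedVersionFallbackSketch {

  enum Decision { SCHEDULE, SKIP, DEFER_TO_EXECUTOR }

  // One entry per usable replica; null means the server did not return a bitmap
  // (for example, an older server that ignores includeBitmaps).
  static Decision decideEqual(List<BitSet> bitmaps) {
    if (bitmaps.isEmpty()) {
      return Decision.SKIP;
    }
    BitSet reference = null;
    boolean sawMissing = false;
    for (BitSet b : bitmaps) {
      if (b == null) {
        sawMissing = true;
      } else if (reference == null) {
        reference = b;
      } else if (!reference.equals(b)) {
        // Replicas that did return bitmaps already disagree: skip regardless of versions.
        return Decision.SKIP;
      }
    }
    // Cannot verify every replica at generation time: schedule and let the executor's
    // consensus check decide, instead of skipping until every server is upgraded.
    return sawMissing ? Decision.DEFER_TO_EXECUTOR : Decision.SCHEDULE;
  }
}
```

Replicas that did return bitmaps are still compared first, so a genuine disagreement is skipped even in a mixed-version cluster.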
Summary
Make the upsert task generators reject segments with inconsistent replicas before scheduling, instead of relying on the executor to fail those tasks after the fact. Concretely, this PR:
- Adds an includeBitmaps flag to the validDocIdsMetadata endpoint so the controller can batch-fetch each replica's validDocIds bitmap in one call.
- Runs the executor's replica checks (CRC, server health, and validDocIdsConsensusMode EQUAL/UNSAFE/MOST_VALID_DOCS) in UpsertCompactionTaskGenerator and UpsertCompactMergeTaskGenerator, skipping any segment whose replicas disagree.
- Adds a validDocIdsValidationMode config (STRICT default / EXECUTOR_ONLY) to toggle the generator-side checks.
Follow-up to #17696 (which added the executor-side consensus), per this review comment.
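As a rough illustration of the batched fetch, the request a controller might send to one server can be built like this. The path and the includeBitmaps query parameter come from this PR; the JSON body shape ({"segments": [...]}), the class and method names, and the server URL in the usage below are assumptions for illustration only.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical sketch: one batched request per server covering many segments.
public class ValidDocIdsRequestSketch {

  // serverBaseUrl is the server's admin endpoint (hypothetical value in the usage below).
  // Assumption: the POST body is {"segments": [...]}; verify against the server resource.
  static HttpRequest build(String serverBaseUrl, String tableNameWithType, List<String> segments,
      boolean includeBitmaps) {
    String uri = serverBaseUrl + "/tables/" + tableNameWithType + "/validDocIdsMetadata"
        + (includeBitmaps ? "?includeBitmaps=true" : "");
    // Build a JSON array of segment names (assumes names need no JSON escaping).
    StringBuilder body = new StringBuilder("{\"segments\":[");
    for (int i = 0; i < segments.size(); i++) {
      if (i > 0) {
        body.append(',');
      }
      body.append('"').append(segments.get(i)).append('"');
    }
    body.append("]}");
    return HttpRequest.newBuilder(URI.create(uri))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body.toString()))
        .build();
  }
}
```

With a hypothetical host, build("http://server-1:8097", "myTable_REALTIME", List.of("seg_0", "seg_1"), true) yields a POST to http://server-1:8097/tables/myTable_REALTIME/validDocIdsMetadata?includeBitmaps=true. Passing false produces the same request without the query parameter, matching the opt-in behavior.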
Background
#17696 made the upsert compaction executor reconcile validDocIds across replicas before compacting: it checks CRC, server health, and a configurable validDocIdsConsensusMode (UNSAFE/EQUAL/MOST_VALID_DOCS), failing the task rather than letting a less-complete replica overwrite a more-complete one.
The generator, however, still scheduled a task for every eligible segment, even when its replicas were inconsistent (a CRC mismatch mid-reload, an unhealthy server, or divergent validDocIds). The executor then has to fail those tasks to stay safe; this wastes a segment download and a task slot every cycle, and the same segment keeps getting re-picked and re-failed.
What this PR does
- includeBitmaps option on the validDocIdsMetadata endpoint. POST /tables/{tableNameWithType}/validDocIdsMetadata?includeBitmaps=true returns the serialized validDocIds bitmap in each per-segment-replica entry, so the controller can compare replica bitmaps in one batched call instead of one request per segment replica. The option is off by default, and the response is unchanged when it is not requested (@JsonInclude(NON_NULL)), so the change is backward compatible.
- Generator-side consensus in UpsertCompactionTaskGenerator and UpsertCompactMergeTaskGenerator: the same CRC / server-health / validDocIdsConsensusMode checks the executor uses, skipping any segment whose replicas disagree. Bitmaps are fetched only for EQUAL; strict modes also require responses from all assigned replicas.
- New validDocIdsValidationMode config: STRICT (default) enforces the checks in both the generator and the executor; EXECUTOR_ONLY keeps the prior executor-only behavior.
Backward compatibility
validDocIdsValidationMode defaults to STRICT, so generators now enforce consensus by default; set EXECUTOR_ONLY to restore the old behavior. The endpoint change is additive and opt-in.
Testing
Unit tests cover each mode through the generators: EQUAL (agree / disagree / replica missing / CRC mismatch / unhealthy / missing bitmap), UNSAFE, and MOST_VALID_DOCS, plus the config parsing/resolution helpers.
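The config parsing/resolution helpers are not shown in this description; a minimal sketch of resolving validDocIdsValidationMode from a task config map might look like this. The enum, constant, and method names are hypothetical, and treating an unrecognized value as an error (rather than falling back to STRICT) is an assumption.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of resolving the validation mode from a task config map.
public class ValidationModeSketch {

  enum ValidDocIdsValidationMode { STRICT, EXECUTOR_ONLY }

  static final String VALIDATION_MODE_KEY = "validDocIdsValidationMode";

  // Missing or blank -> STRICT (the documented default). Matching is case-insensitive;
  // an unrecognized value throws IllegalArgumentException (an assumption, not confirmed).
  static ValidDocIdsValidationMode resolve(Map<String, String> taskConfigs) {
    String raw = taskConfigs.get(VALIDATION_MODE_KEY);
    if (raw == null || raw.isBlank()) {
      return ValidDocIdsValidationMode.STRICT;
    }
    return ValidDocIdsValidationMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  // STRICT: the generator runs the full consensus checks (and so does the executor).
  // EXECUTOR_ONLY: the generator does a lenient pick and the executor is the only gate.
  static boolean generatorEnforcesConsensus(ValidDocIdsValidationMode mode) {
    return mode == ValidDocIdsValidationMode.STRICT;
  }
}
```

Failing loudly on a typo avoids silently running in a mode the operator did not choose; falling back to STRICT would be the more lenient alternative.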