Skip to content

Rebuild upsert metadata from validDocIds snapshot on reload for Upsert with TTL tables#18860

Merged
Jackie-Jiang merged 1 commit into
apache:masterfrom
deepthi912:block-force-download-ttl-upsert-dedup
Jun 29, 2026
Merged

Rebuild upsert metadata from validDocIds snapshot on reload for Upsert with TTL tables#18860
Jackie-Jiang merged 1 commit into
apache:masterfrom
deepthi912:block-force-download-ttl-upsert-dedup

Conversation

@deepthi912

@deepthi912 deepthi912 commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

Reloading a segment of an upsert table with metadata TTL (metadataTTL or deletedKeysTTL) previously rescanned every doc and re-added every primary key, which resurrected keys the TTL had already expired or deleted — producing incorrect query results.

With snapshots enabled, segment reload now rebuilds the upsert metadata from the persisted validDocIds snapshot (valid docs only).

Behavior (upsert + metadata TTL, snapshots enabled)

Reload snapshot present no snapshot
normal CRC match → rebuild from snapshot; CRC mismatch → fail fail (use forceDownload)
forceDownload rebuild from snapshot (CRC match); CRC mismatch → fail full scan

Segment commit and refresh paths are unchanged — only the reload path places a snapshot on the incoming segment, so they keep the regular full-scan behavior.

Notes

  • Dedup tables are out of scope (no validDocIds snapshot).

Backward-incompatible

A normal & forceDownload)reload of an upsert table with metadata TTL and snapshots enabled now fails when the segment CRC has changed; use a forceDownload reload in case snapshots don't exist on server.

@deepthi912 deepthi912 requested a review from Jackie-Jiang June 26, 2026 07:15
@codecov-commenter

codecov-commenter commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 51.72414% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.80%. Comparing base (a9b5207) to head (f976d47).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
.../pinot/core/data/manager/BaseTableDataManager.java 23.52% 8 Missing and 5 partials ⚠️
...cal/upsert/BasePartitionUpsertMetadataManager.java 91.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18860   +/-   ##
=========================================
  Coverage     64.79%   64.80%           
  Complexity     1322     1322           
=========================================
  Files          3393     3393           
  Lines        211265   211332   +67     
  Branches      33212    33234   +22     
=========================================
+ Hits         136896   136959   +63     
+ Misses        63320    63309   -11     
- Partials      11049    11064   +15     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (?)
java-21 64.80% <51.72%> (+<0.01%) ⬆️
temurin 64.80% <51.72%> (+<0.01%) ⬆️
unittests 64.80% <51.72%> (+<0.01%) ⬆️
unittests1 57.00% <13.79%> (-0.02%) ⬇️
unittests2 37.19% <37.93%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@deepthi912 deepthi912 added upsert Related to upsert functionality dedup Changes related to realtime ingestion dedup handling bug Something is not working as expected labels Jun 26, 2026
@deepthi912 deepthi912 marked this pull request as draft June 26, 2026 17:24
@deepthi912 deepthi912 force-pushed the block-force-download-ttl-upsert-dedup branch from c46bf01 to 8154665 Compare June 27, 2026 07:25
@deepthi912 deepthi912 added the backward-incompat Introduces a backward-incompatible API or behavior change label Jun 27, 2026
@deepthi912 deepthi912 changed the title Donot allow force-download reload for upsert/dedup tables with when TTL is set Rebuild upsert metadata from validDocIds snapshot on reload for TTL tables Jun 27, 2026
@deepthi912 deepthi912 changed the title Rebuild upsert metadata from validDocIds snapshot on reload for TTL tables Rebuild upsert metadata from validDocIds snapshot on reload for Upsert with TTL tables Jun 27, 2026
@deepthi912 deepthi912 force-pushed the block-force-download-ttl-upsert-dedup branch 2 times, most recently from b1506e3 to 129b2a9 Compare June 28, 2026 07:56
@deepthi912 deepthi912 marked this pull request as ready for review June 28, 2026 07:57

@xiangfu0 xiangfu0 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found two high-signal reload-path issues; see inline comments.

@deepthi912 deepthi912 force-pushed the block-force-download-ttl-upsert-dedup branch 2 times, most recently from 90c5c94 to 0f45eaa Compare June 29, 2026 00:04
…ables

Reloading a segment of an upsert table with metadata TTL (metadataTTL or
deletedKeysTTL) previously rescanned every doc and re-added every primary key,
which resurrected keys the TTL had already expired or deleted, producing
incorrect query results.

With snapshots enabled, segment reload now rebuilds the upsert metadata from the
persisted validDocIds snapshot (valid docs only):
- normal reload: proceeds only when the segment CRC matches and a snapshot
  exists; otherwise it fails with an error pointing the operator to forceDownload.
- forceDownload reload: always proceeds, using the snapshot if present (the
  operator accepts a possible CRC mismatch), otherwise a full scan.

Segment commit and refresh paths are unchanged: only the reload path places a
snapshot on the incoming segment, so they keep the regular full-scan behavior.

This supersedes the earlier approach of blocking forceDownload reload at the
controller for upsert/dedup TTL tables; that API-level check is removed.

Backward-incompatible: a normal (non-forceDownload) reload of an upsert table
with metadata TTL and snapshots enabled now fails when the segment CRC has
changed or no validDocIds snapshot exists; use a forceDownload reload in that
case.
@deepthi912 deepthi912 force-pushed the block-force-download-ttl-upsert-dedup branch from 0f45eaa to f976d47 Compare June 29, 2026 00:28
@Jackie-Jiang Jackie-Jiang merged commit cd6dd61 into apache:master Jun 29, 2026
11 checks passed
@xiangfu0

Copy link
Copy Markdown
Contributor

Opened the pinot-docs follow-up PR for this change: pinot-contrib/pinot-docs#893

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backward-incompat Introduces a backward-incompatible API or behavior change bug Something is not working as expected dedup Changes related to realtime ingestion dedup handling upsert Related to upsert functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants