Resolve Parquet shard count via bucket index to optimize storage calls #7648

Open
SungJin1212 wants to merge 1 commit into cortexproject:master from SungJin1212:parquet-shard-count-from-bucket-index
@SungJin1212

Member

What this PR does:
This PR updates the Parquet shard resolution logic to utilize the bucket index, reducing the number of object storage calls.
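
To illustrate the idea in the description, here is a minimal sketch, not the code in this PR. It assumes the bucket index can report a per-block shard count; `indexEntry`, `shardLookup`, and `resolveShardCounts` are hypothetical names invented for this example. Blocks found in the index cost no extra storage call, and only blocks missing from it fall back to a per-block lookup.

```go
package main

import "fmt"

// indexEntry is a hypothetical stand-in for one block entry in a bucket index.
type indexEntry struct {
	BlockID string
	Shards  int // Parquet shard count recorded for the block; 0 means unknown
}

// shardLookup is a hypothetical per-block fallback; assume each call costs
// one object storage request.
type shardLookup func(blockID string) (int, error)

// resolveShardCounts answers from the already-loaded index where possible and
// uses the fallback only for blocks the index does not cover.
func resolveShardCounts(blockIDs []string, index []indexEntry, fallback shardLookup) (map[string]int, error) {
	known := make(map[string]int, len(index))
	for _, e := range index {
		if e.Shards > 0 {
			known[e.BlockID] = e.Shards
		}
	}

	shardCounts := make(map[string]int, len(blockIDs))
	for _, id := range blockIDs {
		if n, ok := known[id]; ok {
			shardCounts[id] = n
			continue
		}
		n, err := fallback(id)
		if err != nil {
			return nil, fmt.Errorf("resolve shard count for block %s: %w", id, err)
		}
		shardCounts[id] = n
	}
	return shardCounts, nil
}
```

With N queried blocks all present in the index, this sketch issues zero fallback calls instead of N.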

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

shardCounts := make(map[string]int, len(blockIDs))

if p.bucketIndexEnabled {
	idx, err := bucketindex.ReadIndex(ctx, p.indexBucket, p.userID, p.limits, p.logger)

Contributor

Should this be cached with some TTL instead? Or would we rather have a separate goroutine sync the bucket index periodically, instead of resolving it at query time?

Contributor

If we use bucketindex.Loader, it has built-in caching.
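
The TTL caching raised in this thread could look roughly like the sketch below. It is a generic illustration and does not show how `bucketindex.Loader` is implemented. `shardIndex`, `ttlIndexCache`, and the `load` callback are hypothetical names; `load` stands in for whatever actually reads the index from storage.

```go
package main

import (
	"sync"
	"time"
)

// shardIndex is a hypothetical in-memory view: block ID -> Parquet shard count.
type shardIndex map[string]int

// ttlIndexCache keeps one tenant's index in memory and reloads it once the
// cached copy is older than ttl.
type ttlIndexCache struct {
	mu       sync.Mutex
	ttl      time.Duration
	load     func() (shardIndex, error) // hypothetical loader wrapping the storage read
	cached   shardIndex
	loadedAt time.Time
	valid    bool
}

func newTTLIndexCache(ttl time.Duration, load func() (shardIndex, error)) *ttlIndexCache {
	return &ttlIndexCache{ttl: ttl, load: load}
}

// get returns the cached index while it is fresh and reloads it otherwise.
// The mutex is held during the reload, so concurrent callers wait for a
// single load instead of each hitting storage.
func (c *ttlIndexCache) get() (shardIndex, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.valid && time.Since(c.loadedAt) < c.ttl {
		return c.cached, nil
	}
	idx, err := c.load()
	if err != nil {
		// Surface the error rather than silently serving a stale index.
		return nil, err
	}
	c.cached, c.loadedAt, c.valid = idx, time.Now(), true
	return idx, nil
}
```

The trade-off against a background sync goroutine is that the first query after expiry pays the reload latency. A periodic syncer moves that cost off the query path but keeps refreshing the index even for idle tenants.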

…e calls

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
SungJin1212 force-pushed the parquet-shard-count-from-bucket-index branch from 0fdab66 to 641b69d on June 29, 2026 00:54