Skip to content

Google Cloud Pub/Sub gRPC: eager-pull stages, subscribe field-leak fi…#1621

Draft
hanishi wants to merge 1 commit into
apache:mainfrom
hanishi:pubsub-grpc-eager-pull-and-subscriber
Draft

Google Cloud Pub/Sub gRPC: eager-pull stages, subscribe field-leak fi…#1621
hanishi wants to merge 1 commit into
apache:mainfrom
hanishi:pubsub-grpc-eager-pull-and-subscriber

Conversation

@hanishi

@hanishi hanishi commented May 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes three correctness gaps in the google-cloud-pub-sub-grpc subscriber pipeline (two introduced by #1494, one latent since 2019 and surfaced by it), and adds a high-level Subscriber resource so users get a correct
subscriber by construction.

Closes #1620

What this changes

  • EagerPullTrackingStage replaces the demand-driven .map tracker inside autoExtendAckDeadlines. Tracking now fires on receipt from the gRPC adapter, so messages buffered behind a slow mapAsync no longer hit their
    ack deadline before being tracked. (Bug 1)
  • StreamingPullRequest.defaultInstance for subsequent keepalive requests in subscribe(...). The Pub/Sub server forbids clientId, maxOutstandingMessages, and maxOutstandingBytes on subsequent requests; the
    previous code only cleared subscription and streamAckDeadlineSeconds, so any caller setting maxOutstandingMessages got INVALID_ARGUMENT ~1s after stream start. (Bug 2, latent since 3f1a4f7f28 from Jan 2019)
  • FlowControlGateStage rewritten with the same eager-pull pattern as bug 1. Permits acquired on receipt rather than on push downstream, so outstandingCount reflects actual in-flight delivery and the gate's limit
    finally bounds what the server sends. Internal ArrayDeque pre-sized to maxOutstandingMessages (clamped to 65536). (Bug 3)
  • AckDeadlineExtender is a new caller-owned class holding the tracking map and background ticker. Both survive RestartSource.withBackoff re-materializations, matching MessageDispatcher's lifecycle in Google's
    official client.
  • GooglePubSub.subscriber(...) is the new high-level entry point. Returns a Subscriber resource that bundles subscribe + restart + autoExtendAckDeadlines(extender) + optional flowControlGate. Composition is
    correct by construction: restart wraps only the inner subscribe, the extender persists across reconnects, acknowledge/nack sinks release flow-control permits and record AckDeadlineDistribution completion latencies
    automatically when configured.
  • Java DSL inherits all three bug fixes through the existing factories. A Java DSL Subscriber wrapper is deferred (Java/Scala generated client split needs its own design pass; Java users can still compose the operators
    directly or use AckDeadlineExtender.create(...)).
  • k8s/build-and-push.sh hardened with an EXIT trap so script failures no longer leak GkeAuthTest.scala / GkeFullFeatureTest.scala into the source tree.
  • Scenario 8 added to GkeFullFeatureTest exercising the new Subscriber resource end-to-end against real Pub/Sub via Workload Identity. Asset committed but not run this PR cycle.

Verification

  • 14 unit tests (AutoExtendAckDeadlinesSpec + SubscriberSpec) pass against stub SubscriberClients. No live infrastructure needed.
  • 18/18 IntegrationSpec tests pass against the Pub/Sub emulator (docker compose up gcloud-pubsub-emulator), including the new Subscriber resource scenario. No regressions in the 17 pre-existing tests.
  • The new Subscriber resource scenario also passed against real GCP (project pekko-connectors) via gcloud auth application-default login. 35s; recorded in google-cloud-pub-sub-grpc/README.md test report table.

Notes for reviewers

  • Subscriber is intended as the primary entry point going forward. The low-level operators (subscribe, autoExtendAckDeadlines, flowControlGate, etc.) stay available for callers who genuinely need a different composition
    (sharing one extender across multiple subscriptions, splitting the source for fan-out, etc.).
  • FlowControl.outstandingCount semantics changed: permits are now acquired on receipt, not on push downstream. Telemetry built on this counter will report higher numbers for the same workload, but those numbers are now
    accurate. Doc updated.
  • AckDeadline, AckDeadlineExtender, and Subscriber are @ApiMayChange.
  • The internal-ticker autoExtendAckDeadlines(subscription, ...) overloads stay intact for back-compat, with a doc note that they're not restart-safe and pointing at the AckDeadlineExtender form for that case.

…x, and Subscriber resource

* Bug 1: autoExtendAckDeadlines now tracks on receipt via EagerPullTrackingStage,
  not on push downstream. Prevents deadline expiry on messages buffered behind
  saturated mapAsync.
* Bug 2: subsequent StreamingPullRequests built from defaultInstance, clearing
  clientId / maxOutstandingMessages / maxOutstandingBytes that the server rejects
  on non-initial requests.
* Bug 3: FlowControlGateStage rewritten with the eager-pull pattern so permits
  are acquired on receipt; outstandingCount now reflects actual delivery.
  Pre-sized internal ArrayDeque to FlowControl.maxOutstandingMessages (clamped
  to 65536) to skip resize doublings during initial fill.
* AckDeadlineExtender: caller-owned tracking map and ticker that survive
  RestartSource re-materializations (matches Google MessageDispatcher lifecycle).
* Subscriber resource (GooglePubSub.subscriber): bundles subscribe + restart +
  autoExtend + optional flowControl with correct composition by construction.
* ExampleApp.subscribeAutoExtend uses the new Subscriber API.
@hanishi hanishi force-pushed the pubsub-grpc-eager-pull-and-subscriber branch from 5929a64 to 6139a72 Compare May 18, 2026 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

google-cloud-pub-sub-grpc: subscriber correctness follow-up to #1494

1 participant