Add adaptive batching dispatch metrics#5637

Draft
RitwijParmar wants to merge 1 commit into bentoml:main from RitwijParmar:codex/adaptive-batching-dispatch-metrics

Summary

This adds dispatcher-level metrics for adaptive batching so operators can see why each batch is released and how long requests wait in the queue before dispatch.

Today BentoML exposes batch size, but it is hard to tell whether an endpoint is filling batches, waiting on the optimizer window, or sitting with low item counts under load. This PR adds a lightweight observer on CorkDispatcher and wires it into both Service and Runner serving paths.
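
To illustrate the observer idea, here is a minimal sketch of a dispatcher that reports each batch release to an optional callback. It is not the code in this PR: `ToyDispatcher`, `DispatchEvent`, and the reason strings `"batch_full"` and `"wait_timeout"` are hypothetical names chosen for the example, and the real CorkDispatcher is asynchronous and considerably more involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DispatchEvent:
    """Hypothetical snapshot of one batch release (not BentoML's real type)."""

    reason: str       # why the batch was released (labels here are made up)
    queued_jobs: int  # jobs waiting in the queue at release time
    batch_size: int   # jobs actually sent in this batch


Observer = Callable[[DispatchEvent], None]


class ToyDispatcher:
    """Stand-in for a batching dispatcher; only the observer hook is shown."""

    def __init__(self, max_batch_size: int, observer: Optional[Observer] = None):
        self.max_batch_size = max_batch_size
        self.observer = observer
        self.queue: List[object] = []

    def submit(self, job: object) -> None:
        self.queue.append(job)

    def release(self, reason: str) -> List[object]:
        queued = len(self.queue)
        batch = self.queue[: self.max_batch_size]
        self.queue = self.queue[self.max_batch_size:]
        # Every metric is recorded from this single call site.
        if self.observer is not None:
            self.observer(DispatchEvent(reason, queued, len(batch)))
        return batch


# Usage: collect events in a list instead of a metrics backend.
events: List[DispatchEvent] = []
dispatcher = ToyDispatcher(max_batch_size=2, observer=events.append)
for job in ("a", "b", "c"):
    dispatcher.submit(job)
dispatcher.release("batch_full")
dispatcher.release("wait_timeout")
```

Because the dispatcher only knows about a callable, the same hook could be shared by different serving paths, and a test can pass `list.append` in place of a metrics exporter.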

New Prometheus signals include the dispatch reason, the number of queued jobs at release time, the item count after payload-size-aware splitting, and the queue delay of the oldest request. The existing batch-size metric is kept and moved to the same dispatch observer so that it is recorded from one place.
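
As a rough sketch of how such signals could be derived from a queue snapshot, the function below computes queued jobs, item count, and oldest-request delay for one release. It assumes a FIFO queue where each job carries an enqueue timestamp and an item count, and it uses a simple item cap in place of the payload-size-aware splitting; `Job`, `summarize_release`, and the dictionary keys are hypothetical and do not reflect the metric names in this PR.

```python
from typing import Dict, List, NamedTuple


class Job(NamedTuple):
    """Hypothetical queued request: when it arrived and how many items it holds."""

    enqueued_at: float
    n_items: int


def summarize_release(queue: List[Job], max_items: int, now: float) -> Dict[str, float]:
    """Summarize one batch release from a FIFO queue snapshot."""
    taken = 0
    items = 0
    for job in queue:
        # Always take the first job; stop once the next one would exceed the cap.
        if taken > 0 and items + job.n_items > max_items:
            break
        items += job.n_items
        taken += 1
    return {
        "queued_jobs": len(queue),
        "batch_jobs": taken,
        "batch_items": items,
        # FIFO assumption: the head of the queue is the oldest request.
        "oldest_queue_delay_s": now - queue[0].enqueued_at if queue else 0.0,
    }


snapshot = [Job(10.0, 3), Job(10.5, 4), Job(11.0, 2)]
summary = summarize_release(snapshot, max_items=8, now=12.0)
```

In this example the third job is left in the queue because adding its 2 items would exceed the cap of 8, so the batch carries 7 items from 2 of the 3 queued jobs.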

Validation

  • python3 -m ruff check src/bentoml/_internal/marshal/dispatcher.py src/bentoml/_internal/server/runner_app.py src/_bentoml_impl/server/app.py tests/unit/_internal/test_dispatcher_metrics.py
  • python3 -m compileall -q src/bentoml/_internal/marshal/dispatcher.py src/bentoml/_internal/server/runner_app.py src/_bentoml_impl/server/app.py tests/unit/_internal/test_dispatcher_metrics.py
  • git diff --check
  • /Users/ritwij/.cache/codex-runtimes/codex-primary-runtime/dependencies/python/bin/python3 -m pytest tests/unit/_internal/test_dispatcher_metrics.py -q

Signed-off-by: Ritwij Aryan Parmar <ritwij.aryan.parmar@gmail.com>