feat: add LiteLLM backend for multi-provider benchmarking#800

Open
RheagalFire wants to merge 1 commit into
vllm-project:mainfrom
RheagalFire:feat/add-litellm-provider

@RheagalFire RheagalFire commented Jun 16, 2026

Summary

Adds a new litellm backend that routes generation requests through the LiteLLM SDK, enabling benchmarking across 100+ providers (Anthropic, Gemini, Bedrock, Groq, Cohere, Mistral, etc.) via a unified interface. Timing instrumentation matches the existing OpenAI HTTP backend so benchmark results are directly comparable.

Details

  • New LiteLLMBackend and LiteLLMBackendArgs following the existing Backend / BackendArgs registration pattern
  • Uses litellm.acompletion(stream=True) with drop_params=True for cross-provider compatibility
  • Reuses ChatCompletionsRequestHandler.format() to build messages from GenerationRequest.columns
  • Lazy-loaded via guidellm.extras.litellm so the optional dep doesn't break imports when not installed
  • litellm>=1.80.0,<1.87.0 added as optional dependency under [project.optional-dependencies].litellm
  • 21 unit tests covering args, registration, lifecycle, streaming dispatch, timing, and token usage
  • All ruff checks pass
  • All 392 existing backend tests still pass
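The lazy-loading item above can be illustrated with a minimal sketch. This is not the code in guidellm.extras.litellm; the helper name `load_optional` and the error wording are assumptions made for illustration. It only shows the general pattern of deferring an optional import until first use and pointing the user at the matching extra.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use.

    Hypothetical helper: raises an ImportError that names the extra to
    install instead of failing when the package itself is imported.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this backend; "
            f"install it with: pip install 'guidellm[{extra}]'"
        ) from exc
```

A backend would call something like `load_optional("litellm", "litellm")` inside its startup path, so users who never select the backend never pay for, or trip over, the missing package.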

Test Plan

  • Run pytest tests/unit/backends/litellm/ -v to verify unit tests
  • Run pytest tests/unit/backends/ -v to verify no regressions in existing backends
  • Live E2E verified with anthropic/claude-sonnet-4-6 via Azure Foundry:
    args = LiteLLMBackendArgs(
        model="anthropic/claude-sonnet-4-6",
        api_key="...",
        api_base="...",
        max_tokens=50,
    )
    
    Confirmed: streaming works, TTFT signal fires, token counts captured, timing fields populated.
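
For readers unfamiliar with the signals listed above, here is a rough sketch of how TTFT and total latency can be measured over a streamed response. It is not the backend's implementation: `time_stream` and `fake_stream` are hypothetical names, and the fake stream only mimics the OpenAI-style chunk shape (`choices[0].delta.content`, optional `usage`) so the example runs without network access or credentials.

```python
import asyncio
import time
from types import SimpleNamespace
from typing import Any, AsyncIterator


async def time_stream(stream: AsyncIterator[Any]) -> dict:
    """Consume a chat-completion stream and record basic timing signals."""
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    usage = None
    async for chunk in stream:
        # Usage, when a provider sends it, typically arrives on a late chunk.
        usage = getattr(chunk, "usage", None) or usage
        choices = getattr(chunk, "choices", None) or []
        text = choices[0].delta.content if choices else None
        if text:
            if first_token_at is None:
                # First non-empty content marks time to first token.
                first_token_at = time.perf_counter()
            pieces.append(text)
    end = time.perf_counter()
    return {
        "text": "".join(pieces),
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "total_s": end - start,
        "usage": usage,
    }


async def fake_stream() -> AsyncIterator[Any]:
    """Hypothetical stand-in for a provider stream (no network needed)."""
    for text in ("Hel", "lo", None):
        await asyncio.sleep(0.01)
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)
    # Final chunk carries usage only, with no choices.
    yield SimpleNamespace(choices=[], usage=SimpleNamespace(completion_tokens=2))


result = asyncio.run(time_stream(fake_stream()))
```

With LiteLLM installed, the same function could presumably be fed the iterator returned by `await litellm.acompletion(model=..., messages=..., stream=True, drop_params=True)`; whether usage appears in the stream depends on the provider and request options.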

Related Issues

  • N/A (new feature)

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes code generated or substantially modified by an AI agent
  • Includes tests generated or substantially modified by an AI agent

git log

commit 8e17f9b
Author: RheagalFire <arishalam121@gmail.com>
Date: Tue Jun 16 23:51:47 2026 +0530

feat: add LiteLLM backend for multi-provider benchmarking

Generated-by: Claude claude-opus-4-6
Signed-off-by: RheagalFire <arishalam121@gmail.com>

mergify Bot commented Jun 16, 2026

Contributor

Hi @RheagalFire, the DCO check has failed. Please click on DCO in the Checks section for instructions on how to resolve this.

@RheagalFire RheagalFire force-pushed the feat/add-litellm-provider branch 2 times, most recently from d0d1bda to 1b1135f Compare June 16, 2026 18:21
Generated-by: Claude claude-opus-4-6
Signed-off-by: RheagalFire <arishalam121@gmail.com>
@RheagalFire RheagalFire force-pushed the feat/add-litellm-provider branch from 1b1135f to 8e17f9b Compare June 16, 2026 18:22
@RheagalFire

Author

cc @sjmonson

@sjmonson sjmonson left a comment

Collaborator

Just a quick issue I noticed. Also, I am guessing you didn't run with pre-check locally, since at minimum you will need to regenerate the lock file after changing dependencies. You can do that with tox run -e lock; plain tox run will run all the other CI tasks if you want a faster feedback loop.

Comment thread pyproject.toml
recommended = ["guidellm[perf,tokenizers]"]
# Feature Extras
perf = ["orjson", "msgpack", "msgspec", "uvloop"]
litellm = ["litellm>=1.80.0,<1.87.0"]

Collaborator

Any reason to cap the version at 1.87.0 when the latest available is 1.89.*? Also, the minimum should be no lower than 1.83 due to that supply chain attack from a while back.
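
To make the range question concrete, here is a small sketch that checks which versions the current specifier `>=1.80.0,<1.87.0` admits. It is a simplification that compares plain numeric tuples and ignores pre-release tags, so it is not a PEP 440 implementation. The candidate version numbers are illustrative and are not claimed to be actual LiteLLM releases.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.83.0' into (1, 83, 0); numeric release segments only."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, minimum: str, cap: str) -> bool:
    """True if minimum <= version < cap, mirroring '>=minimum,<cap'."""
    return parse(minimum) <= parse(version) < parse(cap)


# Bounds taken from the pyproject.toml line under review.
CURRENT_BOUNDS = ("1.80.0", "1.87.0")

for candidate in ("1.80.0", "1.82.5", "1.83.0", "1.89.0"):
    print(candidate, in_range(candidate, *CURRENT_BOUNDS))
```

Under the current bounds, anything in the 1.89 series is excluded by the cap, while versions below 1.83 are still admitted, which are the two points raised in this comment.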
