Add tensorrt_multi_device_inference operator (multi-GPU TensorRT Multi-Device) by pkisfaludi-nv · Pull Request #1631 · nvidia-holoscan/holohub

pkisfaludi-nv · 2026-06-26T05:35:27Z

Summary

Adds a self-contained tensorrt_multi_device_inference operator (+ a minimal sample app) that runs a single TensorRT engine sharded across ≥2 GPUs via TensorRT Multi-Device (NCCL DistCollective + IExecutionContext::setCommunicator), so one operator drives N GPUs — for models too large for one GPU or that benefit from tensor parallelism. (TRT-28040)

It wraps a hardware-validated MultiDeviceTrt core: ncclCommInitAll → per-rank deserialize → concurrent setCommunicator → host-bounce input replication → fan-out enqueueV3 → rank-0 output. It does not depend on the SDK's HoloInfer/InferenceOp.
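The control flow described above (replicate the host input to every rank, fan the work out, return rank 0's output) can be sketched on the CPU. This is only an illustration under stated assumptions: `RankSim` and `run_simulated_multi_device` are hypothetical names, not part of this PR. The "devices" are plain vectors, the per-rank work is a sharded dot product, and the collective is a simple sum. No NCCL or TensorRT call is made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the state held per rank (one GPU in the real operator).
struct RankSim {
  int rank = 0;
  std::vector<float> input;  // simulated device-side copy of the input
  float partial = 0.0f;      // this rank's partial result
  float output = 0.0f;       // final output (only rank 0's is returned)
};

// Simulates one inference step: host-bounce replication, per-rank fan-out,
// a sum standing in for the collective, then rank-0 output.
float run_simulated_multi_device(const std::vector<float>& host_input,
                                 const std::vector<float>& weights,
                                 int num_ranks) {
  assert(num_ranks >= 2);
  assert(host_input.size() == weights.size());

  const std::size_t n = host_input.size();
  const std::size_t ranks_count = static_cast<std::size_t>(num_ranks);
  std::vector<RankSim> ranks(ranks_count);

  // Host-bounce replication: every rank receives its own copy of the host input.
  for (std::size_t r = 0; r < ranks_count; ++r) {
    ranks[r].rank = static_cast<int>(r);
    ranks[r].input = host_input;
  }

  // Fan-out: each rank processes its contiguous shard of the weights.
  // This loop is sequential; it only mirrors the ordering of the steps.
  for (std::size_t r = 0; r < ranks_count; ++r) {
    const std::size_t begin = n * r / ranks_count;
    const std::size_t end = n * (r + 1) / ranks_count;
    float acc = 0.0f;
    for (std::size_t i = begin; i < end; ++i) {
      acc += ranks[r].input[i] * weights[i];
    }
    ranks[r].partial = acc;
  }

  // Stand-in for the collective: reduce all partial results into rank 0.
  float total = 0.0f;
  for (const RankSim& rk : ranks) {
    total += rk.partial;
  }
  ranks[0].output = total;
  return ranks[0].output;
}
```

Calling `run_simulated_multi_device(input, weights, 2)` with equally sized vectors returns the full dot product. Splitting the same data across more ranks changes only how the shards are divided.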

What's added

- operators/tensorrt_multi_device_inference/: operator (TensorRtMultiDeviceInferenceOp) + the reused MD core + metadata.json + CMakeLists.txt + README.md; registered in operators/CMakeLists.txt.
- applications/multi_device_inference/: a minimal source → MD inference → checksum sink C++ demo + config + metadata + README; registered in applications/CMakeLists.txt.

Requirements (please review)

- TensorRT ≥ 11.0 (Multi-Device is GA in TensorRT 11) and NCCL. Neither is in the stock Holoscan/HoloHub container (TRT 10). Open question for maintainers: how would you prefer to provide a TRT-11 + NCCL build environment for this operator (a dedicated Dockerfile stage, a CI image, or gating)? I did not modify the shared Dockerfile per AGENTS.md.
- ≥ 2 homogeneous GPUs (SM80+); engine(s) sharded offline.

Validation status

Draft. The Multi-Device runtime core (multidevice.cpp) is validated on 2× NVIDIA B200 (TensorRT 11.1): a tensor-parallel MLP sharded across 2 GPUs matched the 1-GPU reference (max_rel 1.24e-05).
The HoloHub operator/app wrapper has not been built in CI yet (needs the TRT-11 + NCCL container above) — clang-format and metadata structure pass locally. Marking draft until the build environment question is resolved and CI is green.
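For readers unfamiliar with the max_rel figure quoted above, here is a minimal sketch of one common way to compute a maximum relative error between a multi-GPU output and a single-GPU reference. The exact formula used in the validation run is not stated in this PR, so the definition below (absolute difference divided by the reference magnitude, with a small `eps` floor) and the name `max_rel_error` are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed definition: max over i of |test[i] - ref[i]| / max(|ref[i]|, eps).
// The eps floor avoids division by zero when a reference value is exactly 0.
double max_rel_error(const std::vector<float>& test,
                     const std::vector<float>& ref,
                     double eps = 1e-12) {
  assert(test.size() == ref.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const double r = static_cast<double>(ref[i]);
    const double t = static_cast<double>(test[i]);
    const double denom = std::max(std::fabs(r), eps);
    worst = std::max(worst, std::fabs(t - r) / denom);
  }
  return worst;
}
```

A check in this style would compare the returned value against a tolerance chosen for the model's precision. The tolerance itself is a per-model judgment call and is not specified here.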

DCO signed-off. Companion Holoscan-SDK MR (HoloInfer-internal variant): TRT-28040 / holoscan-sdk!4577.

🤖 Generated with Claude Code

A self-contained HoloHub operator that runs a single TensorRT engine sharded across >=2 GPUs via TensorRT Multi-Device (NCCL DistCollective + setCommunicator), so one operator drives N GPUs (TRT-28040). Wraps the hardware-validated MultiDeviceTrt core (ncclCommInitAll -> per-rank deserialize -> concurrent setCommunicator -> host-bounce input replication -> fan-out enqueueV3).

- operators/tensorrt_multi_device_inference/: operator + reused MD core + metadata.json + CMakeLists + README; registered in operators/CMakeLists.txt.
- applications/multi_device_inference/: minimal source -> MD inference -> checksum sink demo (cpp), config, metadata, README; registered in applications/CMakeLists.txt.

Requires TensorRT >= 11 (Multi-Device GA), NCCL, and >= 2 homogeneous GPUs (SM80+). MD core validated on 2x B200 (TensorRT 11.1): TP-sharded MLP across 2 GPUs vs the 1-GPU reference, max_rel 1.24e-05.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Peter Kisfaludi <pkisfaludi@nvidia.com>

github-project-automation Bot added this to Holohub Jun 26, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

daf086f

for more information, see https://pre-commit.ci

pkisfaludi-nv wants to merge 2 commits into nvidia-holoscan:main from pkisfaludi-nv:feature/tensorrt-multi-device-inference
