fix(benchmark): raise RLIMIT_NOFILE in benchmark_serving for high concurrency by ZhangLirong-amd · Pull Request #1394 · ROCm/ATOM

ZhangLirong-amd · 2026-06-29T06:00:17Z

Summary

At high --max-concurrency, each in-flight request holds a socket fd. The default soft RLIMIT_NOFILE (~1024) is exhausted client-side (EMFILE on socket()), so most requests fail before ever reaching the server. The benchmark then reports only about one concurrency wave of successes (e.g. ~919/10240 at --max-concurrency=1024), while the server logs 200 OK for every request it actually receives.
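The failure mode can be reproduced in isolation. The following is a minimal sketch, not code from this PR: the helper name sockets_until_emfile and the soft limit of 64 are chosen for illustration. It temporarily lowers the soft limit, opens sockets until socket() fails with EMFILE, then restores the original limit.

```python
import errno
import resource
import socket


def sockets_until_emfile(soft_limit: int = 64) -> int:
    """Count how many sockets can be opened under a temporary soft fd limit.

    Hypothetical helper for illustration; assumes soft_limit does not
    exceed the process's hard RLIMIT_NOFILE.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lowering the soft limit is always permitted; the hard limit is untouched.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    opened = []
    try:
        while True:
            try:
                opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                # Per-process fd table is full: this is the client-side failure.
                return len(opened)
    finally:
        for sock in opened:
            sock.close()
        # Restore the original soft limit so the caller is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```

The returned count is lower than the limit itself whenever the process already holds other descriptors (stdin, stdout, stderr, log files), which is consistent with a run succeeding on somewhat fewer requests than the nominal limit.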

The server already calls set_ulimit() at startup; this makes the benchmark client do the same (raise soft toward 65535, capped at the hard limit) before opening any connections.
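The body of atom.utils.set_ulimit is not shown in this PR. The behavior described above (raise soft toward 65535, capped at the hard limit) could be sketched as follows; the function name raise_nofile_soft_limit and its target parameter are hypothetical, and only the standard-library resource module is used.

```python
import resource


def raise_nofile_soft_limit(target: int = 65535) -> tuple:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit.

    Illustrative sketch only; not the actual atom.utils.set_ulimit.
    Returns the (soft, hard) pair in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # With an unlimited hard limit, the target itself is the ceiling.
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Only raise; never lower a soft limit that is already high enough.
    if soft != resource.RLIM_INFINITY and soft < ceiling:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
        except (ValueError, OSError):
            # Best effort: some platforms reject values the kernel cannot honor.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Raising the soft limit up to the hard limit needs no extra privileges, which is why the call can run unconditionally at client startup.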

This is purely a client-side fd-limit fix — no change to request logic.

Root cause (for context)

Containers launched without --ulimit nofile inherit containerd's soft limit (1024 on hosts where systemd DefaultLimitNOFILESoft=1024). After a containerd 1.7→2.x upgrade, containers launched without an explicit limit stopped getting a high default, which surfaced the problem on the benchmark client.

Test plan

python -m atom.benchmarks.benchmark_serving --dataset-name random --random-input-len 1024 --random-output-len 1024 --num-prompts 10240 --max-concurrency 1024 --request-rate inf --ignore-eos -> all 10240 succeed (previously ~919).
Confirm set_ulimit() runs before the first request.
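One way to check the second item is a preflight comparison of the soft limit against the requested concurrency. This is a sketch under assumptions: has_fd_headroom is a hypothetical helper, not part of benchmark_serving, and the reserved_fds default of 128 is an arbitrary allowance for descriptors the process already holds.

```python
import resource
from typing import Optional


def has_fd_headroom(
    max_concurrency: int,
    reserved_fds: int = 128,
    soft_limit: Optional[int] = None,
) -> bool:
    """Return True if the soft fd limit covers one socket per in-flight request.

    Hypothetical preflight check; `reserved_fds` is an assumed allowance for
    stdio, log files, and other descriptors already open in the process.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= max_concurrency + reserved_fds
```

With the values from this PR, has_fd_headroom(1024, soft_limit=1024) is False, while has_fd_headroom(1024, soft_limit=65535) is True.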

Copilot

Pull request overview

This PR aims to make the benchmark client resilient at high --max-concurrency by raising the process RLIMIT_NOFILE soft limit before opening many simultaneous sockets, preventing client-side EMFILE failures that drop requests.

Changes:

Raise RLIMIT_NOFILE at the start of benchmark_serving.py:main() via set_ulimit().
Add explanatory comments describing why the benchmark client needs the ulimit bump (mirroring server behavior).


+    # Raise the open-file soft limit before opening any connections. At high
+    # --max-concurrency each in-flight request is a socket (fd); the default
+    # RLIMIT_NOFILE soft (~1024) is exhausted client-side (EMFILE on socket()),
+    # silently dropping requests so most never reach the server. The server
+    # already calls set_ulimit() at startup; the client must too.
+    from atom.utils import set_ulimit
+
+    set_ulimit()



Copilot AI review requested due to automatic review settings June 29, 2026 06:00

Copilot started reviewing on behalf of ZhangLirong-amd June 29, 2026 06:00

Copilot AI reviewed Jun 29, 2026

ZhangLirong-amd force-pushed the zlr/benchmark-set-ulimit branch from ae94fc8 to 8ec57c2 June 29, 2026 06:02

valarLip approved these changes Jun 29, 2026

valarLip merged commit f797dd5 into main Jun 29, 2026
27 of 35 checks passed

valarLip deleted the zlr/benchmark-set-ulimit branch June 29, 2026 10:22
