fix(benchmark): raise RLIMIT_NOFILE in benchmark_serving for high concurrency #1394
Merged
Pull request overview
This PR aims to make the benchmark client resilient at high --max-concurrency by raising the process RLIMIT_NOFILE soft limit before opening many simultaneous sockets, preventing client-side EMFILE failures that drop requests.
Changes:
- Raise RLIMIT_NOFILE at the start of benchmark_serving.py:main() via set_ulimit().
- Add explanatory comments describing why the benchmark client needs the ulimit bump (mirroring server behavior).
Comment on lines +689 to +696
    # Raise the open-file soft limit before opening any connections. At high
    # --max-concurrency each in-flight request is a socket (fd); the default
    # RLIMIT_NOFILE soft (~1024) is exhausted client-side (EMFILE on socket()),
    # silently dropping requests so most never reach the server. The server
    # already calls set_ulimit() at startup; the client must too.
    from atom.utils import set_ulimit

    set_ulimit()
…currency At high --max-concurrency each in-flight request holds a socket fd. The default soft RLIMIT_NOFILE (~1024) is exhausted client-side (EMFILE on socket()), so most requests fail before reaching the server and the run reports only ~one concurrency-wave of successes (e.g. ~919/10240 at conc=1024) while the server logs 200 OK for every request it actually receives. The server already calls set_ulimit() at startup; call it in the benchmark client too (soft is raised toward 65535, capped at the hard limit).
ae94fc8 to 8ec57c2
valarLip approved these changes Jun 29, 2026
Summary
At high --max-concurrency, each in-flight request holds a socket fd. The default soft RLIMIT_NOFILE (~1024) is exhausted client-side (EMFILE on socket()), so most requests fail before ever reaching the server. The benchmark then reports only ~one concurrency-wave of successes (e.g. ~919/10240 at --max-concurrency=1024) while the server logs 200 OK for every request it actually receives.

The server already calls set_ulimit() at startup; this makes the benchmark client do the same (raise soft toward 65535, capped at the hard limit) before opening any connections.

This is purely a client-side fd-limit fix, with no change to request logic.
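For readers unfamiliar with the mechanism, the following is a minimal sketch of how a soft-limit bump of this kind can be written with Python's standard resource module. It is not the actual atom.utils.set_ulimit() implementation, which is not shown in this PR; the function name raise_nofile_soft_limit is hypothetical, and the 65535 target is taken from the description above.

```python
import resource


def raise_nofile_soft_limit(target: int = 65535) -> tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Hypothetical helper for illustration; not the atom.utils.set_ulimit() code.
    Returns the (soft, hard) limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process can only raise soft up to hard.
    desired = target if hard == resource.RLIM_INFINITY else min(target, hard)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= desired:
        return soft, hard

    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (desired, hard))
    except (ValueError, OSError):
        # Some platforms reject the request; keep the existing limits.
        return soft, hard

    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("RLIMIT_NOFILE (soft, hard):", raise_nofile_soft_limit())
```

The call has to happen before any connections are opened, which is why the change places set_ulimit() at the start of main().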
Root cause (for context)
Containers launched without --ulimit nofile inherit containerd's soft limit (1024 on hosts where systemd DefaultLimitNOFILESoft=1024). After a containerd 1.7→2.x upgrade, plainly launched containers stopped getting a high default, surfacing this on the benchmark client.

Test plan
- python -m atom.benchmarks.benchmark_serving --dataset-name random --random-input-len 1024 --random-output-len 1024 --num-prompts 10240 --max-concurrency 1024 --request-rate inf --ignore-eos -> all 10240 succeed (previously ~919).
- set_ulimit() runs before the first request.
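To check whether a given container inherited the low soft limit described under Root cause, a small diagnostic along these lines may help. It is a sketch only: nofile_limits and soft_limit_covers are hypothetical names, and the reserved_fds default of 64 is an arbitrary allowance for descriptors the process already holds (stdio, logs, libraries), not a figure from this PR.

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the current (soft, hard) RLIMIT_NOFILE for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_covers(max_concurrency: int, reserved_fds: int = 64) -> bool:
    """Rough check that the soft limit leaves room for one fd per request.

    reserved_fds is an assumed allowance for descriptors already in use.
    """
    soft, _ = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= max_concurrency + reserved_fds


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard}")
    print("enough for --max-concurrency 1024:", soft_limit_covers(1024))
```

With a soft limit of 1024, the check fails for --max-concurrency 1024, which is consistent with the failures reported before this change.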