tcp: register tcp.rcv_wnd_max and tcp.snd_mss_max as runtime-tunable sysctls #46
Open
doronz88 wants to merge 1 commit into
…sysctls
Two TCP policy values were effectively hard-coded, forcing applications
that embed PyTCP to monkeypatch internals to tune throughput on fast or
asymmetric paths:
* The advertised receive-window ceiling was fixed at 65535 bytes
(WindowState.rcv_wnd_max). A bulk inbound transfer is bound by
window / RTT, so on a high bandwidth-delay-product path (fast link,
tunnel) 64 KiB throttles the peer far below the link rate even though
PyTCP already negotiates RFC 7323 window scaling.
* The send-side MSS always tracked the egress interface MTU, with no
way to emit smaller segments while still advertising a large receive
MSS. Overlay/tunnel deployments whose host->peer path MTU is below
the local interface MTU need exactly that asymmetry (and classical
PMTUD cannot discover the smaller hop when it sits past a relay that
drops ICMP PTB).
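The window / RTT bound in the first bullet can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names and the example 50 ms RTT and 1 Gbit/s link are assumptions chosen for the example, not values taken from PyTCP.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput in bits/s when at most one receive
    window of data can be in flight per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 * 1000 / rtt_ms


def window_needed_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: the receive window needed to
    keep a path of the given rate and RTT full (rounded up)."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    # Integer ceiling division: bits/s * ms / (8 bits * 1000 ms/s).
    return -(-rate_bps * rtt_ms // 8000)


# A 65535-byte window over an assumed 50 ms RTT:
limit = window_limited_throughput_bps(65535, 50)   # 10_485_600.0 bits/s
# Window needed to fill an assumed 1 Gbit/s link at the same RTT:
needed = window_needed_bytes(1_000_000_000, 50)    # 6_250_000 bytes
```

Under these assumed numbers the fixed ceiling caps the peer at roughly 10.5 Mbit/s, about 1% of the link rate, which is the gap 'tcp.rcv_wnd_max' is meant to close.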
Expose both as registered sysctls, consistent with the existing
'tcp.base_mss' / 'tcp.mtu_probing' knobs:
* 'tcp.rcv_wnd_max' (flat; Linux net.ipv4.tcp_rmem parity) — default
65535, seeded into WindowState.rcv_wnd_max at session creation, so
behaviour is unchanged until an operator raises it.
* 'tcp.snd_mss_max' (per-interface, like 'tcp.base_mss') — default 0
(uncapped); a non-zero value clamps _mss_ceiling() last, bounding the
segments we EMIT without lowering the advertised receive MSS. Floor
88 (Linux TCP_MIN_MSS); 0 reserved for "off".
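The 'tcp.snd_mss_max' semantics described above (0 means uncapped, otherwise a floor of 88 and a clamp applied last) can be sketched as follows. This is a minimal standalone illustration, not the PR's code: validate_snd_mss_max and clamp_send_mss are hypothetical names, and the real logic lives in the sysctl validator and _mss_ceiling().

```python
TCP_MIN_MSS = 88  # Floor quoted in the description (Linux TCP_MIN_MSS).


def validate_snd_mss_max(value: int) -> int:
    """Accept 0 ("off") or any value >= TCP_MIN_MSS; reject the rest.
    Hypothetical stand-in for the sysctl validator."""
    if value != 0 and value < TCP_MIN_MSS:
        raise ValueError(
            f"snd_mss_max must be 0 or >= {TCP_MIN_MSS}, got {value}"
        )
    return value


def clamp_send_mss(mss_ceiling: int, snd_mss_max: int) -> int:
    """Apply the cap as the final step: it can only lower the send-side
    ceiling, never raise it, and 0 leaves the ceiling untouched."""
    if snd_mss_max == 0:
        return mss_ceiling
    return min(mss_ceiling, snd_mss_max)


# Assumed example: a 1460-byte interface-derived ceiling, capped to 1200.
interface_mss = 1460
send_mss = clamp_send_mss(interface_mss, validate_snd_mss_max(1200))
# send_mss is 1200; the advertised receive MSS would stay at interface_mss.
```

Only the emitted segment size passes through the clamp, so the receive MSS advertised to the peer is unaffected, matching the asymmetry the description asks for.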
Both ride the 'sysctls={...}' bag in stack.init(); neither warrants an
explicit kwarg yet.
Tests at tests/integration/protocols/tcp/test__tcp__sysctls.py pin
registration, defaults, validator rejection (rcv_wnd_max != 0;
snd_mss_max either 0 or >= 88) and the per-interface storage semantics;
test__tcp__session__throughput_knobs.py pins the session behaviour
(rcv_wnd_max seeds the window; snd_mss_max caps _mss_ceiling while
leaving rcv_mss at the interface ceiling).
Reference: Linux net.ipv4.tcp_rmem (receive-window max).
Reference: Linux include/net/tcp.h TCP_MIN_MSS=88.