Message-Id: <20220308030348.258934-1-kuba@kernel.org>
Date: Mon, 7 Mar 2022 19:03:48 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: edumazet@...gle.com
Cc: netdev@...r.kernel.org, willemb@...gle.com, ncardwell@...gle.com,
ycheng@...gle.com, Jakub Kicinski <kuba@...nel.org>
Subject: [RFC net-next] tcp: allow larger TSO to be built under overload

We observed Tx-heavy workloads causing softirq overload: as load,
and therefore latency, increases, the pacing rates fall, pushing
TCP to generate smaller and smaller TSO packets.

It seems reasonable to allow larger packets to be built when the
system is under stress. TCP already uses the

	this_cpu_ksoftirqd() == current

condition as an indication of overload for TSQ scheduling.

Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
Sending as an RFC because it seems reasonable, but I haven't
run any large-scale testing yet. Bumping tcp_min_tso_segs to
prevent overloads is okay, but it seems like we can do better,
since we only need coarser pacing once disaster strikes?

The downsides are that users may have already increased
the value to what's needed during overload, or applied
the same logic in out-of-tree CA algo implementations
(only BBR implements ca_ops->min_tso_segs() upstream).
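
To make the intended behaviour concrete, here is a minimal userspace
sketch of the sizing step after this change. It is not kernel code:
tso_autosize_model() and the under_stress flag are made-up names, with
under_stress standing in for this_cpu_ksoftirqd() == current, and bytes
standing in for whatever byte budget tcp_tso_autosize() computed earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the patched sizing step.
 * under_stress is an assumption standing in for the kernel's
 * this_cpu_ksoftirqd() == current check.
 */
static uint32_t tso_autosize_model(uint32_t bytes, uint32_t mss_now,
				   uint32_t min_tso_segs, bool under_stress)
{
	uint32_t segs;

	assert(mss_now > 0);

	segs = bytes / mss_now;
	if (segs < min_tso_segs) {
		/* Pacing budget is below the floor: clamp to the minimum */
		segs = min_tso_segs;
		/* Only this clamped case is doubled under stress */
		if (under_stress)
			segs *= 2;
	}
	return segs;
}
```

With mss_now = 1448 and min_tso_segs = 2, a one-MSS budget gives 2
segments normally and 4 under stress, while a ten-MSS budget gives 10
either way. Only flows already clamped to the minimum are affected, so
in this model a flow whose budget works out to 3 segments stays at 3
under stress while a clamped flow goes to 4.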
---
 net/ipv4/tcp_output.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6..815ef4ffc39d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1967,7 +1967,13 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 * This preserves ACK clocking and is consistent
 	 * with tcp_tso_should_defer() heuristic.
 	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
+	segs = bytes / mss_now;
+	if (segs < min_tso_segs) {
+		segs = min_tso_segs;
+		/* Allow larger packets under stress */
+		if (this_cpu_ksoftirqd() == current)
+			segs *= 2;
+	}
 
 	return segs;
 }
--
2.34.1