Date:   Mon,  7 Mar 2022 19:03:48 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     edumazet@...gle.com
Cc:     netdev@...r.kernel.org, willemb@...gle.com, ncardwell@...gle.com,
        ycheng@...gle.com, Jakub Kicinski <kuba@...nel.org>
Subject: [RFC net-next] tcp: allow larger TSO to be built under overload

We observed Tx-heavy workloads causing softirq overload: as load
(and therefore latency) increases, pacing rates fall, pushing TCP
to generate smaller and smaller TSO packets.

It seems reasonable to allow larger packets to be built when the
system is under stress. TCP already uses the

  this_cpu_ksoftirqd() == current

condition as an indication of overload for TSQ scheduling.
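
To make the failure mode concrete, below is a small userspace model
of the autosizing math. This is a sketch only: the 10-bit pacing
shift (~1 ms worth of data), the MSS, the GSO cap and the sample
rates are illustrative assumptions, not the kernel's exact values.

  /* Model of pacing-based TSO autosizing: as the pacing rate falls,
   * fewer bytes fit in one pacing interval, so each TSO packet
   * carries fewer segments (down to the floor), i.e. more packets
   * and more softirq work per byte sent.
   */
  #include <stdio.h>
  #include <stdint.h>

  #define PACING_SHIFT  10      /* send ~1/1024 s worth of data */
  #define MSS           1448    /* illustrative Ethernet MSS */
  #define MIN_TSO_SEGS  2       /* default net.ipv4.tcp_min_tso_segs */
  #define GSO_MAX_BYTES 65536   /* illustrative GSO size cap */

  static uint32_t model_tso_autosize(uint64_t pacing_rate)
  {
          uint64_t bytes = pacing_rate >> PACING_SHIFT;
          uint32_t segs;

          if (bytes > GSO_MAX_BYTES)
                  bytes = GSO_MAX_BYTES;
          segs = bytes / MSS;
          return segs < MIN_TSO_SEGS ? MIN_TSO_SEGS : segs;
  }

  int main(void)
  {
          /* Pacing rates in bytes/sec, from healthy to overloaded */
          uint64_t rates[] = { 1000000000ULL, 50000000ULL,
                               10000000ULL, 1000000ULL };

          for (unsigned int i = 0; i < 4; i++)
                  printf("pacing %10llu B/s -> %2u segs per TSO packet\n",
                         (unsigned long long)rates[i],
                         model_tso_autosize(rates[i]));
          return 0;
  }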

Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
Sending as an RFC because the idea seems reasonable, but I
haven't run any large-scale testing yet. Bumping tcp_min_tso_segs
to prevent overloads works, but it seems like we can do better,
since we only need coarser pacing once disaster strikes?

The downsides are that users may have already increased
the value to what's needed during overload, or applied
the same logic in out-of-tree CA algo implementations
(only BBR implements ca_ops->min_tso_segs() upstream).
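
For context, as I understand the upstream flow, the floor comes
from ca_ops->min_tso_segs() when the congestion-control module
provides that hook, and from the tcp_min_tso_segs sysctl otherwise.
A simplified userspace model of that fallback (struct and function
names here are invented for illustration):

  /* Model: a congestion-control callback, when present, overrides
   * the sysctl-provided TSO floor.
   */
  #include <stdio.h>
  #include <stdint.h>

  struct model_ca_ops {
          const char *name;
          uint32_t (*min_tso_segs)(void);  /* optional hook */
  };

  static uint32_t sysctl_tcp_min_tso_segs = 2;  /* default floor */

  static uint32_t bbr_like_min_tso(void)
  {
          return 3;  /* illustrative; BBR derives it from pacing rate */
  }

  static uint32_t pick_min_tso(const struct model_ca_ops *ca)
  {
          return ca->min_tso_segs ? ca->min_tso_segs()
                                  : sysctl_tcp_min_tso_segs;
  }

  int main(void)
  {
          struct model_ca_ops cubic = { "cubic", NULL };
          struct model_ca_ops bbr = { "bbr", bbr_like_min_tso };

          printf("%s floor: %u segs\n", cubic.name, pick_min_tso(&cubic));
          printf("%s floor: %u segs\n", bbr.name, pick_min_tso(&bbr));
          return 0;
  }

An out-of-tree module that already inflates its floor under load
would stack with the doubling in this patch, effectively
quadrupling the floor relative to baseline.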
---
 net/ipv4/tcp_output.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6..815ef4ffc39d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1967,7 +1967,13 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 * This preserves ACK clocking and is consistent
 	 * with tcp_tso_should_defer() heuristic.
 	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
+	segs = bytes / mss_now;
+	if (segs < min_tso_segs) {
+		segs = min_tso_segs;
+		/* Allow larger packets under stress */
+		if (this_cpu_ksoftirqd() == current)
+			segs *= 2;
+	}
 
 	return segs;
 }
-- 
2.34.1
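
A worked example of the patched clamp (userspace, illustrative
values; under_stress stands in for the this_cpu_ksoftirqd() ==
current check):

  /* At an operating point where pacing has collapsed to ~1 MB/s,
   * only ~976 bytes fit in a 1/1024 s pacing interval, so autosizing
   * hits the floor; under stress the patched code doubles it.
   */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdint.h>

  static uint32_t autosize(uint32_t bytes, uint32_t mss,
                           uint32_t min_tso_segs, bool under_stress)
  {
          uint32_t segs = bytes / mss;

          if (segs < min_tso_segs) {
                  segs = min_tso_segs;
                  /* Allow larger packets under stress */
                  if (under_stress)
                          segs *= 2;
          }
          return segs;
  }

  int main(void)
  {
          uint32_t bytes = 976, mss = 1448, min_tso = 2;

          printf("normal:   %u segs\n", autosize(bytes, mss, min_tso, false));
          printf("overload: %u segs\n", autosize(bytes, mss, min_tso, true));
          return 0;
  }

At the floor this halves the number of packets (and thus softirq
invocations) needed to move the same number of bytes.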
