netdev - [RFC net-next] tcp: allow larger TSO to be built under overload


Message-Id: <20220308030348.258934-1-kuba@kernel.org>
Date:   Mon,  7 Mar 2022 19:03:48 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     edumazet@...gle.com
Cc:     netdev@...r.kernel.org, willemb@...gle.com, ncardwell@...gle.com,
        ycheng@...gle.com, Jakub Kicinski <kuba@...nel.org>
Subject: [RFC net-next] tcp: allow larger TSO to be built under overload

We observed Tx-heavy workloads causing softirq overload: as load
grows, latency grows with it and pacing rates fall (the default
pacing rate scales with cwnd / srtt), pushing TCP to generate
smaller and smaller TSO packets.
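To make the shrinking concrete, here is a small user-space model of the pre-patch sizing. It assumes, from our reading of net-next at the time rather than from this patch, that tcp_tso_autosize() derives its byte budget from sk_pacing_rate >> sk_pacing_shift (a shift of 10 is roughly 1 ms worth of data), capped by the GSO size limit, and then floors the segment count at min_tso_segs. The function and parameter names are hypothetical, and the 64000-byte cap used below is an assumed value.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of tcp_tso_autosize() before this patch.
 * Assumed inputs: pacing rate in bytes/sec, a pacing shift (10 ~= 1 ms of
 * data), a GSO size cap in bytes, the current MSS, and the min_tso_segs
 * floor. None of these names are kernel APIs.
 */
uint32_t model_tso_autosize(uint64_t pacing_rate_Bps,
			    unsigned int pacing_shift,
			    uint32_t gso_max_bytes,
			    uint32_t mss,
			    uint32_t min_tso_segs)
{
	/* Bytes we may send in one slot of 2^-pacing_shift seconds. */
	uint64_t bytes = pacing_rate_Bps >> pacing_shift;
	uint32_t segs;

	/* Never exceed what a single GSO packet can carry. */
	if (bytes > gso_max_bytes)
		bytes = gso_max_bytes;

	/* Whole MSS-sized segments that fit, floored at min_tso_segs
	 * (the max_t() line this patch replaces). */
	segs = (uint32_t)(bytes / mss);
	return segs > min_tso_segs ? segs : min_tso_segs;
}
```

With a 1448-byte MSS, a 64000-byte cap and a floor of 2, a pacing rate of 10 Gbit/s (1.25e9 bytes/sec) gives 44 segments, 100 Mbit/s gives 8, and 10 Mbit/s falls through to the floor of 2: as the rate drops, every packet carries less.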

It seems reasonable to allow larger packets to be built when
the system is under stress. TCP already uses the

  this_cpu_ksoftirqd() == current

condition as an indication of overload for TSQ scheduling.

Signed-off-by: Jakub Kicinski <kuba@...nel.org>
---
Sending as an RFC because it seems reasonable, but I haven't
run any large-scale testing yet. Bumping tcp_min_tso_segs to
prevent overloads is okay, but it seems like we can do better,
since we only need coarser pacing once disaster strikes?

The downsides are that users may have already increased
the value to what's needed during overload, or applied
the same logic in out-of-tree congestion control (CA)
implementations (only BBR implements ca_ops->min_tso_segs()
upstream).
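Both downsides come from where the floor originates. As we understand upstream at the time (treat this as an assumption, not as part of the patch), tcp_tso_segs() uses ca_ops->min_tso_segs() when the congestion control module provides it and falls back to the tcp_min_tso_segs sysctl otherwise, and BBR's hook returns 1 below roughly 1.2 Mbit/s and 2 above. The sketch below models that selection with hypothetical names; the hook takes a pacing rate in bytes/sec instead of a struct sock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tcp_congestion_ops: only the optional
 * min_tso_segs() hook matters here. */
struct model_ca_ops {
	uint32_t (*min_tso_segs)(uint64_t pacing_rate_Bps);
};

/* Shaped like BBR's hook as we understand it: 1 segment below roughly
 * 1.2 Mbit/s (1200000 bits >> 3 = 150000 bytes/sec), otherwise 2. */
uint32_t model_bbr_min_tso_segs(uint64_t pacing_rate_Bps)
{
	return pacing_rate_Bps < (1200000 >> 3) ? 1 : 2;
}

/* Floor selection modeled on tcp_tso_segs(): a CA hook, if present,
 * overrides the tcp_min_tso_segs sysctl value. */
uint32_t model_min_tso(const struct model_ca_ops *ca,
		       uint64_t pacing_rate_Bps, uint32_t sysctl_min_tso_segs)
{
	if (ca != NULL && ca->min_tso_segs != NULL)
		return ca->min_tso_segs(pacing_rate_Bps);
	return sysctl_min_tso_segs;
}

/* Count the RFC patch substitutes when bytes / mss_now falls below the
 * floor: the floor is doubled under stress whatever its source, so a
 * value already tuned for overload gets doubled again. */
uint32_t model_stress_floor(const struct model_ca_ops *ca,
			    uint64_t pacing_rate_Bps,
			    uint32_t sysctl_min_tso_segs, bool in_ksoftirqd)
{
	uint32_t min_segs = model_min_tso(ca, pacing_rate_Bps,
					  sysctl_min_tso_segs);

	return in_ksoftirqd ? min_segs * 2 : min_segs;
}
```

For example, a user who already raised tcp_min_tso_segs to 4 for overload would get 8 under stress in this model, and an out-of-tree module returning its own overload value from min_tso_segs() would likewise see it doubled.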
---
 net/ipv4/tcp_output.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2319531267c6..815ef4ffc39d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1967,7 +1967,13 @@ static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 	 * This preserves ACK clocking and is consistent
 	 * with tcp_tso_should_defer() heuristic.
 	 */
-	segs = max_t(u32, bytes / mss_now, min_tso_segs);
+	segs = bytes / mss_now;
+	if (segs < min_tso_segs) {
+		segs = min_tso_segs;
+		/* Allow larger packets under stress */
+		if (this_cpu_ksoftirqd() == current)
+			segs *= 2;
+	}
 
 	return segs;
 }
-- 
2.34.1