linux-kernel - Re: Network performance regression in Linux kernel 6.6 for small socket size test cases


Message-ID: <CANn89iLH5+KryWa3GMs-Fz+sdy9Qs7kJqfBwf0229iwgW65Hxg@mail.gmail.com>
Date: Wed, 28 Feb 2024 09:48:09 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Abdul Anshad Azeez <abdul-anshad.azeez@...adcom.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, corbet@....net, 
	dsahern@...nel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Boon Ang <boon.ang@...adcom.com>, John Savanyo <john.savanyo@...adcom.com>, 
	Peter Jonasson <peter.jonasson@...adcom.com>, Rajender M <rajender.m@...adcom.com>
Subject: Re: Network performance regression in Linux kernel 6.6 for small
 socket size test cases

On Wed, Feb 28, 2024 at 7:43 AM Abdul Anshad Azeez
<abdul-anshad.azeez@...adcom.com> wrote:
>
> While running performance regression workloads against the Linux kernel,
> we observed a throughput decrease of up to 30% in a specific networking
> workload on the 6.6 kernel compared to 6.5 (details below). The regression is
> reproducible both in Linux VMs running on ESXi and on bare-metal Linux.
>
> Workload details:
>
> Benchmark - Netperf TCP_STREAM
> Socket buffer size - 8K
> Message size - 256B
> MTU - 1500B
> Socket option - TCP_NODELAY
> # of STREAMs - 32
> Direction - Uni-Directional Receive
> Duration - 60 Seconds
> NIC - Mellanox Technologies ConnectX-6 Dx EN 100G
> Server Config - Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz & 512G Memory
>
> Bisect between 6.5 and 6.6 kernel concluded that this regression originated
> from the below commit:
>
> commit - dfa2f0483360d4d6f2324405464c9f281156bd87 (tcp: get rid of
> sysctl_tcp_adv_win_scale)
> Author - Eric Dumazet <edumazet@...gle.com>
> Link - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dfa2f0483360d4d6f2324405464c9f281156bd87
>
> Performance data (Linux VM on ESXi):
> Test case - TCP_STREAM_RECV Throughput in Gbps
> (for different socket buffer sizes and with constant message size - 256B):
>
> Socket buffer size - [LK6.5 vs LK6.6]
> 8K - [8.4 vs 5.9 Gbps]
> 16K - [13.4 vs 10.6 Gbps]
> 32K - [19.1 vs 16.3 Gbps]
> 64K - [19.6 vs 19.7 Gbps]
> Autotune - [19.7 vs 19.6 Gbps]
>
> From the above performance data, we can infer that:
> * Regression is specific to lower fixed socket buffer sizes (8K, 16K & 32K).
> * Increasing the socket buffer size gradually decreases the throughput impact.
> * Performance is equal for higher fixed socket size (64K) and Autotune socket
> tests.
>
> We would like to know if there are any opportunities for optimization in
> the test cases with small socket sizes.
>
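
The "up to 30%" figure can be checked against the quoted table. A minimal sketch that computes the relative throughput change per socket buffer size; the dictionary and function names are illustrative, and the numbers are copied from the report above:

```python
# Throughput in Gbps from the quoted table: (kernel 6.5, kernel 6.6).
results = {
    "8K": (8.4, 5.9),
    "16K": (13.4, 10.6),
    "32K": (19.1, 16.3),
    "64K": (19.6, 19.7),
    "Autotune": (19.7, 19.6),
}


def percent_change(old, new):
    """Relative change from old to new in percent (negative means slower)."""
    return (new - old) / old * 100.0


changes = {size: percent_change(old, new) for size, (old, new) in results.items()}

for size, change in changes.items():
    print(f"{size:>8}: {change:+.1f}%")
```

The 8K case accounts for the roughly 30% drop; the 64K and Autotune rows differ by well under 1% in either direction.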

Sure, I would suggest not setting small SO_RCVBUF values in 2024,
or you get what you ask for (going back to the TCP performance of 2010).
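
For readers who want to see what a fixed small buffer looks like from user space, here is a minimal sketch. It assumes the benchmark's "8K" setting amounts to a plain setsockopt(SO_RCVBUF) call, which the report does not state explicitly; the helper name is hypothetical. On Linux, per socket(7), the kernel doubles the requested value for bookkeeping overhead, and an explicit SO_RCVBUF turns off receive-buffer autotuning for that socket.

```python
import socket


def effective_rcvbuf(requested=None):
    """Create a TCP socket, optionally request a receive buffer size,
    and return the size the kernel reports back for that socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        if requested is not None:
            # An explicit SO_RCVBUF pins the buffer size for this socket.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


default_size = effective_rcvbuf()        # kernel default, left to autotuning
small_size = effective_rcvbuf(8 * 1024)  # the 8K case from the report

print("default SO_RCVBUF:", default_size)
print("after requesting 8K:", small_size)
```

The reported values depend on the operating system and on sysctl limits such as net.core.rmem_max, so none are shown here.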

Back in 2018, we set tcp_rmem[1] to 131072 for a good reason.

commit a337531b942bd8a03e7052444d7e36972aac2d92
Author: Yuchung Cheng <ycheng@...gle.com>
Date:   Thu Sep 27 11:21:19 2018 -0700

    tcp: up initial rmem to 128KB and SYN rwin to around 64KB
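
To check what a given machine uses, the three tcp_rmem fields (minimum, default, maximum) can be read from procfs. A minimal sketch, assuming a Linux host that exposes /proc/sys/net/ipv4/tcp_rmem; the function names are illustrative, and the sample string in the last line is only an example of the file's format:

```python
from pathlib import Path

TCP_RMEM_PATH = Path("/proc/sys/net/ipv4/tcp_rmem")


def parse_tcp_rmem(text):
    """Split the 'min default max' triple into named integer fields."""
    minimum, default, maximum = (int(field) for field in text.split())
    return {"min": minimum, "default": default, "max": maximum}


def read_tcp_rmem(path=TCP_RMEM_PATH):
    """Return the parsed sysctl, or None if it cannot be read (e.g. not Linux)."""
    try:
        return parse_tcp_rmem(path.read_text())
    except (OSError, ValueError):
        return None


print("this host:", read_tcp_rmem())
# Example input only; tcp_rmem[1] is the middle ("default") field.
print("sample:", parse_tcp_rmem("4096\t131072\t6291456\n"))
```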


I cannot enforce a minimum for SO_RCVBUF (other than the small one added in
commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF")); otherwise many test programs that
expect to be able to set a low value would break.
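
The existing floor can be observed by requesting an absurdly small buffer and reading back what the kernel granted. A minimal sketch with a hypothetical helper name; the exact floor depends on the kernel version and architecture, so no value is asserted here, and other operating systems may behave differently:

```python
import socket


def granted_rcvbuf(requested):
    """Request a receive buffer of `requested` bytes and return what was granted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


tiny_request = 1  # far below any usable receive buffer
granted = granted_rcvbuf(tiny_request)

# On Linux the kernel raises the request to its built-in minimum instead of
# failing, which is why programs that set low values keep working.
print(f"requested {tiny_request} byte(s), kernel granted {granted}")
```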