Message-ID: <CANn89i+CEHApkAO7msUPoQdMgjJsgJ=gNuHcOdYZqbwEdwVrOg@mail.gmail.com>
Date: Wed, 6 Mar 2024 13:51:42 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Boon Ang <boon.ang@...adcom.com>
Cc: Abdul Anshad Azeez <abdul-anshad.azeez@...adcom.com>, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, corbet@....net, dsahern@...nel.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, John Savanyo <john.savanyo@...adcom.com>, 
	Peter Jonasson <peter.jonasson@...adcom.com>, Rajender M <rajender.m@...adcom.com>
Subject: Re: Network performance regression in Linux kernel 6.6 for small socket size test cases

On Wed, Mar 6, 2024 at 1:43 PM Boon Ang <boon.ang@...adcom.com> wrote:
>
> Hello Eric,
>
> The choice of socket buffer size is something that an application can decide, and there may be reasons to keep to smaller sizes. While high-bandwidth transfers obviously should use larger sizes, a change that regresses the performance of an existing configuration is a regression. Is there any way to modify your change so that it keeps the benefits while avoiding the degradation for small socket sizes?
>


The kernel limits the amount of memory used by the receive queue.

The problem is that for XXX bytes of payload (what the user application wants),
the metadata overhead is not fixed.

Kernel structures change over time, and packets arriving from the remote
peer (which we cannot control) are not always full.

1000 bytes of payload might fit in 2KB of kernel memory, or require 2MB,
depending on how the bytes are spread over multiple skbs.

This issue has been there forever; the kernel cannot set in stone any rule like:

XXXX bytes of payload  --->  YYYY bytes of kernel memory to hold XXXX
bytes of payload.
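
As a rough illustration, compare the same 1000 bytes arriving as one
full skb versus many tiny ones (the per-skb overhead constant below is
made up; real values vary by kernel version and driver):

#include <stdio.h>

int main(void)
{
	const int payload = 1000;   /* bytes the application asked for */
	const int per_skb = 768;    /* assumed sk_buff + metadata cost */

	/* best case: one full segment, overhead charged once */
	printf("1 skb   -> %d bytes charged to sk_rcvbuf\n",
	       payload + per_skb);

	/* worst case: forty 25-byte segments, overhead charged 40 times */
	printf("40 skbs -> %d bytes charged to sk_rcvbuf\n",
	       payload + 40 * per_skb);
	return 0;
}

With a tiny fixed SO_RCVBUF, the second case blows the memory budget
before the application has read a single byte.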

It is time that applications setting tiny SO_RCVBUF values get what they want:

Poor TCP performance.
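
Concretely, the pattern to avoid is the classic fixed-size setup below
(a minimal sketch, not taken from any real application); simply
omitting the setsockopt() call lets TCP receive autotuning size the
buffer instead:

#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int val = 8 * 1024;	/* tiny fixed buffer: freezes the window */

	/* SO_RCVBUF disables receive autotuning for this socket; without
	 * this call the kernel grows the buffer up to tcp_rmem[2] as the
	 * connection demands */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)) < 0)
		perror("setsockopt");
	return 0;
}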

Thanks.

> Thanks
>   Boon
>
> On Wed, Feb 28, 2024 at 12:48 AM Eric Dumazet <edumazet@...gle.com> wrote:
>>
>> On Wed, Feb 28, 2024 at 7:43 AM Abdul Anshad Azeez
>> <abdul-anshad.azeez@...adcom.com> wrote:
>> >
>> > During performance regression testing of the Linux kernel, we observed
>> > up to a 30% performance decrease in a specific networking workload on the
>> > 6.6 kernel compared to 6.5 (details below). The regression is reproducible
>> > both in Linux VMs running on ESXi and on bare-metal Linux.
>> >
>> > Workload details:
>> >
>> > Benchmark - Netperf TCP_STREAM
>> > Socket buffer size - 8K
>> > Message size - 256B
>> > MTU - 1500B
>> > Socket option - TCP_NODELAY
>> > # of STREAMs - 32
>> > Direction - Uni-Directional Receive
>> > Duration - 60 Seconds
>> > NIC - Mellanox Technologies ConnectX-6 Dx EN 100G
>> > Server Config - Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz & 512G Memory
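>> >
>> > (A single stream of this shape corresponds roughly to the following
>> > netperf invocation; the exact flags are an approximation, not our
>> > literal harness:
>> > netperf -H <receiver> -t TCP_STREAM -l 60 -- -s 8192 -S 8192 -m 256 -D)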
>> >
>> > Bisecting between the 6.5 and 6.6 kernels identified this regression as
>> > originating from the commit below:
>> >
>> > commit - dfa2f0483360d4d6f2324405464c9f281156bd87 (tcp: get rid of
>> > sysctl_tcp_adv_win_scale)
>> > Author - Eric Dumazet <edumazet@...gle.com>
>> > Link -
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dfa2f0483360d4d6f2324405464c9f281156bd87
>> >
>> > Performance data for (Linux VM on ESXi):
>> > Test case - TCP_STREAM_RECV Throughput in Gbps
>> > (for different socket buffer sizes and with constant message size - 256B):
>> >
>> > Socket buffer size - [LK6.5 vs LK6.6]
>> > 8K - [8.4 vs 5.9 Gbps]
>> > 16K - [13.4 vs 10.6 Gbps]
>> > 32K - [19.1 vs 16.3 Gbps]
>> > 64K - [19.6 vs 19.7 Gbps]
>> > Autotune - [19.7 vs 19.6 Gbps]
>> >
>> > From the above performance data, we can infer that:
>> > * The regression is specific to the smaller fixed socket buffer sizes
>> > (8K, 16K & 32K).
>> > * Increasing the socket buffer size progressively reduces the throughput
>> > impact.
>> > * Performance is on par between the larger fixed socket size (64K) and
>> > the autotuned socket tests.
>> >
>> > We would like to know if there are any opportunities for optimization in
>> > the test cases with small socket sizes.
>> >
>>
>> Sure. I would suggest not setting small SO_RCVBUF values in 2024, or you
>> get what you ask for (going back to the TCP performance of circa 2010).
>>
>> Back in 2018, we set tcp_rmem[1] to 131072 for a good reason.
>>
>> commit a337531b942bd8a03e7052444d7e36972aac2d92
>> Author: Yuchung Cheng <ycheng@...gle.com>
>> Date:   Thu Sep 27 11:21:19 2018 -0700
>>
>>     tcp: up initial rmem to 128KB and SYN rwin to around 64KB
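>>
>> (On a current kernel that is the middle value of tcp_rmem, e.g.:
>>
>>     $ sysctl net.ipv4.tcp_rmem
>>     net.ipv4.tcp_rmem = 4096 131072 6291456
>>
>> i.e. the min, default and max receive buffer sizes, in bytes.)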
>>
>>
>> I cannot enforce a larger minimum for SO_RCVBUF (other than the small one
>> added in commit eea86af6b1e18d6fa8dc959e3ddc0100f27aff9f ("net: sock: adapt
>> SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF"));
>> otherwise many test programs that expect to set a low value would break.
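>>
>> For reference, the kernel doubles whatever value the application
>> passes, to leave room for bookkeeping overhead (a minimal sketch
>> demonstrating this, not code from the thread):
>>
>>     #include <stdio.h>
>>     #include <sys/socket.h>
>>
>>     int main(void)
>>     {
>>         int fd = socket(AF_INET, SOCK_STREAM, 0);
>>         int val = 8192;
>>         socklen_t len = sizeof(val);
>>
>>         setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val));
>>         getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len);
>>         /* prints 16384: max(2 * 8192, SOCK_MIN_RCVBUF) */
>>         printf("%d\n", val);
>>         return 0;
>>     }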
>
>
