Message-ID: <7cc939cf-d63f-41fc-8048-893a57ac4ab1@linux.alibaba.com>
Date: Wed, 13 Dec 2023 21:38:48 +0800
From: Wen Gu <guwen@...ux.alibaba.com>
To: Gerd Bayer <gbayer@...ux.ibm.com>,
"Nikolaou Alexandros (SO/PAF1-Mb)" <Alexandros.Nikolaou@...bosch.com>,
"D . Wythe" <alibuda@...ux.alibaba.com>, Tony Lu <tonylu@...ux.alibaba.com>,
Nils Hoppmann <niho@...ux.ibm.com>
Cc: "linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
netdev <netdev@...r.kernel.org>, Wenjia Zhang <wenjia@...ux.ibm.com>,
Jan Karcher <jaka@...ux.ibm.com>, Dust Li <dust.li@...ux.alibaba.com>
Subject: Re: SMC-R throughput drops for specific message sizes
On 2023/12/13 20:17, Gerd Bayer wrote:
> Hi Nikolaou,
>
> thank you for providing more details about your setup.
>
> On Wed, 2023-12-06 at 15:28 +0000, Nikolaou Alexandros (SO/PAF1-Mb)
> wrote:
>> Dear Wenjia,
>
> while Wenjia is out, I'm writing primarily to get some more folks'
> attention to this topic. Furthermore, I'm moving the discussion to the
> netdev mailing list, where SMC discussions usually take place.
>
>> Thanks for getting back to me. Some further details on the
>> experiments are:
>>
>> - The tests were conducted over a direct, one-to-one connection
>> between two Mellanox-powered (mlx5, ConnectX-5) PCs.
>> - Attached you may find the client log of the qperf output. You may
>> notice that for the majority of message sizes, the bandwidth is
>> around 3.2GB/s, which matches the maximum throughput of the
>> Mellanox NICs.
>> The throughput drops substantially below 3.2GB/s in a regular,
>> periodic pattern, though, with the first drop occurring at message
>> sizes of 473616 – 522192 (with a step of 12144kB). The
>> corresponding commands for these drops are
>> server: smc_run qperf
>> client: smc_run qperf -v -uu -H worker1 -m 473616 tcp_bw
>> - Our smc version (3E92E1460DA96BE2B2DDC2F, smc-tools-1.2.2) does not
>> provide the smcr info, smc_rnics -a and smcr -d stats
>> commands. As an alternative, please find attached the
>> output of ibv_devinfo -v.
>> - Buffer size:
>> sudo sysctl -w net.ipv4.tcp_rmem="4096 1048576 6291456"
>> sudo sysctl -w net.ipv4.tcp_wmem="4096 1048576 6291456"
>> - MTU size: 9000
>>
>> Should you require further information, please let me know.
>
> Wenjia and I belong to a group of Linux on Z developers that maintains
> the SMC protocol on s390 mainframe systems. Nils Hoppmann is our expert
> for performance and might be able to shed some light on his experiences
> with throughput drops for particular SMC message sizes. Our experience
> is heavily biased towards IBM Z systems, though - with their distinct
> cache and PCI root-complex hardware designs.
>
> Over the last few years there's a group around D. Wythe, Wen Gu and
> Tony Lu who adopted and extended the SMC protocol for use-cases on x86
> architectures. I address them here explicitly, soliciting feedback on
> their experiences.
Certainly. Our team will take a closer look into this as well. We will
review the thread thoroughly and analyze the issue in our environment,
then report our findings back to this thread.
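
For reference, a simple sweep over the reported message-size window is
one way to look for the periodic pattern. Below is a minimal sketch
(not a definitive recipe), assuming qperf and smc_run are installed on
both hosts, the server side is already running "smc_run qperf", and
worker1 (the host name taken from your log) resolves from the client:

  #!/bin/bash
  # Sweep qperf message sizes across the drop window reported above.
  # START/END/STEP follow the figures from the report (473616 - 522192,
  # step 12144); widen the window or shrink the step as needed.
  START=473616
  END=522192
  STEP=12144
  for ((msg = START; msg <= END; msg += STEP)); do
      echo "== message size: ${msg} bytes =="
      smc_run qperf -v -uu -H worker1 -m "${msg}" tcp_bw
  done

Repeating the same sweep without smc_run (i.e. plain TCP) should also
help separate SMC-specific effects from NIC or driver behavior.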
>
> All in all, there are several moving parts involved here that could
> play a role:
> - firmware level of your Mellanox/NVidia NICs,
> - platform specific hardware designs re. cache and root-complexes,
> interrupt distribution, ...
> - exact code level of the device drivers and the SMC protocol
>
> This is just a heads-up that there may be requests to try things with
> newer code levels ;)
>
> Thank you,
> Gerd
>
> --
> Gerd Bayer
> Linux on IBM Z Development - IBM Germany R&D