Message-ID: <d9327631-0673-4e70-afe0-5923bda6fd45@linux.alibaba.com>
Date: Thu, 17 Oct 2024 15:46:49 +0800
From: Philo Lu <lulie@...ux.alibaba.com>
To: Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org
Cc: willemdebruijn.kernel@...il.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, dsahern@...nel.org,
antony.antony@...unet.com, steffen.klassert@...unet.com,
linux-kernel@...r.kernel.org, dust.li@...ux.alibaba.com,
jakub@...udflare.com, fred.cc@...baba-inc.com,
yubing.qiuyubing@...baba-inc.com
Subject: Re: [PATCH v4 net-next 2/3] net/udp: Add 4-tuple hash list basis
On 2024/10/16 15:45, Paolo Abeni wrote:
> On 10/16/24 08:30, Philo Lu wrote:
>> On 2024/10/14 18:07, Paolo Abeni wrote:
>>> It would be great if you could please share some benchmark showing the
>>> raw max receive PPS performances for unconnected sockets, with and
>>> without this series applied, to ensure this does not cause any real
>>> regression for such workloads.
>>>
>>
>> Tested using sockperf tp with default msgsize (14B), 3 times for w/ and
>> w/o the patch set, and results show no obvious difference:
>>
>> [msg/sec] test1 test2 test3 mean
>> w/o patch 514,664 519,040 527,115 520.3k
>> w/ patch 516,863 526,337 527,195 523.5k (+0.6%)
>>
>> Thank you for review, Paolo.
>
> Are the value in packet per seconds, or bytes per seconds? Are you doing
> a loopback test or over the wire? The most important question is: is the
> receiver side keeping (at least) 1 CPU fully busy? Otherwise the test is
> not very relevant.
>
> It looks like you have some setup issue, or you are using a relatively
> low end H/W: the expected packet rate for reasonable server H/W is well
> above 1M (possibly much more than that, but I can't put my hands on
> recent H/W, so I can't provide a more accurate figure).
>
> A single socket, user-space, UDP sender is usually unable to reach such
> tput without USO, and even with USO you likely need to do an over-the-
> wire test to really be able to keep the receiver fully busy. AFAICS
> sockperf does not support USO for the sender.
>
> You could use the udpgso_bench_tx/udpgso_bench_rx pair from the net
> selftests directory instead.
>
> Or you could use pktgen as traffic generator.
>
I tested it again with udpgso_bench_tx/udpgso_bench_rx. On the server, 2
CPUs are involved: one for udpgso_bench_rx and the other for the NIC rx
queue, so that the softirq (si) utilization of the NIC rx CPU is 100%.
udpgso_bench_tx runs with payload size 20, and the tx pps is larger than
the rx pps, ensuring rx is the bottleneck.
The outputs of udpgso_bench_rx:
[without patchset]
udp rx: 20 MB/s 1092546 calls/s
udp rx: 20 MB/s 1095051 calls/s
udp rx: 20 MB/s 1094136 calls/s
udp rx: 20 MB/s 1098860 calls/s
udp rx: 20 MB/s 1097963 calls/s
udp rx: 20 MB/s 1097460 calls/s
udp rx: 20 MB/s 1098370 calls/s
udp rx: 20 MB/s 1098089 calls/s
udp rx: 20 MB/s 1095330 calls/s
udp rx: 20 MB/s 1095486 calls/s
[with patchset]
udp rx: 21 MB/s 1105533 calls/s
udp rx: 21 MB/s 1105475 calls/s
udp rx: 21 MB/s 1104244 calls/s
udp rx: 21 MB/s 1105600 calls/s
udp rx: 21 MB/s 1108019 calls/s
udp rx: 21 MB/s 1101971 calls/s
udp rx: 21 MB/s 1104147 calls/s
udp rx: 21 MB/s 1104874 calls/s
udp rx: 21 MB/s 1101987 calls/s
udp rx: 21 MB/s 1105500 calls/s
The averages w/ and w/o the patchset are 1104735 and 1096329 calls/s
respectively; the gap is 0.8%, which I think is negligible.
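For reference, the averages and the 0.8% gap quoted above can be reproduced from the per-run calls/s figures with a quick check like this (just arithmetic on the numbers already listed, not part of the test setup):

```python
# Sanity-check the averages and gap from the udpgso_bench_rx runs above.
without_patch = [1092546, 1095051, 1094136, 1098860, 1097963,
                 1097460, 1098370, 1098089, 1095330, 1095486]
with_patch = [1105533, 1105475, 1104244, 1105600, 1108019,
              1101971, 1104147, 1104874, 1101987, 1105500]

avg_without = sum(without_patch) / len(without_patch)  # ~1096329 calls/s
avg_with = sum(with_patch) / len(with_patch)           # ~1104735 calls/s
gap = (avg_with - avg_without) / avg_without           # ~+0.8% (with-patch is higher)

print(f"avg w/o patch: {avg_without:.0f} calls/s")
print(f"avg w/  patch: {avg_with:.0f} calls/s")
print(f"gap: {gap:+.1%}")
```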
Besides, perf shows ~0.6 percentage points higher CPU consumption in
__udp4_lib_lookup() with this patchset (increasing from 5.7% to 6.3%).
Thanks.
--
Philo