Message-ID: <c596dff4-1e8b-4184-8eb6-590b4da2d92a@intel.com>
Date: Mon, 19 Aug 2024 16:50:52 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>,
	Toke Høiland-Jørgensen <toke@...hat.com>,
	Lorenzo Bianconi <lorenzo.bianconi@...hat.com>, Daniel Xu <dxu@...uu.xyz>
CC: Alexander Lobakin <alexandr.lobakin@...el.com>,
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>, Larysa Zaremba <larysa.zaremba@...el.com>,
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
	Björn Töpel <bjorn@...nel.org>, Magnus Karlsson <magnus.karlsson@...el.com>,
	Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	Jonathan Lemon <jonathan.lemon@...il.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
	David Miller <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.fastabend@...il.com>, Yajun Deng <yajun.deng@...ux.dev>,
	Willem de Bruijn <willemb@...gle.com>, <bpf@...r.kernel.org>,
	<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<xdp-hints@...-project.net>
Subject: Re: [xdp-hints] Re: [PATCH RFC bpf-next 32/52] bpf, cpumap: switch to
 GRO from netif_receive_skb_list()

From: Jesper Dangaard Brouer <hawk@...nel.org>
Date: Tue, 13 Aug 2024 17:57:44 +0200

> 
> 
> On 13/08/2024 16.54, Toke Høiland-Jørgensen wrote:
>> Alexander Lobakin <aleksander.lobakin@...el.com> writes:
>>
>>> From: Alexander Lobakin <aleksander.lobakin@...el.com>
>>> Date: Thu, 8 Aug 2024 13:57:00 +0200
>>>
>>>> From: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
>>>> Date: Thu, 8 Aug 2024 06:54:06 +0200
>>>>
>>>>>> Hi Alexander,

[...]

>>> I ran tests with a traffic generator on both the threaded NAPI for
>>> cpumap and my old implementation, and got the following (in Kpps):
>>>
> 
> What kind of traffic is the traffic generator sending?
> 
> E.g. is this a type of traffic that gets GRO aggregated?

Yes. It's UDP, with UDP GRO enabled on the receiver.
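In case it helps to reproduce: below is a minimal sketch of what I mean
by "UDP GRO enabled on the receiver", i.e. the receiving socket opts in
via the UDP_GRO sockopt. The helper name and the missing error reporting
are just for illustration, not my actual test rig.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef UDP_GRO
#define UDP_GRO		104	/* from include/uapi/linux/udp.h */
#endif

/* Open a UDP sink that asks the kernel for GRO-aggregated datagrams. */
static int open_udp_gro_sink(unsigned short port)
{
	struct sockaddr_in addr = {
		.sin_family	= AF_INET,
		.sin_addr	= { .s_addr = htonl(INADDR_ANY) },
		.sin_port	= htons(port),
	};
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int one = 1;

	if (fd < 0)
		return -1;

	/* Opt in to UDP GRO: recvmsg() may now return aggregated payloads,
	 * with the segment size reported back via a UDP_GRO cmsg.
	 */
	if (setsockopt(fd, IPPROTO_UDP, UDP_GRO, &one, sizeof(one)) ||
	    bind(fd, (struct sockaddr *)&addr, sizeof(addr))) {
		close(fd);
		return -1;
	}

	return fd;
}

IIRC, without that opt-in, locally delivered UDP flows don't get
aggregated at all, so the GRO gain wouldn't show up in the numbers.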

> 
>>>              direct Rx    direct GRO    cpumap    cpumap GRO
>>> baseline    2900         5800          2700      2700 (N/A)
>>> threaded                               2300      4000
>>> old GRO                                2300      4000
>>>
> 
> Nice results. Just to confirm, the units are in Kpps?

Yes. I.e. cpumap was giving 2.7 Mpps without GRO, then 4.0 Mpps with it.

> 
> 
>>> IOW,
>>>
>>> 1. There are no differences in perf between Lorenzo's threaded NAPI
>>>     GRO implementation and my old implementation, but Lorenzo's is also
>>>     a very nice cleanup as it switches cpumap to threaded NAPI completely
>>>     and the final diffstat even removes more lines than it adds, while
>>>     mine adds a bunch of lines and refactors a couple hundred, so I'd go
>>>     with his variant.
>>>
>>> 2. After switching to NAPI, the performance without GRO decreases (2.3
>>>     Mpps vs 2.7 Mpps), but after enabling GRO the perf increases hugely
>>>     (4 Mpps vs 2.7 Mpps) even though the CPU needs to compute checksums
>>>     manually.
>>
>> One question for this: IIUC, the benefit of GRO varies with the traffic
>> mix, depending on how much the GRO logic can actually aggregate. So did
>> you test the pathological case as well (spraying packets over so many
>> flows that there is basically no aggregation taking place)? Just to make
>> sure we don't accidentally screw up performance in that case while
>> optimising for the aggregating case :)
>>
> 
> For the GRO use-case, I think a basic TCP stream throughput test (like
> netperf) should show a benefit once cpumap enables GRO. Can you confirm
> this?

Yes, TCP benefits as well.

> Or do the missing hardware RX-hash and RX-checksum cause TCP GRO not
> to fully work yet?

GRO works well for both TCP and UDP. The main bottleneck is that GRO
now has to calculate checksums on the CPU, since there's no checksum
status from the NIC.
Also, the missing Rx hash means GRO places packets from every flow into
the same bucket, but that's not a big deal (flows are still compared
layer by layer anyway).
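To illustrate what that means in code (purely a sketch with a made-up
function name, not anything from the series): an skb built from an
xdp_frame carries no HW metadata at all, so the GRO path roughly sees
this:

#include <linux/skbuff.h>
#include <net/gro.h>

static void cpumap_gro_rx_sketch(struct napi_struct *napi,
				 struct sk_buff *skb)
{
	/* No NIC filled in the csum status, so GRO has to verify the
	 * checksum on the CPU while validating the TCP/UDP header.
	 */
	skb->ip_summed = CHECKSUM_NONE;

	/* skb->hash is unset, so dev_gro_receive() drops every flow into
	 * the same GRO hash bucket; flows are still told apart by the
	 * per-layer header comparisons.
	 */
	napi_gro_receive(napi, skb);
}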

> 
> Thanks A LOT for doing this benchmarking!

I optimized the code a bit and picked up my old patches for bulk NAPI
skb cache allocation, and today I got 4.7 Mpps 🎉
IOW, the result of the series (7 patches in total, but 2 are not
networking-related) is 2.7 -> 4.7 Mpps, i.e. roughly +75%!
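For context, the rough idea of the bulk allocation looks like the
fragment below. Treat it as an illustration only: the
napi_skb_cache_get_bulk() signature and the loop around it (incl. @napi,
@frames and @nframes) are my sketch here, not the actual diff.

	void *skbs[CPUMAP_BATCH];
	u32 i, n;

	/* One call refills the whole batch from the per-CPU NAPI skb cache
	 * instead of allocating each skb head separately.
	 */
	n = napi_skb_cache_get_bulk(skbs, nframes);

	for (i = 0; i < n; i++) {
		struct xdp_frame *xdpf = frames[i];
		struct sk_buff *skb;

		skb = __xdp_build_skb_from_frame(xdpf, skbs[i], xdpf->dev_rx);
		if (skb)
			napi_gro_receive(napi, skb);
	}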

Daniel,

if you want, you can pick up my tree[0], either in full or just up to

"bpf: cpumap: switch to napi_skb_cache_get_bulk()"

(13 patches total: 6 for netdev_feature_t and 7 for the cpumap)

and test it with your use cases. It would be nice to see some real-world
results, not just my synthetic tests :D

> --Jesper

[0]
https://github.com/alobakin/linux/compare/idpf-libie-new~52...idpf-libie-new/

Thanks,
Olek
