Message-ID: <25860d8b-a980-4f04-a376-b9cec03605fb@kernel.org>
Date: Tue, 13 Aug 2024 11:51:45 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>,
Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: Daniel Xu <dxu@...uu.xyz>, Lorenzo Bianconi
<lorenzo.bianconi@...hat.com>, Alexander Lobakin
<alexandr.lobakin@...el.com>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Larysa Zaremba <larysa.zaremba@...el.com>,
Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>, "toke@...hat.com"
<toke@...hat.com>, Lorenzo Bianconi <lorenzo@...nel.org>,
David Miller <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
John Fastabend <john.fastabend@...il.com>, Yajun Deng
<yajun.deng@...ux.dev>, Willem de Bruijn <willemb@...gle.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, xdp-hints@...-project.net
Subject: Re: [xdp-hints] Re: [PATCH RFC bpf-next 32/52] bpf, cpumap: switch to
GRO from netif_receive_skb_list()
On 13/08/2024 03.33, Jakub Kicinski wrote:
> On Fri, 9 Aug 2024 14:20:25 +0200 Alexander Lobakin wrote:
>> But I think one solution could be:
>>
>> 1. We create some generic structure for cpumap, like
>>
>> struct cpumap_meta {
>>         u32 magic;
>>         u32 hash;
>> }
>>
>> 2. We add such check in the cpumap code
>>
>> if (xdpf->metalen == sizeof(struct cpumap_meta) &&
>>     <here we check magic>)
>>         skb->hash = meta->hash;
>>
>> 3. In XDP prog, you call Rx hints kfuncs when they're available, obtain
>> RSS hash and then put it in the struct cpumap_meta as XDP frame metadata.
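
(To make point 3 concrete: below is a rough sketch of the XDP prog side.
The struct layout, the magic value and the map setup are illustrative
guesses, not existing code; bpf_xdp_metadata_rx_hash() and
bpf_xdp_adjust_meta() are the existing interfaces, and the hash hint
needs driver support. On the cpumap side, the check from point 2 would
then turn magic+hash into skb_set_hash() when building the skb.)

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Shared layout between the XDP prog and cpumap; values are made up
 * for this sketch, following the cpumap_meta idea from point 1 above.
 */
#define CPUMAP_META_MAGIC 0xcafe1234

struct cpumap_meta {
        __u32 magic;    /* guards against unrelated metadata */
        __u32 hash;     /* RSS hash from the Rx hints kfunc */
};

/* Existing Rx hints kfunc */
extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
                                    enum xdp_rss_hash_type *rss_type) __ksym;

struct {
        __uint(type, BPF_MAP_TYPE_CPUMAP);
        __uint(max_entries, 64);
        __type(key, __u32);
        __type(value, struct bpf_cpumap_val);
} cpu_map SEC(".maps");

SEC("xdp")
int xdp_store_rx_hash(struct xdp_md *ctx)
{
        enum xdp_rss_hash_type rss_type;
        struct cpumap_meta *meta;
        __u32 hash = 0;

        if (bpf_xdp_metadata_rx_hash(ctx, &hash, &rss_type))
                return XDP_PASS;

        /* Reserve metadata space in front of the packet data */
        if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
                return XDP_PASS;

        meta = (void *)(long)ctx->data_meta;
        if ((void *)(meta + 1) > (void *)(long)ctx->data)
                return XDP_PASS;

        meta->magic = CPUMAP_META_MAGIC;
        meta->hash = hash;

        /* CPU index 0 only for the sketch; a real prog would pick the
         * target CPU, e.g. based on the hash.
         */
        return bpf_redirect_map(&cpu_map, 0, 0);
}

char _license[] SEC("license") = "GPL";
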
>
> I wonder what the overhead of skb metadata allocation is in practice.
> With Eric's "return skb to the CPU of origin" we can feed the lockless
> skb cache on the right CPU, and also feed the lockless page pool
> cache. I wonder if batched RFS wouldn't be faster than the XDP thing
> that requires all the groundwork.

I explicitly developed CPUMAP because I was benchmarking Receive Flow
Steering (RFS) and Receive Packet Steering (RPS), and observed that they
were the bottleneck. The overhead on the RX-CPU was too large, largely
because RFS and RPS maintain data structures to avoid Out-of-Order
packets. The flow-dissector step was also a limiting factor.

By bottleneck I mean it didn't scale: the packets-per-second processing
speed on the RX-CPU was too low compared to the remote-CPU pps.

Digging in my old notes, I can see that RPS was limited to around 4.8
Mpps (and I have a weird result where disabling part of it shows 7.5
Mpps). In [1] the remote-CPU could process (starting at) 2.7 Mpps when
dropping UDP packets due to UdpNoPorts being configured (with a baseline
of 3.3 Mpps if not remote), thus it only scales up to 1.78 remote-CPUs.
[1] shows how optimizations bring the remote-CPU to handle 3.2 Mpps
(close to the non-remote 3.3 Mpps baseline). In [2] those optimizations
bring the remote-CPU to 4 Mpps (for the UdpNoPorts case). XDP
RX-redirect in [1]+[2] was around 19 Mpps (which might be lower today
due to perf paper cuts).

[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap/cpumap02-optimizations.org
[2]
https://github.com/xdp-project/xdp-project/blob/master/areas/cpumap/cpumap03-optimizations.org
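
Spelling out the scaling arithmetic behind those numbers:

  RPS limit on the RX-CPU:               ~4.8 Mpps
  remote-CPU UdpNoPorts drop [1]:        ~2.7 Mpps
    => RPS only scales to 4.8 / 2.7 ~ 1.78 remote-CPUs
  XDP RX-redirect on the RX-CPU [1]+[2]: ~19 Mpps
  remote-CPU after optimizations [2]:    ~4 Mpps
    => CPUMAP scales to roughly 19 / 4 ~ 4.75 remote-CPUs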

Eric's "return skb to the CPU of origin" work should help improve the
case for the remote-CPU, as I was seeing some bottlenecks in how we
returned the memory.

--Jesper