Message-ID: <8734ndq0cd.fsf@toke.dk>
Date: Fri, 09 Aug 2024 15:42:26 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Alexander Lobakin <aleksander.lobakin@...el.com>
Cc: Daniel Xu <dxu@...uu.xyz>, Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
	Alexander Lobakin <alexandr.lobakin@...el.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Andrii Nakryiko <andrii@...nel.org>,
	Larysa Zaremba <larysa.zaremba@...el.com>,
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>,
	Jesper Dangaard Brouer <hawk@...nel.org>,
	Björn Töpel <bjorn@...nel.org>,
	Magnus Karlsson <magnus.karlsson@...el.com>,
	Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	Jonathan Lemon <jonathan.lemon@...il.com>,
	Lorenzo Bianconi <lorenzo@...nel.org>,
	David Miller <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	John Fastabend <john.fastabend@...il.com>,
	Yajun Deng <yajun.deng@...ux.dev>, Willem de Bruijn <willemb@...gle.com>,
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, xdp-hints@...-project.net
Subject: Re: [xdp-hints] Re: [PATCH RFC bpf-next 32/52] bpf, cpumap: switch
to GRO from netif_receive_skb_list()

Alexander Lobakin <aleksander.lobakin@...el.com> writes:

> From: Toke Høiland-Jørgensen <toke@...hat.com>
> Date: Fri, 09 Aug 2024 14:45:33 +0200
>
>> Alexander Lobakin <aleksander.lobakin@...el.com> writes:
>>
>>> From: Daniel Xu <dxu@...uu.xyz>
>>> Date: Thu, 08 Aug 2024 16:52:51 -0400
>>>
>>>> Hi,
>>>>
>>>> On Thu, Aug 8, 2024, at 7:57 AM, Alexander Lobakin wrote:
>>>>> From: Lorenzo Bianconi <lorenzo.bianconi@...hat.com>
>>>>> Date: Thu, 8 Aug 2024 06:54:06 +0200
>>>>>
>>>>>>> Hi Alexander,
>>>>>>>
>>>>>>> On Tue, Jun 28, 2022, at 12:47 PM, Alexander Lobakin wrote:
>>>>>>>> cpumap has its own BH context based on a kthread. It has a sane
>>>>>>>> batch size of 8 frames per cycle.
>>>>>>>> GRO can be used on its own; adjust the cpumap calls to the upper
>>>>>>>> stack to use the GRO API instead of netif_receive_skb_list(),
>>>>>>>> which processes skbs in batches but doesn't involve the GRO layer
>>>>>>>> at all.
>>>>>>>> It is most beneficial when the NIC the frames come from supports
>>>>>>>> generic XDP metadata, but in plenty of tests GRO performs better
>>>>>>>> than list-based receiving even though it has to calculate full
>>>>>>>> frame checksums on the CPU.
>>>>>>>> As GRO passes the skbs to the upper stack in batches of
>>>>>>>> @gro_normal_batch, i.e. 8 by default, and @skb->dev points to the
>>>>>>>> device the frame comes from, it is enough to disable the GRO
>>>>>>>> netdev feature on that device to completely restore the original
>>>>>>>> behaviour: untouched frames will be bulked and passed to the upper
>>>>>>>> stack in batches of 8, as with netif_receive_skb_list().
>>>>>>>>
>>>>>>>> Signed-off-by: Alexander Lobakin <alexandr.lobakin@...el.com>
>>>>>>>> ---
>>>>>>>> kernel/bpf/cpumap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
>>>>>>>> 1 file changed, 38 insertions(+), 5 deletions(-)
>>>>>>>>
>>>>>>>
>>>>>>> AFAICT the cpumap + GRO is a good standalone improvement. I think
>>>>>>> cpumap is still missing this.
>>>>>
>>>>> The only concern about having GRO in cpumap without metadata from the
>>>>> NIC descriptor was that, when the checksum status is missing, GRO has
>>>>> to calculate the checksum on the CPU, which is not really fast.
>>>>> But as I recall, GRO was sometimes faster despite that.
>>>>
>>>> Good to know, thanks. IIUC some kind of XDP hints support has landed
>>>> already?
>>>>
>>>> My use case could also use the HW RSS hash to avoid a rehash in the
>>>> XDP prog.
>>>
>>> Unfortunately, for now it's impossible to get HW metadata such as the
>>> RSS hash and checksum status in cpumap. They're implemented via kfuncs
>>> specific to a particular netdevice, and this info is available only
>>> while the XDP prog is running.
>>>
>>> But I think one solution could be:
>>>
>>> 1. We create some generic structure for cpumap, like:
>>>
>>> struct cpumap_meta {
>>> 	u32 magic;
>>> 	u32 hash;
>>> };
>>>
>>> 2. We add a check like this in the cpumap code:
>>>
>>> 	if (xdpf->metalen == sizeof(struct cpumap_meta) &&
>>> 	    <here we check magic>)
>>> 		skb->hash = meta->hash;
>>>
>>> 3. In the XDP prog, you call the Rx hints kfuncs when they're
>>> available, obtain the RSS hash and then put it into the struct
>>> cpumap_meta as XDP frame metadata (rough sketch below).
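>>>
>>> Hand-wavy and untested, but putting 2. together could look roughly
>>> like this in the cpumap skb-build path (CPUMAP_META_MAGIC is made up,
>>> in the current tree the frame field is ->metasize rather than
>>> ->metalen, and the hash type is just assumed to be L4 here):
>>>
>>> 	struct cpumap_meta *meta = xdpf->data - xdpf->metasize;
>>>
>>> 	if (xdpf->metasize == sizeof(*meta) &&
>>> 	    meta->magic == CPUMAP_META_MAGIC)
>>> 		skb_set_hash(skb, meta->hash, PKT_HASH_TYPE_L4);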
>>
>> Yes, except don't make this cpumap-specific, make it generic for kernel
>> consumption of the metadata. That way it doesn't even have to be stored
>> in the xdp metadata area, it can be anywhere we want (and hence not
>> subject to ABI issues), and we can use it for skb creation after
>> redirect in other places than cpumap as well (say, on veth devices).
>>
>> So it'll be:
>>
>> struct kernel_meta {
>> 	u32 hash;
>> 	u32 timestamp;
>> 	/* ...etc */
>> };
>>
>> and a kfunc:
>>
>> void store_xdp_kernel_meta(struct kernel_meta *meta);
>>
>> which the XDP program can call to populate the metadata area.
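>>
>> From the program side that could look something like the sketch below
>> (store_xdp_kernel_meta() is of course hypothetical at this point;
>> bpf_xdp_metadata_rx_hash() is the existing per-device hints kfunc):
>>
>> #include <vmlinux.h>
>> #include <bpf/bpf_helpers.h>
>>
>> struct kernel_meta {
>> 	__u32 hash;
>> 	/* timestamp, ...etc */
>> };
>>
>> extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
>> 				    enum xdp_rss_hash_type *rss_type) __ksym;
>> extern void store_xdp_kernel_meta(struct kernel_meta *meta) __ksym;
>>
>> struct {
>> 	__uint(type, BPF_MAP_TYPE_CPUMAP);
>> 	__uint(max_entries, 64);
>> 	__type(key, __u32);
>> 	__type(value, struct bpf_cpumap_val);
>> } cpu_map SEC(".maps");
>>
>> SEC("xdp")
>> int redirect_to_cpu(struct xdp_md *ctx)
>> {
>> 	struct kernel_meta meta = {};
>> 	enum xdp_rss_hash_type rss_type;
>>
>> 	/* HW RSS hash straight from the NIC descriptor, if the driver
>> 	 * implements the hints kfunc
>> 	 */
>> 	if (!bpf_xdp_metadata_rx_hash(ctx, &meta.hash, &rss_type))
>> 		store_xdp_kernel_meta(&meta);
>>
>> 	return bpf_redirect_map(&cpu_map, 0, 0);
>> }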
>
> Hmm, nice!
>
> But where would we store this info in the case of cpumap, if not in
> xdp->data_meta? When you convert XDP frames to skbs in the cpumap code,
> you only have the &xdp_frame and that's it. The XDP prog has already
> been run earlier, from the driver code, at that point.

Well, we could put it in skb_shared_info? IIRC, some of the metadata
(timestamps?) end up there when building an skb anyway, so we won't even
have to copy it around...
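
Something along these lines, maybe (hand-waving; the kernel_meta member
of skb_shared_info is invented here, and the hash type is a guess). The
nice part is that the frame's shared info *is* the skb's shared info
once the skb is built over the same buffer, so nothing needs copying:

	/* e.g. in __xdp_build_skb_from_frame() */
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);

	if (sinfo->kernel_meta.valid) {
		skb_set_hash(skb, sinfo->kernel_meta.hash, PKT_HASH_TYPE_L4);
		skb_hwtstamps(skb)->hwtstamp = sinfo->kernel_meta.timestamp;
	}
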
-Toke