Message-ID: <aFBI6msJQn4-LZsH@lore-desk>
Date: Mon, 16 Jun 2025 18:40:10 +0200
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Toke Høiland-Jørgensen <toke@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <borkmann@...earbox.net>,
Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>, sdf@...ichev.me,
kernel-team@...udflare.com, arthur@...hurfabre.com,
jakub@...udflare.com, Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Subject: Re: [PATCH bpf-next V1 7/7] net: xdp: update documentation for
xdp-rx-metadata.rst
> On 06/16, Lorenzo Bianconi wrote:
> > On Jun 10, Stanislav Fomichev wrote:
> > > On 06/11, Lorenzo Bianconi wrote:
> > > > > Daniel Borkmann <daniel@...earbox.net> writes:
> > > > >
> > > > [...]
> > > > > >>
> > > > > >> Why not have a new flag for bpf_redirect that transparently stores all
> > > > > >> available metadata, if you only care about the redirect -> skb case?
> > > > > >> That might give us more wiggle room in the future to make it work with
> > > > > >> traits.
> > > > > >
> > > > > > Also a question from my side: if I understand the proposal correctly, in order
> > > > > > to fully populate an skb at some point, you have to call all the
> > > > > > bpf_xdp_metadata_* kfuncs to collect the data from the driver descriptors
> > > > > > (indirect calls), and then call all the equivalent bpf_xdp_store_rx_* kfuncs
> > > > > > to store the data again in struct xdp_rx_meta. This seems rather costly, and
> > > > > > once you add more kfuncs for more metadata, aren't you better off switching
> > > > > > to tc(x) directly so the driver can do all this natively? :/
> > > > >
> > > > > I agree that the "one kfunc per metadata item" scales poorly. IIRC, the
> > > > > hope was (back when we added the initial HW metadata support) that we
> > > > > would be able to inline them to avoid the function call overhead.
> > > > >
> > > > > That being said, even with half a dozen function calls, that's still a
> > > > > lot less overhead than going all the way to TC(x). The goal of the use
> > > > > case here is to do as little work as possible on the CPU that initially
> > > > > receives the packet, instead moving the network stack processing (and
> > > > > skb allocation) to a different CPU with cpumap.
> > > > >
> > > > > So even if the *total* amount of work being done is a bit higher because
> > > > > of the kfunc overhead, that can still be beneficial because it's split
> > > > > between two (or more) CPUs.
> > > > >
> > > > > I'm sure Jesper has some concrete benchmarks for this lying around
> > > > > somewhere, hopefully he can share those :)
> > > >
> > > > Another possible approach would be to have some utility functions (not kfuncs)
> > > > used to 'store' the hw metadata in the xdp_frame that are executed in each
> > > > driver codebase before performing XDP_REDIRECT. The downside of this approach
> > > > is that we need to parse the hw metadata twice if the eBPF program that is
> > > > bound to the NIC is consuming this info. What do you think?
> > >
> > > That's the option I was asking about. I'm assuming we should be able
> > > to reuse the existing xmo metadata callbacks for this. Hopefully we
> > > should also be able to hide it from the drivers.
> >
> > If we move the hw metadata 'store' operations to the driver codebase (running
> > xmo metadata callbacks before performing XDP_REDIRECT), we will parse the hw
> > metadata twice if we attach an AF_XDP program that consumes the hw metadata
> > to the NIC, right? The first parse is done by the AF_XDP hw metadata kfunc, and the
> > second one would be performed by the native driver codebase.
>
> The native driver codebase will parse the hw metadata only if
> bpf_redirect() was called with some flag set, so unless I'm missing
> something, there should not be double parsing. (And it's all user
> controlled anyway, so that doesn't sound like a problem?)
I do not have a strong opinion about it. I guess it is fine, but I am not
100% sure it fits Jesper's use case.
@Jesper: any input on it?
Regards,
Lorenzo
>
> > Moreover, this approach seems less flexible. What do you think?
>
> Agreed on the flexibility. Just trying to understand whether we really
> need that flexibility. My worry is that we might expose too much of
> the stack's internals with this and introduce some unexpected
> dependencies. Things like what Jesper mentioned in another thread:
> setting skb->hash before the redirect to make GRO go fast... We either have
> to make the stack more robust (my preference), or document these
> cases clearly and have test coverage to avoid breakage in the future.