Message-ID: <aHoZ-LtKT9p5FKAD@lore-desk>
Date: Fri, 18 Jul 2025 11:55:04 +0200
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: Jesper Dangaard Brouer <hawk@...nel.org>
Cc: Jakub Kicinski <kuba@...nel.org>,
	Stanislav Fomichev <stfomichev@...il.com>, bpf@...r.kernel.org,
	netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <borkmann@...earbox.net>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Paolo Abeni <pabeni@...hat.com>, sdf@...ichev.me,
	kernel-team@...udflare.com, arthur@...hurfabre.com,
	jakub@...udflare.com, Jesse Brandeburg <jbrandeburg@...udflare.com>
Subject: Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for
 XDP_REDIRECTed packets

> 
> 
> On 16/07/2025 23.20, Jakub Kicinski wrote:
> > On Wed, 16 Jul 2025 13:17:53 +0200 Lorenzo Bianconi wrote:
> > > > > I can't see what the non-redirected use-case could be. Can you please provide
> > > > > more details?
> > > > > Moreover, can it be solved without storing the rx_hash (or the other
> > > > > hw-metadata) in a non-driver specific format?
> > > > 
> > > > Having setters feels more generic than narrowly solving only the redirect,
> > > > but I don't have a good use-case in mind.
> > > > > Storing the hw-metadata in some hw-specific format in the xdp_frame will not
> > > > > allow us to consume it directly when building the skb; we will need to decode
> > > > > it again. What is the upside/use-case of this approach? (not considering the
> > > > > orthogonality with the get method).
> > > > 
> > > > If we add the store kfuncs to regular drivers, the metadata won't be stored
> > > > in the xdp_frame; it will go into the rx descriptors, so the regular path that
> > > > builds skbs will use it.
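
To make the 'setter' idea concrete, an op symmetric to the existing
xdp_metadata_ops getters could look roughly like the sketch below (the
names are made up for illustration, they are not from this series):

/* Hypothetical setter ops mirroring the existing getters
 * (xmo_rx_hash, xmo_rx_vlan_tag, ...).  A driver implementing them
 * would write the new value back into its own RX descriptor or ring
 * state, so the regular path building the skb picks it up on XDP_PASS.
 */
struct xdp_metadata_set_ops {
	int (*xmo_rx_hash_set)(struct xdp_md *ctx, u32 hash,
			       enum xdp_rss_hash_type rss_type);
	int (*xmo_rx_vlan_tag_set)(struct xdp_md *ctx, __be16 vlan_proto,
				   u16 vlan_tci);
};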
> > > 
> > > IIUC, the described use-case would be to modify the hw metadata via a
> > > 'setter' kfunc executed by an eBPF program bound to the NIC and to store
> > > the new metadata in the DMA descriptor so it can be consumed by the driver
> > > code building the skb, right?
> > > If so:
> > > - we can get the same result by just storing (via a kfunc) the modified hw
> > >    metadata in the xdp_buff struct using a well-known/generic layout and
> > >    consuming it in the driver code (e.g. if the bound eBPF program
> > >    returns XDP_PASS) via a generic xdp utility routine (rough sketch
> > >    below). This part is not in the current series.
> > > - Using this approach we are still not preserving the hw metadata if we pass
> > >    the xdp_frame to a remote CPU returning XDP_REDIRECT (we need to add more
> > >    code)
> > > - I am not completely sure we can always modify the DMA descriptor directly
> > >    since it is DMA mapped.
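
To make the first bullet above more concrete, this is roughly what I have
in mind -- a rough sketch only, with invented names; none of this is in
the current series:

/* Well-known layout the bound eBPF program stores via a kfunc; a generic
 * helper then consumes it when the driver builds the skb on XDP_PASS.
 */
struct xdp_rx_meta {
	u32	hash;
	u32	hash_type;	/* enum xdp_rss_hash_type */
	u16	vlan_tci;
	__be16	vlan_proto;
};

/* BPF side: store the (possibly modified) hw metadata for this packet */
__bpf_kfunc int bpf_xdp_rx_meta_store(struct xdp_md *ctx,
				      const struct xdp_rx_meta *meta);

/* Driver side, on XDP_PASS: one generic call replaces the per-driver
 * descriptor decode when populating the skb. */
void xdp_rx_meta_populate_skb(const struct xdp_buff *xdp, struct sk_buff *skb);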
> 
> Let me explain why writing into the RX descriptors is a bad idea.
> The DMA descriptors are allocated as coherent DMA (dma_alloc_coherent).
> This is memory that is shared with the NIC hardware device, which
> implies cache-line coherence.  NIC performance is tightly coupled to
> limiting cache misses for descriptors.  One common trick is to pack more
> descriptors into a single cache-line.  Thus, if we start to write into
> the current RX-descriptor, we invalidate that cache-line as seen from
> the device, and the next RX-descriptor (in this cache-line) will be in an
> unfortunate coherence state.  Behind the scenes this might lead to some
> extra PCIe transactions.
> 
> By writing to the xdp_frame, we don't have to modify the DMA descriptors
> directly and risk invalidating cache lines for the NIC.
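
To put the cache-line sharing issue in code form (a toy layout, not a
real driver descriptor): with a 16-byte completion descriptor, four
consecutive descriptors share a single 64-byte cache line in the coherent
DMA area, so a CPU store into any one of them dirties the line for its
neighbours as well:

/* Toy illustration only -- not a real driver descriptor layout. */
struct toy_rx_desc {
	__le64 addr;
	__le32 hash;
	__le16 len;
	__le16 status;
};				/* 16 bytes -> 4 per 64-byte cache line */

static void toy_store_hash(struct toy_rx_desc *ring, u32 idx, u32 hash)
{
	/* This CPU store into dma_alloc_coherent() memory dirties the
	 * whole cache line from the device's point of view, even though
	 * only one of the four descriptors sharing it was touched, so
	 * the descriptors the NIC is still filling in pay with extra
	 * coherence/PCIe traffic.
	 */
	ring[idx].hash = cpu_to_le32(hash);
}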
> 
> > > 
> > > What do you think?
> > 
> > FWIW I commented on an earlier revision to similar effect as Stanislav.
> > To me the main concern is that we're adding another adhoc scheme, and
> > are making xdp_frame grow into a para-skb. We added XDP to make raw
> > packet access fast, now we're making drivers convert metadata twice :/
> 
> Thanks for the feedback. I can see why you'd be concerned about adding
> another adhoc scheme or making xdp_frame grow into a "para-skb".
> 
> However, I'd like to frame this as part of a long-term plan we've been
> calling the "mini-SKB" concept. This isn't a new idea, but a
> continuation of architectural discussions from as far back as [2016].
> 
> The long-term goal, described in these presentations from [2018] and
> [2019], has always been to evolve the xdp_frame to handle more hardware
> offloads, with the ultimate vision of moving SKB allocation out of NIC
> drivers entirely. In the future, the netstack could perform L3
> forwarding (and L2 bridging) directly on these enhanced xdp_frames
> [2019-slide20]. The main blocker for this vision has been the lack of
> hardware metadata in the xdp_frame.
> 
> This patchset is a small but necessary first step towards that goal. It
> > focuses on the concrete XDP_REDIRECT case, where our production use-case
> > benefits immediately. Storing this metadata in the
> xdp_frame is fundamental to the plan. It's no coincidence the fields are
> compatible with the SKB; they need to be.
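
To spell out what "compatible with the SKB" buys us on the redirect path:
whoever converts the xdp_frame into an skb (e.g. cpumap or veth) can apply
the carried metadata with the stock skb helpers, with no driver-specific
decoding left to do. Rough sketch, using hypothetical accessors for the
metadata carried in the xdp_frame:

static void frame_meta_to_skb(const struct xdp_frame *xdpf,
			      struct sk_buff *skb)
{
	/* the xdpf_rx_*() accessors are hypothetical, for illustration */
	skb_set_hash(skb, xdpf_rx_hash(xdpf),
		     xdpf_rx_hash_is_l4(xdpf) ? PKT_HASH_TYPE_L4
					      : PKT_HASH_TYPE_L3);
	if (xdpf_rx_vlan_tci(xdpf))
		__vlan_hwaccel_put_tag(skb, xdpf_rx_vlan_proto(xdpf),
				       xdpf_rx_vlan_tci(xdpf));
}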
> 
> I'm certainly open to debating the bigger picture, but I hope we can
> agree that it shouldn't hold up this first step, which solves an
> immediate need. Perhaps we can evaluate the merits of this specific
> change first, and discuss the overall architecture in parallel?

Considering the XDP_REDIRECT use-case, this series will (in the future)
allow us to avoid recomputing the packet checksum when redirecting the
frame into a veth and then into a container, which gives a significant
performance improvement.
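
To be explicit about the mechanism (sketch only, the checksum part is
future work on top of this series): today the skb built from an xdp_frame
in veth starts out with ip_summed == CHECKSUM_NONE, so the stack has to
re-validate the checksum in software inside the container. If the
xdp_frame carried the hw checksum status, the xdp_frame -> skb conversion
could propagate it instead, e.g.:

static void frame_csum_to_skb(const struct xdp_frame *xdpf,
			      struct sk_buff *skb)
{
	/* xdpf_csum_verified() is hypothetical: "the NIC already verified
	 * the L4 checksum for this packet". */
	if (xdpf_csum_verified(xdpf)) {
		skb->ip_summed = CHECKSUM_UNNECESSARY;
		skb->csum_level = 0;
	}
}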

Regards,
Lorenzo

> 
> --Jesper
> 
> 
> Links:
> ------
> [2019] XDP closer integration with network stack
>  - https://people.netfilter.org/hawk/presentations/KernelRecipes2019/xdp-netstack-concert.pdf
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-move-skb-allocations-out-of-nic-drivers
>  - [2019-slide20] https://github.com/xdp-project/xdp-project/blob/main/conference/KernelRecipes2019/xdp-netstack-concert.org#slide-fun-with-xdp_frame-before-skb-alloc
> 
> [2018] LPC Networking Track: XDP - challenges and future work
>  - https://people.netfilter.org/hawk/presentations/LinuxPlumbers2018/
>  - https://github.com/xdp-project/xdp-project/blob/main/conference/LinuxPlumbers2018/presentation-lpc2018-xdp-future.org#topic-moving-skb-allocation-out-of-driver
> 
> [2016] Network Performance Workshop
>  - https://people.netfilter.org/hawk/presentations/NetDev1.2_2016/net_performance_workshop_netdev1.2.pdf
