Message-ID: <YYl1P+nPSuMjI+e6@lore-desk>
Date: Mon, 8 Nov 2021 20:06:39 +0100
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Jakub Kicinski <kuba@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org, lorenzo.bianconi@...hat.com,
davem@...emloft.net, ast@...nel.org, daniel@...earbox.net,
shayagr@...zon.com, john.fastabend@...il.com, dsahern@...nel.org,
brouer@...hat.com, echaudro@...hat.com, jasowang@...hat.com,
alexander.duyck@...il.com, saeed@...nel.org,
maciej.fijalkowski@...el.com, magnus.karlsson@...el.com,
tirthendu.sarkar@...el.com
Subject: Re: [PATCH v17 bpf-next 12/23] bpf: add multi-buff support to the
bpf_xdp_adjust_tail() API
> Lorenzo Bianconi <lorenzo@...nel.org> writes:
>
> >> On Thu, 4 Nov 2021 18:35:32 +0100 Lorenzo Bianconi wrote:
> >> > This change adds support for tail growing and shrinking for XDP multi-buff.
> >> >
> >> > When called on a multi-buffer packet with a grow request, it will always
> >> > work on the last fragment of the packet. So the maximum grow size is the
> >> > last fragment's tailroom, i.e. no new buffer will be allocated.
> >> >
> >> > When shrinking, it will work from the last fragment all the way down to
> >> > the base buffer, depending on the shrink size. It's important to note
> >> > that once you shrink, the fragment(s) are freed, so you cannot grow
> >> > back to the original size.
> >>
> >> > +static int bpf_xdp_mb_increase_tail(struct xdp_buff *xdp, int offset)
> >> > +{
> >> > + struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> >> > + skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> >> > + int size, tailroom;
> >> > +
> >> > + tailroom = xdp->frame_sz - skb_frag_size(frag) - skb_frag_off(frag);
> >>
> >> I know I complained about this before but the assumption that we can
> >> use all the space up to xdp->frame_sz makes me uneasy.
> >>
> >> Drivers may not expect that the core may decide to extend the
> >> last frag... I don't think the skb path would ever do this.
> >>
> >> How do you feel about any of these options:
> >> - dropping this part for now (return an error for increase)
> >> - making this an rxq flag or reading the "reserved frag size"
> >> from rxq (so that drivers explicitly opt-in)
> >> - adding a test that can be run on real NICs
> >> ?
> >
> > I think this has been added to be symmetric with the single-buffer
> > bpf_xdp_adjust_tail(). I do not think there is a real use-case for it so
> > far, so I am fine to just support the shrink part.
> >
> > @Eelco, Jesper, Toke: any comments on it?
>
> Well, tail adjust is useful for things like encapsulations that need to
> add a trailer. Don't see why that wouldn't be something people would
> want to do for jumboframes as well?
>
I agree this would be useful for protocols that add a trailer.
> Not sure I get what the issue is with this either? But having a test
> that can be run to validate this on hardware would be great in any case,
> I suppose - we've been discussing more general "compliance tests" for
> XDP before...
What about option 2? We can add a frag_size field to the rxq [0], set by
the driver when initializing the xdp_buff. frag_size set to 0 means we can
use the whole buffer.
Regards,
Lorenzo
[0] pahole -C xdp_rxq_info vmlinux
struct xdp_rxq_info {
	struct net_device *        dev;                  /*     0     8 */
	u32                        queue_index;          /*     8     4 */
	u32                        reg_state;            /*    12     4 */
	struct xdp_mem_info        mem;                  /*    16     8 */
	unsigned int               napi_id;              /*    24     4 */

	/* size: 64, cachelines: 1, members: 5 */
	/* padding: 36 */
} __attribute__((__aligned__(64)));
>
> -Toke
>