netdev - Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZrHjByjZnnDgjvfo@hog>
Date: Tue, 6 Aug 2024 10:47:03 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Christian Hopps <chopps@...pps.org>
Cc: devel@...ux-ipsec.org, Steffen Klassert <steffen.klassert@...unet.com>,
	netdev@...r.kernel.org, Christian Hopps <chopps@...n.net>
Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of
 larger than MTU user packets

2024-08-04, 22:33:05 -0400, Christian Hopps wrote:
> > > +/* 1) skb->head should be cache aligned.
> > > + * 2) when resv is for L2 headers (i.e., ethernet) we want the cacheline to
> > > + * start -16 from data.
> > > + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPTFS payload
> > > + * we want data to be cache line aligned so all the pushed headers will be in
> > > + * another cacheline.
> > > + */
> > > +#define XFRM_IPTFS_MIN_L3HEADROOM 128
> > > +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16)
> > 
> > How did you pick those values?
> 
> That's what the comment is talking to. When reserving space for L2 headers we pick 64 + 16 (a 2^(<=6) cacheline + 16 bytes so the the cacheline should start -16 from where skb->data will point at.

Hard-coding the x86 cacheline size is not a good idea. And what's the
16B for? You don't know that it's enough for the actual L2 headers.

> For L3 we reserve double the power of 2 space we reserved for L2 only.

But that's the core of my question. Why is that correct/enough?

> 
> We have to reserve some amount of space for pushed headers, so the above made sense to me for good performance/cache locality.
> 
> > > +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len,
> > > +				       bool l3resv)
> > > +{
> > > +	struct sk_buff *skb;
> > > +	u32 resv;
> > > +
> > > +	if (!l3resv) {
> > > +		resv = XFRM_IPTFS_MIN_L2HEADROOM;
> > > +	} else {
> > > +		resv = skb_headroom(tpl);
> > > +		if (resv < XFRM_IPTFS_MIN_L3HEADROOM)
> > > +			resv = XFRM_IPTFS_MIN_L3HEADROOM;
> > > +	}
> > > +
> > > +	skb = alloc_skb(len + resv, GFP_ATOMIC);
> > > +	if (!skb) {
> > > +		XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR);
> > 
> > Hmpf, so we've gone from incrementing the wrong counter to
> > incrementing a new counter that doesn't have a precise meaning.
> 
> The new "No SKB" counter is supposed to mean "couldn't get an SKB",
> given plenty of other errors are logged under "OutErr" or "InErr"
> i'm not sure what level of precision you're looking for here. :)

OutErr and InErr would be better than that new counter IMO.


> > > +		return NULL;
> > > +	}
> > > +
> > > +	skb_reserve(skb, resv);
> > > +
> > > +	/* We do not want any of the tpl->headers copied over, so we do
> > > +	 * not use `skb_copy_header()`.
> > > +	 */
> > 
> > This is a bit of a bad sign for the implementation. It also worries
> > me, as this may not be updated when changes are made to
> > __copy_skb_header().
> > (c/p'd from v1 review since this was still not answered)
> 
> I don't agree that this is a bad design at all, I'm curious what you think a good design to be.

Strange skb manipulations hiding in a protocol module is not good
design.

c/p bits of core code into a module (where they will never get fixed
up when the core code gets updated) is always a bad idea.


> I did specifically state why we are not re-using
> skb_copy_header(). The functionality is different. We are not trying
> to make a copy of an skb we are using an skb as a template for new
> skbs.

I saw that. That doesn't mean it's a good thing to do.

> > > +/**
> > > + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buffer
> > > + * @st: source skb_seq_state
> > > + * @offset: offset in source
> > > + * @to: destination buffer
> > > + * @len: number of bytes to copy
> > > + *
> > > + * Copy @len bytes from @offset bytes into the source @st to the destination
> > > + * buffer @to. `offset` should increase (or be unchanged) with each subsequent
> > > + * call to this function. If offset needs to decrease from the previous use `st`
> > > + * should be reset first.
> > > + *
> > > + * Return: 0 on success or a negative error code on failure
> > > + */
> > > +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, void *to,
> > > +			     int len)
> > 
> > Probably belongs in net/core/skbuff.c, although I'm really not
> > convinced copying data around is the right way to implement the type
> > of packet splitting IPTFS does (which sounds a bit like a kind of
> > GSO). And there are helpers in net/core/skbuff.c (such as
> > pskb_carve/pskb_extract) that seem to do similar things to what you
> > need here, without as much data copying.
> 
> I don't have an issue with moving more general skb functionality
> into skbuff.c; however, I do not want to gate IP-TFS on this change
> to the general net infra, it is appropriate for a patchset of it's
> own.

If you need helpers that don't exist, it's part of your job to make
the core changes that are required to implement the functionality.

> Re copying: Let's be clear here, we are not always copying data,
> there are sharing code paths as well; however, there are times when
> it is the best (and even fastest) way to accomplish things (e.g.,
> b/c the packet is small or the data is arranged in skbs in a way to
> make sharing ridiculously complex and thus slow).

I'm not finding the sharing code. You mean iptfs_first_should_copy
returning false?

-- 
Sabrina