netdev - Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of larger than MTU user packets

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Zq__9Z4ckXNdR-Ec@hog>
Date: Mon, 5 Aug 2024 00:25:57 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Christian Hopps <chopps@...pps.org>
Cc: devel@...ux-ipsec.org, Steffen Klassert <steffen.klassert@...unet.com>,
	netdev@...r.kernel.org, Christian Hopps <chopps@...n.net>
Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of
 larger than MTU user packets

Please CC the reviewers from previous versions of the patchset. It's
really hard to keep track of discussions and reposts otherwise.


2024-08-04, 16:33:39 -0400, Christian Hopps wrote:
> From: Christian Hopps <chopps@...n.net>
> 
> Add support for tunneling user (inner) packets that are larger than the
> tunnel's path MTU (outer) using IP-TFS fragmentation.
> 
> Signed-off-by: Christian Hopps <chopps@...n.net>
> ---
>  net/xfrm/xfrm_iptfs.c | 407 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 381 insertions(+), 26 deletions(-)
> 
> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
> index 20c19894720e..38735e2d64c3 100644
> --- a/net/xfrm/xfrm_iptfs.c
> +++ b/net/xfrm/xfrm_iptfs.c
> @@ -46,12 +46,23 @@
>   */
>  #define IPTFS_DEFAULT_MAX_QUEUE_SIZE	(1024 * 10240)
>  
> +/* 1) skb->head should be cache aligned.
> + * 2) when resv is for L2 headers (i.e., ethernet) we want the cacheline to
> + * start -16 from data.
> + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPTFS payload
> + * we want data to be cache line aligned so all the pushed headers will be in
> + * another cacheline.
> + */
> +#define XFRM_IPTFS_MIN_L3HEADROOM 128
> +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16)

How did you pick those values?



> +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len,
> +				       bool l3resv)
> +{
> +	struct sk_buff *skb;
> +	u32 resv;
> +
> +	if (!l3resv) {
> +		resv = XFRM_IPTFS_MIN_L2HEADROOM;
> +	} else {
> +		resv = skb_headroom(tpl);
> +		if (resv < XFRM_IPTFS_MIN_L3HEADROOM)
> +			resv = XFRM_IPTFS_MIN_L3HEADROOM;
> +	}
> +
> +	skb = alloc_skb(len + resv, GFP_ATOMIC);
> +	if (!skb) {
> +		XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR);

Hmpf, so we've gone from incrementing the wrong counter to
incrementing a new counter that doesn't have a precise meaning.

> +		return NULL;
> +	}
> +
> +	skb_reserve(skb, resv);
> +
> +	/* We do not want any of the tpl->headers copied over, so we do
> +	 * not use `skb_copy_header()`.
> +	 */

This is a bit of a bad sign for the implementation. It also worries
me, as this may not be updated when changes are made to
__copy_skb_header().
(c/p'd from v1 review since this was still not answered)


> +/**
> + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buffer
> + * @st: source skb_seq_state
> + * @offset: offset in source
> + * @to: destination buffer
> + * @len: number of bytes to copy
> + *
> + * Copy @len bytes from @offset bytes into the source @st to the destination
> + * buffer @to. `offset` should increase (or be unchanged) with each subsequent
> + * call to this function. If offset needs to decrease from the previous use `st`
> + * should be reset first.
> + *
> + * Return: 0 on success or a negative error code on failure
> + */
> +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, void *to,
> +			     int len)

Probably belongs in net/core/skbuff.c, although I'm really not
convinced copying data around is the right way to implement the type
of packet splitting IPTFS does (which sounds a bit like a kind of
GSO). And there are helpers in net/core/skbuff.c (such as
pskb_carve/pskb_extract) that seem to do similar things to what you
need here, without as much data copying.


> +static int iptfs_first_skb(struct sk_buff **skbp, struct xfrm_iptfs_data *xtfs,
> +			   u32 mtu)
> +{
> +	struct sk_buff *skb = *skbp;
> +	int err;
> +
> +	/* Classic ESP skips the don't fragment ICMP error if DF is clear on
> +	 * the inner packet or ignore_df is set. Otherwise it will send an ICMP
> +	 * or local error if the inner packet won't fit it's MTU.
> +	 *
> +	 * With IPTFS we do not care about the inner packet DF bit. If the
> +	 * tunnel is configured to "don't fragment" we error back if things
> +	 * don't fit in our max packet size. Otherwise we iptfs-fragment as
> +	 * normal.
> +	 */
> +
> +	/* The opportunity for HW offload has ended */
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		err = skb_checksum_help(skb);
> +		if (err)
> +			return err;
> +	}
> +
> +	/* We've split these up before queuing */
> +	BUG_ON(skb_is_gso(skb));

As I've said previously, I don't think that's a valid reason to
crash. BUG_ON should be used very rarely:

https://elixir.bootlin.com/linux/v6.10/source/Documentation/process/coding-style.rst#L1230

Dropping a bogus packet is an easy way to recover from this situation,
so we should not crash here (and probably in all of IPTFS).

-- 
Sabrina