[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zq__9Z4ckXNdR-Ec@hog>
Date: Mon, 5 Aug 2024 00:25:57 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Christian Hopps <chopps@...pps.org>
Cc: devel@...ux-ipsec.org, Steffen Klassert <steffen.klassert@...unet.com>,
netdev@...r.kernel.org, Christian Hopps <chopps@...n.net>
Subject: Re: [PATCH ipsec-next v8 10/16] xfrm: iptfs: add fragmenting of
larger than MTU user packets
Please CC the reviewers from previous versions of the patchset. It's
really hard to keep track of discussions and reposts otherwise.
2024-08-04, 16:33:39 -0400, Christian Hopps wrote:
> From: Christian Hopps <chopps@...n.net>
>
> Add support for tunneling user (inner) packets that are larger than the
> tunnel's path MTU (outer) using IP-TFS fragmentation.
>
> Signed-off-by: Christian Hopps <chopps@...n.net>
> ---
> net/xfrm/xfrm_iptfs.c | 407 +++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 381 insertions(+), 26 deletions(-)
>
> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
> index 20c19894720e..38735e2d64c3 100644
> --- a/net/xfrm/xfrm_iptfs.c
> +++ b/net/xfrm/xfrm_iptfs.c
> @@ -46,12 +46,23 @@
> */
> #define IPTFS_DEFAULT_MAX_QUEUE_SIZE (1024 * 10240)
>
> +/* 1) skb->head should be cache aligned.
> + * 2) when resv is for L2 headers (i.e., ethernet) we want the cacheline to
> + * start -16 from data.
> + * 3) when resv is for L3+L2 headers IOW skb->data points at the IPTFS payload
> + * we want data to be cache line aligned so all the pushed headers will be in
> + * another cacheline.
> + */
> +#define XFRM_IPTFS_MIN_L3HEADROOM 128
> +#define XFRM_IPTFS_MIN_L2HEADROOM (64 + 16)
How did you pick those values?
> +static struct sk_buff *iptfs_alloc_skb(struct sk_buff *tpl, u32 len,
> + bool l3resv)
> +{
> + struct sk_buff *skb;
> + u32 resv;
> +
> + if (!l3resv) {
> + resv = XFRM_IPTFS_MIN_L2HEADROOM;
> + } else {
> + resv = skb_headroom(tpl);
> + if (resv < XFRM_IPTFS_MIN_L3HEADROOM)
> + resv = XFRM_IPTFS_MIN_L3HEADROOM;
> + }
> +
> + skb = alloc_skb(len + resv, GFP_ATOMIC);
> + if (!skb) {
> + XFRM_INC_STATS(dev_net(tpl->dev), LINUX_MIB_XFRMNOSKBERROR);
Hmpf, so we've gone from incrementing the wrong counter to
incrementing a new counter that doesn't have a precise meaning.
> + return NULL;
> + }
> +
> + skb_reserve(skb, resv);
> +
> + /* We do not want any of the tpl->headers copied over, so we do
> + * not use `skb_copy_header()`.
> + */
This is a bit of a bad sign for the implementation. It also worries
me, as this may not be updated when changes are made to
__copy_skb_header().
(c/p'd from v1 review since this was still not answered)
> +/**
> + * skb_copy_bits_seq - copy bits from a skb_seq_state to kernel buffer
> + * @st: source skb_seq_state
> + * @offset: offset in source
> + * @to: destination buffer
> + * @len: number of bytes to copy
> + *
> + * Copy @len bytes from @offset bytes into the source @st to the destination
> + * buffer @to. `offset` should increase (or be unchanged) with each subsequent
> + * call to this function. If offset needs to decrease from the previous use `st`
> + * should be reset first.
> + *
> + * Return: 0 on success or a negative error code on failure
> + */
> +static int skb_copy_bits_seq(struct skb_seq_state *st, int offset, void *to,
> + int len)
Probably belongs in net/core/skbuff.c, although I'm really not
convinced copying data around is the right way to implement the type
of packet splitting IPTFS does (which sounds a bit like a kind of
GSO). And there are helpers in net/core/skbuff.c (such as
pskb_carve/pskb_extract) that seem to do similar things to what you
need here, without as much data copying.
> +static int iptfs_first_skb(struct sk_buff **skbp, struct xfrm_iptfs_data *xtfs,
> + u32 mtu)
> +{
> + struct sk_buff *skb = *skbp;
> + int err;
> +
> + /* Classic ESP skips the don't fragment ICMP error if DF is clear on
> + * the inner packet or ignore_df is set. Otherwise it will send an ICMP
> + * or local error if the inner packet won't fit it's MTU.
> + *
> + * With IPTFS we do not care about the inner packet DF bit. If the
> + * tunnel is configured to "don't fragment" we error back if things
> + * don't fit in our max packet size. Otherwise we iptfs-fragment as
> + * normal.
> + */
> +
> + /* The opportunity for HW offload has ended */
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + err = skb_checksum_help(skb);
> + if (err)
> + return err;
> + }
> +
> + /* We've split these up before queuing */
> + BUG_ON(skb_is_gso(skb));
As I've said previously, I don't think that's a valid reason to
crash. BUG_ON should be used very rarely:
https://elixir.bootlin.com/linux/v6.10/source/Documentation/process/coding-style.rst#L1230
Dropping a bogus packet is an easy way to recover from this situation,
so we should not crash here (and probably in all of IPTFS).
--
Sabrina
Powered by blists - more mailing lists