netdev - Re: [devel-ipsec] [PATCH ipsec-next v5 09/17] xfrm: iptfs: add user packet (tunnel ingress) handling

Message-ID: <m2v813cho8.fsf@chopps.org>
Date: Wed, 17 Jul 2024 23:32:36 -0700
From: Christian Hopps <chopps@...pps.org>
To: Simon Horman <horms@...nel.org>
Cc: Christian Hopps <chopps@...pps.org>, Steffen Klassert
 <steffen.klassert@...unet.com>, netdev@...r.kernel.org, Christian Hopps
 <chopps@...n.net>, devel@...ux-ipsec.org
Subject: Re: [devel-ipsec] [PATCH ipsec-next v5 09/17] xfrm: iptfs: add user
 packet (tunnel ingress) handling


Christian Hopps <chopps@...pps.org> writes:

> [[PGP Signed Part:Good signature from 2E1D830ED7B83025 Christian Hopps <chopps@...il.com> (trust ultimate) created at 2024-07-17T22:55:46-0700 using RSA]]
>
> Simon Horman via Devel <devel@...ux-ipsec.org> writes:
>
>> On Sun, Jul 14, 2024 at 04:22:37PM -0400, Christian Hopps wrote:
>>> From: Christian Hopps <chopps@...n.net>
>>>
>>> Add tunnel packet output functionality. This code handles
>>> the ingress to the tunnel.
>>>
>>> Signed-off-by: Christian Hopps <chopps@...n.net>
>>> ---
>>>  net/xfrm/xfrm_iptfs.c | 535 +++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 532 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
>>
>> ...
>>
>>> +static int iptfs_get_cur_pmtu(struct xfrm_state *x,
>>> +			      struct xfrm_iptfs_data *xtfs, struct sk_buff *skb)
>>> +{
>>> +	struct xfrm_dst *xdst = (struct xfrm_dst *)skb_dst(skb);
>>> +	u32 payload_mtu = xtfs->payload_mtu;
>>> +	u32 pmtu = iptfs_get_inner_mtu(x, xdst->child_mtu_cached);
>>
>> Hi Christian,
>>
>> Please consider arranging local variable declarations in Networking
>> code in reverse xmas tree order - longest line to shortest.
>> I think that in this case that would involve separating the
>> declaration and assignment of pmtu.
>>
>> Edward Cree's tool can be helpful here:
>> https://github.com/ecree-solarflare/xmastree
>
> This does not appear to be a style that is required by the net/xfrm code. I
> verified this by running the above tool on the other files in net/xfrm/*.c. In
> this case I'd prefer to not increase the number of lines in the function in
> order to satisfy the optional style guideline.
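
For reference, the arrangement Simon describes could look roughly like the
sketch below: the pmtu declaration is split from its assignment so the
declarations run longest to shortest. The struct definitions, skb_dst() and the
body of iptfs_get_inner_mtu() are userspace stand-ins assumed only so the
fragment compiles on its own; they are not the kernel definitions.

```c
#include <stdint.h>

typedef uint32_t u32;

/* Minimal stand-ins (assumptions) for the kernel structures involved. */
struct xfrm_state { u32 overhead; };
struct xfrm_iptfs_data { u32 payload_mtu; };
struct xfrm_dst { u32 child_mtu_cached; };
struct sk_buff { struct xfrm_dst *dst; };

static struct xfrm_dst *skb_dst(const struct sk_buff *skb)
{
	return skb->dst;
}

/* Placeholder body: the real helper is defined elsewhere in the series. */
static u32 iptfs_get_inner_mtu(struct xfrm_state *x, u32 outer_mtu)
{
	return outer_mtu - x->overhead;
}

int iptfs_get_cur_pmtu(struct xfrm_state *x,
		       struct xfrm_iptfs_data *xtfs, struct sk_buff *skb)
{
	/* Declarations ordered longest line to shortest. */
	struct xfrm_dst *xdst = (struct xfrm_dst *)skb_dst(skb);
	u32 payload_mtu = xtfs->payload_mtu;
	u32 pmtu;

	/* Assignment moved out of the declaration to keep the ordering. */
	pmtu = iptfs_get_inner_mtu(x, xdst->child_mtu_cached);

	if (payload_mtu && payload_mtu < pmtu)
		pmtu = payload_mtu;

	return pmtu;
}
```

This costs two extra lines (the bare declaration and the blank line before the
assignment), which is the growth in function length objected to above.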
>
>>> +
>>> +	if (payload_mtu && payload_mtu < pmtu)
>>> +		pmtu = payload_mtu;
>>> +
>>> +	return pmtu;
>>> +}
>>
>> ...
>>
>>> +/* IPv4/IPv6 packet ingress to IPTFS tunnel, arrange to send in IPTFS payload
>>> + * (i.e., aggregating or fragmenting as appropriate).
>>> + * This is set in dst->output for an SA.
>>> + */
>>> +static int iptfs_output_collect(struct net *net, struct sock *sk,
>>> +				struct sk_buff *skb)
>>> +{
>>> +	struct dst_entry *dst = skb_dst(skb);
>>> +	struct xfrm_state *x = dst->xfrm;
>>> +	struct xfrm_iptfs_data *xtfs = x->mode_data;
>>> +	struct sk_buff *segs, *nskb;
>>> +	u32 pmtu = 0;
>>> +	bool ok = true;
>>> +	bool was_gso;
>>> +
>>> +	/* We have hooked into dst_entry->output which means we have skipped the
>>> +	 * protocol specific netfilter (see xfrm4_output, xfrm6_output).
>>> +	 * when our timer runs we will end up calling xfrm_output directly on
>>> +	 * the encapsulated traffic.
>>> +	 *
>>> +	 * For both cases this is the NF_INET_POST_ROUTING hook which allows
>>> +	 * changing the skb->dst entry which then may not be xfrm based anymore
>>> +	 * in which case a REROUTED flag is set. and dst_output is called.
>>> +	 *
>>> +	 * For IPv6 we are also skipping fragmentation handling for local
>>> +	 * sockets, which may or may not be good depending on our tunnel DF
>>> +	 * setting. Normally with fragmentation supported we want to skip this
>>> +	 * fragmentation.
>>> +	 */
>>> +
>>> +	BUG_ON(!xtfs);
>>> +
>>> +	pmtu = iptfs_get_cur_pmtu(x, xtfs, skb);
>>> +
>>> +	/* Break apart GSO skbs. If the queue is nearing full then we want the
>>> +	 * accounting and queuing to be based on the individual packets not on the
>>> +	 * aggregate GSO buffer.
>>> +	 */
>>> +	was_gso = skb_is_gso(skb);
>>> +	if (!was_gso) {
>>> +		segs = skb;
>>> +	} else {
>>> +		segs = skb_gso_segment(skb, 0);
>>> +		if (IS_ERR_OR_NULL(segs)) {
>>> +			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
>>> +			kfree_skb(skb);
>>> +			return PTR_ERR(segs);
>>
>> This will return 0 if skb_gso_segment() returns NULL,
>> which occurs if skb doesn't require segmentation.
>> Is that intentional?
>>
>> If so, I wonder if it would be slightly nicer
>> to use PTR_ERR_OR_ZERO() instead of PTR_ERR().
>>
>> Flagged by Smatch (suggestion is my own).
>
> Thanks! Actually, in the case of NULL I've changed the code to return -EINVAL,
> as xfrm_output_gso() does.
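
To illustrate the NULL case discussed here, below is a small userspace sketch.
ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL() are simplified stand-ins
for the kernel helpers in <linux/err.h>, and the two segment_result_*()
functions are hypothetical names that only model the return value on the error
path; the actual change in the next revision may be shaped differently.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for <linux/err.h> (assumption): the top
 * MAX_ERRNO addresses of the pointer range encode a negative errno.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Models the v5 error path: NULL enters the branch, but PTR_ERR(NULL) is 0,
 * so the caller sees success. Returns 1 when segs is usable.
 */
int segment_result_v5(const void *segs)
{
	if (IS_ERR_OR_NULL(segs))
		return (int)PTR_ERR(segs);
	return 1;
}

/* Hypothetical shape of the fix: NULL is reported as -EINVAL instead. */
int segment_result_fixed(const void *segs)
{
	if (IS_ERR_OR_NULL(segs))
		return segs ? (int)PTR_ERR(segs) : -EINVAL;
	return 1;
}
```

With these stand-ins, segment_result_v5(NULL) yields 0 while
segment_result_fixed(NULL) yields -EINVAL; real error pointers pass through
unchanged in both.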
>
>>> +		}
>>> +		consume_skb(skb);
>>> +		skb = NULL;
>>> +	}
>>> +
>>> +	/* We can be running on multiple cores and from the network softirq or
>>> +	 * from user context depending on where the packet is coming from.
>>> +	 */
>>> +	spin_lock_bh(&x->lock);
>>> +
>>> +	skb_list_walk_safe(segs, skb, nskb)
>>> +	{
>>> +		skb_mark_not_on_list(skb);
>>> +
>>> +		/* Once we drop due to no queue space we continue to drop the
>>> +		 * rest of the packets from that GRO.
>>> +		 */
>>> +		if (!ok) {
>>> +nospace:
>>> +			if (skb->dev)
>>> +				XFRM_INC_STATS(dev_net(skb->dev),
>>> +					       LINUX_MIB_XFRMOUTNOQSPACE);
>>> +			kfree_skb_reason(skb, SKB_DROP_REASON_FULL_RING);
>>> +			continue;
>>> +		}
>>> +
>>> +		/* Fragmenting handled in following commits. */
>>> +		if (iptfs_is_too_big(sk, skb, pmtu)) {
>>> +			kfree_skb_reason(skb, SKB_DROP_REASON_PKT_TOO_BIG);
>>> +			continue;
>>> +		}
>>> +
>>> +		/* Enqueue to send in tunnel */
>>> +		ok = iptfs_enqueue(xtfs, skb);
>>> +		if (!ok)
>>> +			goto nospace;
>>> +	}
>>> +
>>> +	/* Start a delay timer if we don't have one yet */
>>> +	if (!hrtimer_is_queued(&xtfs->iptfs_timer)) {
>>> +		hrtimer_start(&xtfs->iptfs_timer, xtfs->init_delay_ns,
>>> +			      IPTFS_HRTIMER_MODE);
>>> +		xtfs->iptfs_settime = ktime_get_raw_fast_ns();
>>> +	}
>>> +
>>> +	spin_unlock_bh(&x->lock);
>>> +	return 0;
>>> +}
>>
>> ...
>>
>>> +static enum hrtimer_restart iptfs_delay_timer(struct hrtimer *me)
>>> +{
>>> +	struct sk_buff_head list;
>>> +	struct xfrm_iptfs_data *xtfs;
>>> +	struct xfrm_state *x;
>>> +	time64_t settime;
>>> +
>>> +	xtfs = container_of(me, typeof(*xtfs), iptfs_timer);
>>> +	x = xtfs->x;
>>> +
>>> +	/* Process all the queued packets
>>> +	 *
>>> +	 * softirq execution order: timer > tasklet > hrtimer
>>> +	 *
>>> +	 * Network rx will have run before us giving one last chance to queue
>>> +	 * ingress packets for us to process and transmit.
>>> +	 */
>>> +
>>> +	spin_lock(&x->lock);
>>> +	__skb_queue_head_init(&list);
>>> +	skb_queue_splice_init(&xtfs->queue, &list);
>>> +	xtfs->queue_size = 0;
>>> +	settime = xtfs->iptfs_settime;
>>
>> nit: settime is set but otherwise unused in this function.
>>
>>      Flagged by W=1 x86_64 allmodconfig builds with gcc-14 and clang-18.
>>
>
> Hmm, this value is in fact used inside a tracepoint function call in this function.


I realized this is another artifact of splitting the commit; I've now moved
settime to the later tracepoint commit.

Thanks,
Chris.

>
>>> +	spin_unlock(&x->lock);
>>> +
>>> +	/* After the above unlock, packets can begin queuing again, and the
>>> +	 * timer can be set again, from another CPU either in softirq or user
>>> +	 * context (not from this one since we are running at softirq level
>>> +	 * already).
>>> +	 */
>>> +
>>> +	iptfs_output_queued(x, &list);
>>> +
>>> +	return HRTIMER_NORESTART;
>>> +}
>>
>> ...
>>
>>> @@ -98,10 +607,23 @@ static int iptfs_copy_to_user(struct xfrm_state *x, struct sk_buff *skb)
>>>  {
>>>  	struct xfrm_iptfs_data *xtfs = x->mode_data;
>>>  	struct xfrm_iptfs_config *xc = &xtfs->cfg;
>>> -	int ret = 0;
>>> +	int ret;
>>> +	u64 q;
>>> +
>>> +	if (x->dir == XFRM_SA_DIR_OUT) {
>>> +		q = xtfs->init_delay_ns;
>>> +		(void)do_div(q, NSECS_IN_USEC);
>>> +		ret = nla_put_u32(skb, XFRMA_IPTFS_INIT_DELAY, q);
>>> +		if (ret)
>>> +			return ret;
>>> +
>>> +		ret = nla_put_u32(skb, XFRMA_IPTFS_MAX_QSIZE,
>>> +				  xc->max_queue_size);
>>> +		if (ret)
>>> +			return ret;
>>>
>>> -	if (x->dir == XFRM_SA_DIR_OUT)
>>>  		ret = nla_put_u32(skb, XFRMA_IPTFS_PKT_SIZE, xc->pkt_size);
>>> +	}
>>
>> ret will be used uninitialised here unless the if condition above is true.
>>
>> Flagged by W=1 x86_64 allmodconfig build with clang-18, and Smatch.
>
> Sigh, this is an artifact of splitting up the new file into multiple
> functionality commits. The final resulting code does not leave the value
> uninitialized. In any case I will fix it in this middle commit too.
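
A minimal userspace sketch of the pattern in question follows. put_attr(),
enum sa_dir and copy_to_user_sketch() are hypothetical stand-ins for
nla_put_u32(), the x->dir values and iptfs_copy_to_user(); initializing ret to
0 is one way to give the non-OUT path a defined result, not necessarily the
exact fix that will land in this commit.

```c
#include <errno.h>

enum sa_dir { SA_DIR_IN, SA_DIR_OUT };	/* stand-in for x->dir values */

/* Hypothetical stand-in for nla_put_u32(): fails only when told to. */
static int put_attr(int should_fail)
{
	return should_fail ? -EMSGSIZE : 0;
}

/* fail_at selects which of the three puts fails (0 means none). */
int copy_to_user_sketch(enum sa_dir dir, int fail_at)
{
	int ret = 0;	/* defined result when dir is not SA_DIR_OUT */

	if (dir == SA_DIR_OUT) {
		ret = put_attr(fail_at == 1);	/* init delay */
		if (ret)
			return ret;

		ret = put_attr(fail_at == 2);	/* max queue size */
		if (ret)
			return ret;

		ret = put_attr(fail_at == 3);	/* packet size */
	}

	return ret;
}
```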
>
> Thanks!
> Chris.
>
>>
>>>
>>>  	return ret;
>>>  }
>> ...
>
> [[End of PGP Signed Part]]

