lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 22 Feb 2024 15:23:36 -0500
From: Christian Hopps <chopps@...n.net>
To: Simon Horman <horms@...nel.org>
Cc: Christian Hopps <chopps@...pps.org>, devel@...ux-ipsec.org, Steffen
 Klassert <steffen.klassert@...unet.com>, netdev@...r.kernel.org, Christian
 Hopps <chopps@...n.net>
Subject: Re: [PATCH ipsec-next v1 8/8] iptfs: impl: add new iptfs xfrm mode
 impl


Simon Horman <horms@...nel.org> writes:

> On Mon, Feb 19, 2024 at 03:57:35AM -0500, Christian Hopps wrote:
>> From: Christian Hopps <chopps@...n.net>
>>
>> Add a new xfrm mode implementing AggFrag/IP-TFS from RFC9347.
>>
>> This utilizes the new xfrm_mode_cbs to implement demand-driven IP-TFS
>> functionality. This functionality can be used to increase bandwidth
>> utilization through small packet aggregation, as well as help solve PMTU
>> issues through it's efficient use of fragmentation.
>>
>> Link: https://www.rfc-editor.org/rfc/rfc9347.txt
>>
>> Signed-off-by: Christian Hopps <chopps@...n.net>
>
> ...
>
>> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
>
> ...
>
>> +/**
>> + * skb_head_to_frag() - initialize a skb_frag_t based on skb head data
>> + * @skb: skb with the head data
>> + * @frag: frag to initialize
>> + */
>> +static void skb_head_to_frag(const struct sk_buff *skb, skb_frag_t *frag)
>> +{
>> +	struct page *page = virt_to_head_page(skb->data);
>> +	unsigned char *addr = (unsigned char *)page_address(page);
>> +
>> +	BUG_ON(!skb->head_frag);
>
> Is it strictly necessary to crash the Kernel here?
> Likewise, many other places in this patch.

In all use cases it represents a programming error (bug) if the condition is met.

What is the correct use of BUG_ON?

>> +	skb_frag_fill_page_desc(frag, page, skb->data - addr, skb_headlen(skb));
>> +}
>
> ...
>
>> +/**
>> + * skb_add_frags() - add a range of fragment references into an skb
>> + * @skb: skb to add references into
>> + * @walk: the walk to add referenced fragments from.
>> + * @offset: offset from beginning of original skb to start from.
>> + * @len: amount of data to add frag references to in @skb.
>> + *
>> + * skb_can_add_frags() should be called before this function to verify that the
>> + * destination @skb is compatible with the walk and has space in the array for
>> + * the to be added frag refrences.
>
> nit: references

Fixed.

>> + *
>> + * Return: The number of bytes not added to @skb b/c we reached the end of the
>> + * walk before adding all of @len.
>> + */
>
> ...
>
>> +/**
>> + * iptfs_reassem_done() - In-progress packet is aborted free the state.
>
> nit: This does not match the name of the function it documents.
>
>      Flagged by W=1 build with gcc-13.

Fixed.

>
>> + * @xtfs: xtfs state
>> + */
>> +static void iptfs_reassem_abort(struct xfrm_iptfs_data *xtfs)
>> +{
>> +	__iptfs_reassem_done(xtfs, true);
>> +}
>
> ...
>
>> +/**
>> + * iptfs_input_ordered() - handle next in order IPTFS payload.
>> + * @x: xfrm state
>> + * @skb: current packet
>> + *
>> + * Process the IPTFS payload in `skb` and consume it afterwards.
>> + */
>> +static int iptfs_input_ordered(struct xfrm_state *x, struct sk_buff *skb)
>> +{
>> +	u8 hbytes[sizeof(struct ipv6hdr)];
>> +	struct ip_iptfs_cc_hdr iptcch;
>> +	struct skb_seq_state skbseq;
>> +	struct skb_frag_walk _fragwalk;
>> +	struct skb_frag_walk *fragwalk = NULL;
>> +	struct list_head sublist; /* rename this it's just a list */
>> +	struct sk_buff *first_skb, *defer, *next;
>> +	const unsigned char *old_mac;
>> +	struct xfrm_iptfs_data *xtfs;
>> +	struct ip_iptfs_hdr *ipth;
>> +	struct iphdr *iph;
>> +	struct net *net;
>> +	u32 remaining, first_iplen, iplen, iphlen, data, tail;
>> +	u32 blkoff, capturelen;
>> +	u64 seq;
>> +
>> +	xtfs = x->mode_data;
>> +	net = dev_net(skb->dev);
>> +	first_skb = NULL;
>> +	defer = NULL;
>> +
>> +	seq = __esp_seq(skb);
>> +
>> +	/* Large enough to hold both types of header */
>> +	ipth = (struct ip_iptfs_hdr *)&iptcch;
>> +
>> +	/* Save the old mac header if set */
>> +	old_mac = skb_mac_header_was_set(skb) ? skb_mac_header(skb) : NULL;
>> +
>> +	skb_prepare_seq_read(skb, 0, skb->len, &skbseq);
>> +
>> +	/* Get the IPTFS header and validate it */
>> +
>> +	if (skb_copy_bits_seq(&skbseq, 0, ipth, sizeof(*ipth))) {
>> +		XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
>> +		goto done;
>> +	}
>> +	data = sizeof(*ipth);
>> +
>> +	trace_iptfs_egress_recv(skb, xtfs, htons(ipth->block_offset));
>
> Maybe this is backwards, because the argument to htons should be
> in host byte order, but the type of ipth->block_offset is __be16.
>
> Also, personally, i would suggest using be16_to_cpu as it better
> describes the types involved.
>
> This is flagged by Sparse along with some other problems.
> Please take care not to introduce new Sparse warnings.

Cleaned these up. Switched to be16 macros..

> ...
>
>> +static u32 __reorder_future_shifts(struct xfrm_iptfs_data *xtfs,
>> +				   struct sk_buff *inskb,
>> +				   struct list_head *list,
>> +				   struct list_head *freelist, u32 *fcount)
>> +{
>> +	const u32 nslots = xtfs->cfg.reorder_win_size + 1;
>> +	const u64 inseq = __esp_seq(inskb);
>> +	u32 savedlen = xtfs->w_savedlen;
>> +	u64 wantseq = xtfs->w_wantseq;
>> +	struct sk_buff *slot0 = NULL;
>> +	u64 last_drop_seq = xtfs->w_wantseq;
>> +	u64 distance, extra_drops, missed, s0seq;
>
> Missed is set but otherwise unused in this function.
>
> Flagged by W=1 build with clang-17.

I've removed `missed`; however, it will be needed for congestion control if that gets implemented.

>
>> +	u32 count = 0;
>> +	struct skb_wseq *wnext;
>> +	u32 beyond, shifting, slot;
>> +
>> +	BUG_ON(inseq <= wantseq);
>> +	distance = inseq - wantseq;
>> +	BUG_ON(distance <= nslots - 1);
>> +	beyond = distance - (nslots - 1);
>> +	missed = 0;
>> +
>> +	/* Handle future sequence number received.
>> +	 *
>> +	 * IMPORTANT: we are at least advancing w_wantseq (i.e., wantseq) by 1
>> +	 * b/c we are beyond the window boundary.
>> +	 *
>> +	 * We know we don't have the wantseq so that counts as a drop.
>> +	 */
>> +
>> +	/* ex: slot count is 4, array size is 3 savedlen is 2, slot 0 is the
>> +	 * missing sequence number.
>> +	 *
>> +	 * the final slot at savedlen (index savedlen - 1) is always occupied.
>> +	 *
>> +	 * beyond is "beyond array size" not savedlen.
>> +	 *
>> +	 *          +--------- array length (savedlen == 2)
>> +	 *          |   +----- array size (nslots - 1 == 3)
>> +	 *          |   |   +- window boundary (nslots == 4)
>> +	 *          V   V | V
>> +	 *                |
>> +	 *  0   1   2   3 |   slot number
>> +	 * ---  0   1   2 |   array index
>> +	 *     [b] [c] : :|   array
>> +	 *                |
>> +	 * "2" "3" "4" "5"|*6*  seq numbers
>> +	 *
>> +	 * We receive seq number 6
>> +	 * distance == 4 [inseq(6) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 3 [distance(4) - 1]
>> +	 * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 1 [min(savedlen(2), beyond(1)]
>> +	 * slot0_skb == [b], and should match w_wantseq
>> +	 *
>> +	 *                +--- window boundary (nslots == 4)
>> +	 *  0   1   2   3 | 4   slot number
>> +	 * ---  0   1   2 | 3   array index
>> +	 *     [b] : : : :|     array
>> +	 * "2" "3" "4" "5" *6*  seq numbers
>> +	 *
>> +	 * We receive seq number 6
>> +	 * distance == 4 [inseq(6) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 3 [distance(4) - 1]
>> +	 * beyond == 1 [newslot(4) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 1 [min(savedlen(1), beyond(1)]
>> +	 * slot0_skb == [b] and should match w_wantseq
>> +	 *
>> +	 *                +-- window boundary (nslots == 4)
>> +	 *  0   1   2   3 | 4   5   6   slot number
>> +	 * ---  0   1   2 | 3   4   5   array index
>> +	 *     [-] [c] : :|             array
>> +	 * "2" "3" "4" "5" "6" "7" *8*  seq numbers
>> +	 *
>> +	 * savedlen = 2, beyond = 3
>> +	 * iter 1: slot0 == NULL, missed++, lastdrop = 2 (2+1-1), slot0 = [-]
>> +	 * iter 2: slot0 == NULL, missed++, lastdrop = 3 (2+2-1), slot0 = [c]
>> +	 * 2 < 3, extra = 1 (3-2), missed += extra, lastdrop = 4 (2+2+1-1)
>> +	 *
>> +	 * We receive seq number 8
>> +	 * distance == 6 [inseq(8) - w_wantseq(2)]
>> +	 * newslot == distance
>> +	 * index == 5 [distance(6) - 1]
>> +	 * beyond == 3 [newslot(6) - lastslot((nslots(4) - 1))]
>> +	 * shifting == 2 [min(savedlen(2), beyond(3)]
>> +	 *
>> +	 * slot0_skb == NULL changed from [b] when "savedlen < beyond" is true.
>> +	 */
>> +
>> +	/* Now send any packets that are being shifted out of saved, and account
>> +	 * for missing packets that are exiting the window as we shift it.
>> +	 */
>> +
>> +	/* If savedlen > beyond we are shifting some, else all. */
>> +	shifting = min(savedlen, beyond);
>> +
>> +	/* slot0 is the buf that just shifted out and into slot0 */
>> +	slot0 = NULL;
>> +	s0seq = wantseq;
>> +	last_drop_seq = s0seq;
>> +	wnext = xtfs->w_saved;
>> +	for (slot = 1; slot <= shifting; slot++, wnext++) {
>> +		/* handle what was in slot0 before we occupy it */
>> +		if (!slot0) {
>> +			last_drop_seq = s0seq;
>> +			missed++;
>> +		} else {
>> +			list_add_tail(&slot0->list, list);
>> +			count++;
>> +		}
>> +		s0seq++;
>> +		slot0 = wnext->skb;
>> +		wnext->skb = NULL;
>> +	}
>> +
>> +	/* slot0 is now either NULL (in which case it's what we now are waiting
>> +	 * for, or a buf in which case we need to handle it like we received it;
>> +	 * however, we may be advancing past that buffer as well..
>> +	 */
>> +
>> +	/* Handle case where we need to shift more than we had saved, slot0 will
>> +	 * be NULL iff savedlen is 0, otherwise slot0 will always be
>> +	 * non-NULL b/c we shifted the final element, which is always set if
>> +	 * there is any saved, into slot0.
>> +	 */
>> +	if (savedlen < beyond) {
>> +		extra_drops = beyond - savedlen;
>> +		if (savedlen == 0) {
>> +			BUG_ON(slot0);
>> +			s0seq += extra_drops;
>> +			last_drop_seq = s0seq - 1;
>> +		} else {
>> +			extra_drops--; /* we aren't dropping what's in slot0 */
>> +			BUG_ON(!slot0);
>> +			list_add_tail(&slot0->list, list);
>> +			/* if extra_drops then we are going past this slot0
>> +			 * so we can safely advance last_drop_seq
>> +			 */
>> +			if (extra_drops)
>> +				last_drop_seq = s0seq + extra_drops;
>> +			s0seq += extra_drops + 1;
>> +			count++;
>> +		}
>> +		missed += extra_drops;
>> +		slot0 = NULL;
>> +		/* slot0 has had an empty slot pushed into it */
>> +	}
>> +	(void)last_drop_seq;	/* we want this for CC code */
>> +
>> +	/* Remove the entries */
>> +	__vec_shift(xtfs, beyond);
>> +
>> +	/* Advance want seq */
>> +	xtfs->w_wantseq += beyond;
>> +
>> +	/* Process drops here when implementing congestion control */
>> +
>> +	/* We've shifted. plug the packet in at the end. */
>> +	xtfs->w_savedlen = nslots - 1;
>> +	xtfs->w_saved[xtfs->w_savedlen - 1].skb = inskb;
>> +	iptfs_set_window_drop_times(xtfs, xtfs->w_savedlen - 1);
>> +
>> +	/* if we don't have a slot0 then we must wait for it */
>> +	if (!slot0)
>> +		return count;
>> +
>> +	/* If slot0, seq must match new want seq */
>> +	BUG_ON(xtfs->w_wantseq != __esp_seq(slot0));
>> +
>> +	/* slot0 is valid, treat like we received expected. */
>> +	count += __reorder_this(xtfs, slot0, list);
>> +	return count;
>> +}
>
> ...
>
>> +/**
>> + * iptfs_get_mtu() - return the inner MTU for an IPTFS xfrm.
>
> nit: This does not match the name of the function it documents.

Fixed.

Thanks for your review!
Chris.

>> + * @x: xfrm state.
>> + * @outer_mtu: Outer MTU for the encapsulated packet.
>> + *
>> + * Return: Correct MTU taking in to account the encap overhead.
>> + */
>> +static u32 iptfs_get_inner_mtu(struct xfrm_state *x, int outer_mtu)
>
> ...

Download attachment "signature.asc" of type "application/pgp-signature" (858 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ