linux-kernel - Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <59d90811-bc68-83cd-b7e5-7a8c2e2370d9@suse.com>
Date:   Mon, 27 Mar 2023 11:49:27 +0200
From:   Jan Beulich <jbeulich@...e.com>
To:     Juergen Gross <jgross@...e.com>
Cc:     Wei Liu <wei.liu@...nel.org>, Paul Durrant <paul@....org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        xen-devel@...ts.xenproject.org, stable@...r.kernel.org,
        linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary

On 27.03.2023 10:36, Juergen Gross wrote:
> Fix xenvif_get_requests() not to do grant copy operations across local
> page boundaries. This requires to double the maximum number of copy
> operations per queue, as each copy could now be split into 2.
> 
> Make sure that struct xenvif_tx_cb doesn't grow too large.
> 
> Cc: stable@...r.kernel.org
> Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
> Signed-off-by: Juergen Gross <jgross@...e.com>
> ---
>  drivers/net/xen-netback/common.h  |  2 +-
>  drivers/net/xen-netback/netback.c | 25 +++++++++++++++++++++++--
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 3dbfc8a6924e..1fcbd83f7ff2 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -166,7 +166,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>  	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
>  	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
>  
> -	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
> +	struct gnttab_copy tx_copy_ops[2 * MAX_PENDING_REQS];
>  	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
>  	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
>  	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 1b42676ca141..111c179f161b 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -334,6 +334,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
>  struct xenvif_tx_cb {
>  	u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1];
>  	u8 copy_count;
> +	u32 split_mask;
>  };
>  
>  #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
> @@ -361,6 +362,8 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
>  	struct sk_buff *skb =
>  		alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
>  			  GFP_ATOMIC | __GFP_NOWARN);
> +
> +	BUILD_BUG_ON(sizeof(*XENVIF_TX_CB(skb)) > sizeof(skb->cb));
>  	if (unlikely(skb == NULL))
>  		return NULL;
>  
> @@ -396,11 +399,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>  	nr_slots = shinfo->nr_frags + 1;
>  
>  	copy_count(skb) = 0;
> +	XENVIF_TX_CB(skb)->split_mask = 0;
>  
>  	/* Create copy ops for exactly data_len bytes into the skb head. */
>  	__skb_put(skb, data_len);
>  	while (data_len > 0) {
>  		int amount = data_len > txp->size ? txp->size : data_len;
> +		bool split = false;
>  
>  		cop->source.u.ref = txp->gref;
>  		cop->source.domid = queue->vif->domid;
> @@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>  		cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
>  				               - data_len);
>  
> +		/* Don't cross local page boundary! */
> +		if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
> +			amount = XEN_PAGE_SIZE - cop->dest.offset;
> +			XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);

Maybe worthwhile to add a BUILD_BUG_ON() somewhere to make sure this
shift won't grow too large a shift count. The number of slots accepted
could conceivably be grown past XEN_NETBK_LEGACY_SLOTS_MAX (i.e.
XEN_NETIF_NR_SLOTS_MIN) at some point.

> +			split = true;
> +		}
> +
>  		cop->len = amount;
>  		cop->flags = GNTCOPY_source_gref;
>  
> @@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>  		pending_idx = queue->pending_ring[index];
>  		callback_param(queue, pending_idx).ctx = NULL;
>  		copy_pending_idx(skb, copy_count(skb)) = pending_idx;
> -		copy_count(skb)++;
> +		if (!split)
> +			copy_count(skb)++;
>  
>  		cop++;
>  		data_len -= amount;
> @@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>  			nr_slots--;
>  		} else {
>  			/* The copy op partially covered the tx_request.
> -			 * The remainder will be mapped.
> +			 * The remainder will be mapped or copied in the next
> +			 * iteration.
>  			 */
>  			txp->offset += amount;
>  			txp->size -= amount;
> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
>  		pending_idx = copy_pending_idx(skb, i);
>  
>  		newerr = (*gopp_copy)->status;
> +
> +		/* Split copies need to be handled together. */
> +		if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
> +			(*gopp_copy)++;
> +			if (!newerr)
> +				newerr = (*gopp_copy)->status;
> +		}

It isn't guaranteed that a slot may be split only once, is it? Assuming a
near-64k packet with all tiny non-primary slots, that'll cause those tiny
slots to all be mapped, but due to

		if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size)
			data_len = txreq.size;

will, afaict, cause a lot of copying for the primary slot. Therefore I
think you need a loop here, not just an if(). Plus tx_copy_ops[]'es
dimension also looks to need further growing to accommodate this. Or
maybe not - at least the extreme example given would still be fine; more
generally packets being limited to below 64k means 2*16 slots would
suffice at one end of the scale, while 2*MAX_PENDING_REQS would at the
other end (all tiny, including the primary slot). What I haven't fully
convinced myself of is whether there might be cases in the middle which
are yet worse.

As I've been struggling with the code fragment quoted above already in
the patch originally introducing it, I'd like to see that relaxed. Can't
we avoid excessive copying by suitably growing tx_map_ops[] and then
deleting that bumping of data_len? Then there also wouldn't be the risk
multiple splits per copy anymore.

Alternatively to all of the above: Am I overlooking a check somewhere
which would also constrain the primary slot (or more precisely its
residual) to within a single page (along the lines of the check for
non-primary slots in xenvif_count_requests())?

Jan