Message-ID: <aYpXxAowNQe99cEm@boxer>
Date: Mon, 9 Feb 2026 22:55:16 +0100
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: "Nikhil P. Rao" <nikhil.rao@....com>
CC: <netdev@...r.kernel.org>, <magnus.karlsson@...el.com>, <sdf@...ichev.me>,
	<davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
	<pabeni@...hat.com>, <horms@...nel.org>, <kerneljasonxing@...il.com>
Subject: Re: [PATCH net v2 2/2] xsk: Fix zero-copy AF_XDP fragment drop

On Mon, Feb 09, 2026 at 06:24:51PM +0000, Nikhil P. Rao wrote:
> AF_XDP should ensure that only a complete packet is sent to the application.
> In the zero-copy case, if the Rx queue gets full as fragments are being
> enqueued, the remaining fragments are dropped.

All of the descs that the current xdp_buff was carrying will be dropped,
which is incorrect, as some of them have already been exposed to the Rx
queue and I don't see an error path that would rewind them. That's my
understanding of this issue.
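
To illustrate, rewinding would mean walking cached_prod back over the
slots that were already reserved but not yet submitted, something along
these lines (rough sketch, untested; I haven't checked whether an
existing producer-cancel helper could simply be reused here):

/* hypothetical helper: undo @cnt slots that were reserved with
 * xskq_prod_reserve_desc() but not yet published via xskq_prod_submit()
 */
static inline void xskq_prod_cancel_n(struct xsk_queue *q, u32 cnt)
{
	q->cached_prod -= cnt;
}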

However, we were trying to keep the single-buf case as fast as possible;
see below.

> 
> Add a check to ensure that the Rx queue has enough space for all
> fragments of a packet before starting to enqueue them.
> 
> Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> Signed-off-by: Nikhil P. Rao <nikhil.rao@....com>
> ---
>  net/xdp/xsk.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f2ec4f78bbb6..b65be95abcdc 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -166,15 +166,20 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  	u32 frags = xdp_buff_has_frags(xdp);
>  	struct xdp_buff_xsk *pos, *tmp;
>  	struct list_head *xskb_list;
> +	u32 num_desc = 1;
>  	u32 contd = 0;
> -	int err;
>  
> -	if (frags)
> +	if (frags) {
> +		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
>  		contd = XDP_PKT_CONTD;
> +	}
>  
> -	err = __xsk_rcv_zc(xs, xskb, len, contd);
> -	if (err)
> -		goto err;
> +	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {

This will unfortunately hurt single-buf performance; I'd rather have the
frag part still executed separately. Did you measure what impact this
patch has on throughput?

A further thought: once we are sure there is sufficient space in the xsk
queue, we could skip the sanity check that xskq_prod_reserve_desc()
contains. Look at the batching that is done on the Tx side.
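
Roughly what I have in mind (just a sketch, the helper below is
hypothetical and mirrors what xskq_prod_reserve_desc() does, minus the
fullness check):

/* sketch of a hypothetical unchecked variant of xskq_prod_reserve_desc();
 * only valid once xskq_prod_nb_free() has already guaranteed the slots
 */
static inline void xskq_prod_write_desc(struct xsk_queue *q, u64 addr,
					u32 len, u32 flags)
{
	struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
	u32 idx = q->cached_prod++ & q->ring_mask;

	ring->desc[idx].addr = addr;
	ring->desc[idx].len = len;
	ring->desc[idx].options = flags;
}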

Please see what works best here: either keep the linear-part execution
separate from the frags and produce the frags in a 'batched' way, or
include the linear part in this 'batched' production of descriptors.

> +		xs->rx_queue_full++;
> +		return -ENOBUFS;
> +	}
> +
> +	__xsk_rcv_zc(xs, xskb, len, contd);
>  	if (likely(!frags))
>  		return 0;
>  
> @@ -183,16 +188,11 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
>  		if (list_is_singular(xskb_list))
>  			contd = 0;
>  		len = pos->xdp.data_end - pos->xdp.data;
> -		err = __xsk_rcv_zc(xs, pos, len, contd);
> -		if (err)
> -			goto err;
> +		__xsk_rcv_zc(xs, pos, len, contd);
>  		list_del_init(&pos->list_node);
>  	}
>  
>  	return 0;
> -err:
> -	xsk_buff_free(xdp);
> -	return err;
>  }
>  
>  static void *xsk_copy_xdp_start(struct xdp_buff *from)
> -- 
> 2.43.0
> 
