lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aRcoGvqbT9V/HtoD@boxer>
Date: Fri, 14 Nov 2025 14:01:14 +0100
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: Alessandro Decina <alessandro.d@...il.com>
CC: <netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>, "Alexei
 Starovoitov" <ast@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>, "Daniel
 Borkmann" <daniel@...earbox.net>, Eric Dumazet <edumazet@...gle.com>, "Jakub
 Kicinski" <kuba@...nel.org>, Jesper Dangaard Brouer <hawk@...nel.org>, "John
 Fastabend" <john.fastabend@...il.com>, Paolo Abeni <pabeni@...hat.com>,
	Przemek Kitszel <przemyslaw.kitszel@...el.com>, Stanislav Fomichev
	<sdf@...ichev.me>, Tirthendu Sarkar <tirthendu.sarkar@...el.com>, Tony Nguyen
	<anthony.l.nguyen@...el.com>, <bpf@...r.kernel.org>,
	<intel-wired-lan@...ts.osuosl.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3 1/1] i40e: xsk: advance next_to_clean on status
 descriptors

On Thu, Nov 13, 2025 at 07:24:38PM +1100, Alessandro Decina wrote:
> Whenever a status descriptor is received, i40e processes and skips over
> it, correctly updating next_to_process but forgetting to update
> next_to_clean. In the next iteration this accidentally causes the
> creation of an invalid multi-buffer xdp_buff where the first fragment
> is the status descriptor.
> 
> If then a skb is constructed from such an invalid buffer - because the
> eBPF program returns XDP_PASS - a panic occurs:
> 
> [ 5866.367317] BUG: unable to handle page fault for address: ffd31c37eab1c980
> [ 5866.375050] #PF: supervisor read access in kernel mode
> [ 5866.380825] #PF: error_code(0x0000) - not-present page
> [ 5866.386602] PGD 0
> [ 5866.388867] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 5866.393575] CPU: 34 UID: 0 PID: 0 Comm: swapper/34 Not tainted 6.17.0-custom #1 PREEMPT(voluntary)
> [ 5866.403740] Hardware name: Supermicro AS -2115GT-HNTR/H13SST-G, BIOS 3.2 03/20/2025
> [ 5866.412339] RIP: 0010:memcpy+0x8/0x10
> [ 5866.416454] Code: cc cc 90 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 d1 <f3> a4 e9 fc 26 c0 fe 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
> [ 5866.437538] RSP: 0018:ff428d9ec0bb0ca8 EFLAGS: 00010286
> [ 5866.443415] RAX: ff2dd26dbd8f0000 RBX: ff2dd265ad161400 RCX: 00000000000004e1
> [ 5866.451435] RDX: 00000000000004e1 RSI: ffd31c37eab1c980 RDI: ff2dd26dbd8f0000
> [ 5866.459454] RBP: ff428d9ec0bb0d40 R08: 0000000000000000 R09: 0000000000000000
> [ 5866.467470] R10: 0000000000000000 R11: 0000000000000000 R12: ff428d9eec726ef8
> [ 5866.475490] R13: ff2dd26dbd8f0000 R14: ff2dd265ca2f9fc0 R15: ff2dd26548548b80
> [ 5866.483509] FS:  0000000000000000(0000) GS:ff2dd2c363592000(0000) knlGS:0000000000000000
> [ 5866.492600] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5866.499060] CR2: ffd31c37eab1c980 CR3: 0000000178d7b040 CR4: 0000000000f71ef0
> [ 5866.507079] PKRU: 55555554
> [ 5866.510125] Call Trace:
> [ 5866.512867]  <IRQ>
> [ 5866.515132]  ? i40e_clean_rx_irq_zc+0xc50/0xe60 [i40e]
> [ 5866.520921]  i40e_napi_poll+0x2d8/0x1890 [i40e]
> [ 5866.526022]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.531408]  ? raise_softirq+0x24/0x70
> [ 5866.535623]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.541011]  ? srso_alias_return_thunk+0x5/0xfbef5
> [ 5866.546397]  ? rcu_sched_clock_irq+0x225/0x1800
> [ 5866.551493]  __napi_poll+0x30/0x230
> [ 5866.555423]  net_rx_action+0x20b/0x3f0
> [ 5866.559643]  handle_softirqs+0xe4/0x340
> [ 5866.563962]  __irq_exit_rcu+0x10e/0x130
> [ 5866.568283]  irq_exit_rcu+0xe/0x20
> [ 5866.572110]  common_interrupt+0xb6/0xe0
> [ 5866.576425]  </IRQ>
> [ 5866.578791]  <TASK>
> 
> Advance next_to_clean to ensure invalid xdp_buff(s) aren't created.
> 
> Rename i40e_inc_ntp to i40e_inc_ntp_ntc. Make it take an optional
> pointer to next_to_clean so it's harder for callers to accidentally
> forget to advance it.
> 
> Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support")
> Signed-off-by: Alessandro Decina <alessandro.d@...il.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 33 ++++++++++++-------
>  .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 17 ++++++----
>  3 files changed, 34 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index cc0b9efc2637..d3dae895a058 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2359,15 +2359,24 @@ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
>  }
>  
>  /**
> - * i40e_inc_ntp: Advance the next_to_process index
> + * i40e_inc_ntp_ntc: Advance the next_to_process and next_to_clean indexes
>   * @rx_ring: Rx ring
> + * @next_to_process: Pointer to next_to_process
> + * @next_to_clean: Pointer to next_to_clean or NULL
> + *
> + * This function advances the next_to_process index. If next_to_clean is not
> + * NULL, it is advanced as well.
>   **/
> -static void i40e_inc_ntp(struct i40e_ring *rx_ring)
> +void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
> +		      u16 *next_to_clean)
>  {
> -	u32 ntp = rx_ring->next_to_process + 1;
> +	u16 ntp = *next_to_process + 1;
>  
>  	ntp = (ntp < rx_ring->count) ? ntp : 0;
> -	rx_ring->next_to_process = ntp;
> +	*next_to_process = ntp;
> +	if (next_to_clean)
> +		*next_to_clean = ntp;
> +
>  	prefetch(I40E_RX_DESC(rx_ring, ntp));
>  }
>  
> @@ -2484,17 +2493,19 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
>  			i40e_clean_programming_status(rx_ring,
>  						      rx_desc->raw.qword[0],
>  						      qword);
> +			bool eop;
> +
>  			rx_buffer = i40e_rx_bi(rx_ring, ntp);
> -			i40e_inc_ntp(rx_ring);
> -			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			/* Update ntc and bump cleaned count if not in the
>  			 * middle of mb packet.
>  			 */
> -			if (rx_ring->next_to_clean == ntp) {
> -				rx_ring->next_to_clean =
> -					rx_ring->next_to_process;
> +			eop = rx_ring->next_to_process ==
> +			      rx_ring->next_to_clean;
> +			i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process,
> +					 eop ? &rx_ring->next_to_clean : NULL);

Woah, that's not what I had on mind...I meant to pull whole block that
takes care of FDIR descriptors onto common function. That logic should be
shared between normal Rx and ZC Rx. The only different action we need to
take is how we release the buffer.

Could you try pulling whole i40e_rx_is_programming_status() branch onto
function within i40e_txrx_common.h and see how much of a work would it
take to have this as a common function?

> +			if (eop)
>  				cleaned_count++;
> -			}
> +			i40e_reuse_rx_page(rx_ring, rx_buffer);
>  			continue;
>  		}
>  
> @@ -2507,7 +2518,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
>  		rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>  
>  		neop = i40e_is_non_eop(rx_ring, rx_desc);
> -		i40e_inc_ntp(rx_ring);
> +		i40e_inc_ntp_ntc(rx_ring, &rx_ring->next_to_process, NULL);
>  
>  		if (!xdp->data) {
>  			unsigned char *hard_start;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> index e26807fd2123..3d7e4b3404f0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
> @@ -17,6 +17,8 @@ void i40e_update_rx_stats(struct i40e_ring *rx_ring,
>  			  unsigned int total_rx_packets);
>  void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res);
>  void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val);
> +void i40e_inc_ntp_ntc(struct i40e_ring *rx_ring, u16 *next_to_process,
> +		      u16 *next_to_clean);
>  
>  #define I40E_XDP_PASS		0
>  #define I40E_XDP_CONSUMED	BIT(0)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 9f47388eaba5..fdf72446ed67 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -410,7 +410,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  	u16 next_to_clean = rx_ring->next_to_clean;
>  	unsigned int xdp_res, xdp_xmit = 0;
>  	struct xdp_buff *first = NULL;
> -	u32 count = rx_ring->count;
>  	struct bpf_prog *xdp_prog;
>  	u32 entries_to_alloc;
>  	bool failure = false;
> @@ -430,6 +429,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  		struct xdp_buff *bi;
>  		unsigned int size;
>  		u64 qword;
> +		bool neop;
>  
>  		rx_desc = I40E_RX_DESC(rx_ring, next_to_process);
>  		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> @@ -446,8 +446,10 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  						      qword);
>  			bi = *i40e_rx_bi(rx_ring, next_to_process);
>  			xsk_buff_free(bi);
> -			if (++next_to_process == count)
> -				next_to_process = 0;
> +			i40e_inc_ntp_ntc(rx_ring, &next_to_process,
> +					 next_to_process == next_to_clean ?
> +						 &next_to_clean :
> +						 NULL);
>  			continue;
>  		}
>  
> @@ -466,16 +468,17 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>  			break;
>  		}
>  
> -		if (++next_to_process == count)
> -			next_to_process = 0;
> +		neop = i40e_is_non_eop(rx_ring, rx_desc);
> +		// advance next_to_process. on EOP, advance next_to_clean as well.
> +		i40e_inc_ntp_ntc(rx_ring, &next_to_process,
> +				 !neop ? &next_to_clean : NULL);
>  
> -		if (i40e_is_non_eop(rx_ring, rx_desc))
> +		if (neop)
>  			continue;
>  
>  		xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
>  		i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
>  					  &rx_bytes, xdp_res, &failure);
> -		next_to_clean = next_to_process;
>  		if (failure)
>  			break;
>  		total_rx_packets += rx_packets;
> -- 
> 2.43.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ