Message-ID: <kv5syvra5hlvswecmzrbgne7ydmj6pf4dhzcoica3fdo6dina6@64w5pvo3lvbt>
Date: Mon, 8 Sep 2025 14:25:48 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Mingrui Cui <mingruic@...look.com>
Cc: andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, leon@...nel.org, linux-kernel@...r.kernel.org, 
	linux-rdma@...r.kernel.org, mbloch@...dia.com, netdev@...r.kernel.org, pabeni@...hat.com, 
	saeedm@...dia.com, tariqt@...dia.com
Subject: Re: [PATCH] net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page size

On Mon, Sep 08, 2025 at 09:35:32PM +0800, Mingrui Cui wrote:
> > On Tue, Sep 02, 2025 at 09:00:16PM +0800, Mingrui Cui wrote:
> > > When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
> > > fragments per WQE, odd-indexed WQEs always share the same page with
> > > their subsequent WQE. However, this relationship does not hold for page
> > > sizes larger than 8K. In this case, wqe_index_mask cannot guarantee that
> > > newly allocated WQEs won't share the same page with old WQEs.
> > > 
> > > If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
> > > page with its subsequent WQE, allocating a page for that WQE will
> > > overwrite mlx5e_frag_page, preventing the original page from being
> > > recycled. When the next WQE is processed, the newly allocated page will
> > > be immediately recycled.
> > > 
> > > In the next round, if these two WQEs are handled in the same bulk,
> > > page_pool_defrag_page() will be called again on the page, causing
> > > pp_frag_count to become negative.
> > > 
> > > Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
> > > size.
> > >
> > Was there an actual encountered issue or is this a code clarity fix?
> > 
> > For 64K page size, linear mode will be used so the constant will not be
> > used for calculating the frag size.
> > 
> > Thanks,
> > Dragos
> 
> Yes, this was an actual issue we encountered that caused a kernel crash.
> 
> We found it on a server with a DEC Alpha-like processor, which uses 8KB page
> size and runs a custom-built kernel. When using a ConnectX-4 Lx MT27710
> (MCX4121A-ACA_Ax) NIC with the MTU set to 7657 or higher, the kernel would crash
> during heavy traffic (e.g., iperf test). Here's the kernel log:
> 
> WARNING: CPU: 9 PID: 0 at include/net/page_pool/helpers.h:130
> mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
> Modules linked in: ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core ipv6
> mlx5_core tls
> CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W          6.6.0 #23
>  walk_stackframe+0x0/0x190
>  show_stack+0x70/0x94
>  dump_stack_lvl+0x98/0xd8
>  dump_stack+0x2c/0x48
>  __warn+0x1c8/0x220
>  warn_slowpath_fmt+0x20c/0x230
>  mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
>  mlx5e_free_rx_wqes+0xcc/0x120 [mlx5_core]
>  mlx5e_post_rx_wqes+0x1f4/0x4e0 [mlx5_core]
>  mlx5e_napi_poll+0x1c0/0x8d0 [mlx5_core]
>  __napi_poll+0x58/0x2e0
>  net_rx_action+0x1a8/0x340
>  __do_softirq+0x2b8/0x480
>  irq_exit+0xd4/0x120
>  do_entInt+0x164/0x520
>  entInt+0x114/0x120
>  __idle_end+0x0/0x50
>  default_idle_call+0x64/0x150
>  do_idle+0x10c/0x240
>  cpu_startup_entry+0x70/0x80
>  smp_callin+0x354/0x410
>  __smp_callin+0x3c/0x40
> 
> Although this was on a custom kernel and processor, I believe this issue is
> generic to any system using an 8KB page size. Unfortunately, I don't have an
> Alpha server running a mainline kernel to verify this directly, and most
> mainstream architectures don't support 8KB page size.
>
Oh, I see. Thanks for the note. I had issues finding any arch that
supports 8K page size.

The information above would be useful in the commit message as well.

Also, net patches need a Fixes tag. Probably this one:
Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")


Thanks,
Dragos
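
For reference, the page-sharing pattern described in the quoted commit
message can be reproduced with a small standalone model. This is only a
sketch, not the driver code: it assumes fragments are packed back to back
into pages with one uniform stride, and the function names assign_pages()
and shared_boundaries() are hypothetical. WQE indices are 0-based here.

```python
def assign_pages(page_size, frag_stride, num_frags, num_wqes):
    """Return, for each WQE, the page index used by each of its fragments.

    Simplified model (assumption): fragments are laid out sequentially and a
    new page is started whenever the next fragment would not fit.
    """
    page, offset = 0, 0
    layout = []
    for _ in range(num_wqes):
        pages = []
        for _ in range(num_frags):
            if offset + frag_stride > page_size:
                page += 1
                offset = 0
            pages.append(page)
            offset += frag_stride
        layout.append(pages)
    return layout


def shared_boundaries(layout):
    """Indices i such that WQE i and WQE i+1 have a page in common."""
    return [
        i
        for i in range(len(layout) - 1)
        if layout[i][-1] == layout[i + 1][0]
    ]


# 4K pages, 2048-byte fragments, 3 fragments per WQE
SHARED_4K = shared_boundaries(assign_pages(4096, 2048, 3, 8))
# 8K pages with the same 2048-byte fragments
SHARED_8K = shared_boundaries(assign_pages(8192, 2048, 3, 8))
# 8K pages with fragments of half a page, as the patch proposes
SHARED_8K_HALF = shared_boundaries(assign_pages(8192, 4096, 3, 8))
```

In this model SHARED_4K is [0, 2, 4, 6]: sharing happens only inside aligned
pairs of WQEs. SHARED_8K is [0, 1, 2, 4, 5, 6], so WQE 1 and WQE 2 also share
a page, which is the case a two-WQE alignment cannot exclude. SHARED_8K_HALF
is again [0, 2, 4, 6], matching the 4K layout.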

