Message-ID:
 <MN6PR16MB545062E2EBB54C553CE059CFB70CA@MN6PR16MB5450.namprd16.prod.outlook.com>
Date: Mon,  8 Sep 2025 21:35:32 +0800
From: Mingrui Cui <mingruic@...look.com>
To: dtatulea@...dia.com
Cc: andrew+netdev@...n.ch,
	davem@...emloft.net,
	edumazet@...gle.com,
	kuba@...nel.org,
	leon@...nel.org,
	linux-kernel@...r.kernel.org,
	linux-rdma@...r.kernel.org,
	mbloch@...dia.com,
	mingruic@...look.com,
	netdev@...r.kernel.org,
	pabeni@...hat.com,
	saeedm@...dia.com,
	tariqt@...dia.com
Subject: Re: [PATCH] net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page size

> On Tue, Sep 02, 2025 at 09:00:16PM +0800, Mingrui Cui wrote:
> > When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
> > fragments per WQE, odd-indexed WQEs always share the same page with
> > their subsequent WQE. However, this relationship does not hold for page
> > sizes larger than 8K. In this case, wqe_index_mask cannot guarantee that
> > newly allocated WQEs won't share the same page with old WQEs.
> > 
> > If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
> > page with its subsequent WQE, allocating a page for that WQE will
> > overwrite mlx5e_frag_page, preventing the original page from being
> > recycled. When the next WQE is processed, the newly allocated page will
> > be immediately recycled.
> > 
> > In the next round, if these two WQEs are handled in the same bulk,
> > page_pool_defrag_page() will be called again on the page, causing
> > pp_frag_count to become negative.
> > 
> > Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
> > size.
> >
> Was there an actual encountered issue or is this a code clarity fix?
> 
> For 64K page size, linear mode will be used so the constant will not be
> used for calculating the frag size.
> 
> Thanks,
> Dragos

Yes, this was an actual issue we encountered that caused a kernel crash.

We found it on a server with a DEC Alpha-like processor, which uses an 8KB page
size and runs a custom-built kernel. With a ConnectX-4 Lx MT27710
(MCX4121A-ACA_Ax) NIC and the MTU set to 7657 or higher, the kernel would crash
under heavy traffic (e.g., an iperf test). Here's the kernel log:

WARNING: CPU: 9 PID: 0 at include/net/page_pool/helpers.h:130
mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
Modules linked in: ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core ipv6
mlx5_core tls
CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W          6.6.0 #23
 walk_stackframe+0x0/0x190
 show_stack+0x70/0x94
 dump_stack_lvl+0x98/0xd8
 dump_stack+0x2c/0x48
 __warn+0x1c8/0x220
 warn_slowpath_fmt+0x20c/0x230
 mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
 mlx5e_free_rx_wqes+0xcc/0x120 [mlx5_core]
 mlx5e_post_rx_wqes+0x1f4/0x4e0 [mlx5_core]
 mlx5e_napi_poll+0x1c0/0x8d0 [mlx5_core]
 __napi_poll+0x58/0x2e0
 net_rx_action+0x1a8/0x340
 __do_softirq+0x2b8/0x480
 irq_exit+0xd4/0x120
 do_entInt+0x164/0x520
 entInt+0x114/0x120
 __idle_end+0x0/0x50
 default_idle_call+0x64/0x150
 do_idle+0x10c/0x240
 cpu_startup_entry+0x70/0x80
 smp_callin+0x354/0x410
 __smp_callin+0x3c/0x40

Although this was on a custom kernel and processor, I believe the issue is
generic to any system using an 8KB page size. Unfortunately, I don't have an
Alpha server running a mainline kernel to verify this directly, and most
mainstream architectures don't support an 8KB page size.

I also modified some conditions in the driver to force it to fall back to
non-linear mode on an ARMv8 server configured with a 16KB page size, and was
then able to trigger the same warning and crash. So I suspect the issue would
also occur with a 16KB page size if the NIC could be configured with a large
enough MTU.

Best regards,
Mingrui Cui
