Message-ID:
 <MN6PR16MB5450D5D96A644541A10B548CB714A@MN6PR16MB5450.namprd16.prod.outlook.com>
Date: Tue, 16 Sep 2025 17:01:33 +0800
From: Mingrui Cui <mingruic@...look.com>
To: dtatulea@...dia.com
Cc: andrew+netdev@...n.ch,
	davem@...emloft.net,
	edumazet@...gle.com,
	kuba@...nel.org,
	leon@...nel.org,
	linux-kernel@...r.kernel.org,
	linux-rdma@...r.kernel.org,
	mbloch@...dia.com,
	mingruic@...look.com,
	netdev@...r.kernel.org,
	pabeni@...hat.com,
	saeedm@...dia.com,
	tariqt@...dia.com
Subject: Re: [PATCH] net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page size

On Mon, Sep 15, 2025 at 01:28:11PM +0000, Dragos Tatulea wrote:
> On Mon, Sep 08, 2025 at 02:25:48PM +0000, Dragos Tatulea wrote:
> > On Mon, Sep 08, 2025 at 09:35:32PM +0800, Mingrui Cui wrote:
> > > > On Tue, Sep 02, 2025 at 09:00:16PM +0800, Mingrui Cui wrote:
> > > > > When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
> > > > > fragments per WQE, odd-indexed WQEs always share the same page with
> > > > > their subsequent WQE. However, this relationship does not hold for page
> > > > > sizes larger than 8K. In this case, wqe_index_mask cannot guarantee that
> > > > > newly allocated WQEs won't share the same page with old WQEs.
> > > > > 
> > > > > If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
> > > > > page with its subsequent WQE, allocating a page for that WQE will
> > > > > overwrite mlx5e_frag_page, preventing the original page from being
> > > > > recycled. When the next WQE is processed, the newly allocated page will
> > > > > be immediately recycled.
> > > > > 
> > > > > In the next round, if these two WQEs are handled in the same bulk,
> > > > > page_pool_defrag_page() will be called again on the page, causing
> > > > > pp_frag_count to become negative.
> > > > > 
> > > > > Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
> > > > > size.
> > > > >
> > > > Was there an actual encountered issue or is this a code clarity fix?
> > > > 
> > > > For 64K page size, linear mode will be used so the constant will not be
> > > > used for calculating the frag size.
> > > > 
> > > > Thanks,
> > > > Dragos
> > > 
> > > Yes, this was an actual issue we encountered that caused a kernel crash.
> > > 
> > > We found it on a server with a DEC-Alpha like processor, which uses 8KB page
> > > size and runs a custom-built kernel. When using a ConnectX-4 Lx MT27710
> > > (MCX4121A-ACA_Ax) NIC with the MTU set to 7657 or higher, the kernel would crash
> > > during heavy traffic (e.g., iperf test). Here's the kernel log:
> > > 
> Tariq and I had a closer look at mlx5e_build_rq_frags_info() and noticed
> that for the given MTU (7657) you should have seen the WARN_ON() from
> [1]. Unless you are running XDP or a higher MTU in which case
> frag_size_max was reset to PAGE_SIZE [2]. Did you observe this warning?
> 
> [1] https://elixir.bootlin.com/linux/v6.17-rc5/source/drivers/net/ethernet/mellanox/mlx5/core/en/params.c#L762
> [2] https://elixir.bootlin.com/linux/v6.17-rc5/source/drivers/net/ethernet/mellanox/mlx5/core/en/params.c#L710

Yes, that WARN_ON() is triggered when the MTU is set to 7657 or above. Here is
the log:

WARNING: CPU: 129 PID: 4368 at drivers/net/ethernet/mellanox/mlx5/core/en/params.c:824 mlx5e_build_rq_param+0x25c/0x1050 [mlx5_core]
Modules linked in: ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core uio_pdrv_genirq dm_mod sch_fq_codel mlx5_core tls efivarfs ipv6
CPU: 129 PID: 4368 Comm: ifconfig Not tainted 6.6.0 #23
Trace:
 walk_stackframe+0x0/0x190
 show_stack+0x70/0x94
 dump_stack_lvl+0x98/0xd8
 dump_stack+0x2c/0x48
 __warn+0x1c8/0x220
 warn_slowpath_fmt+0x20c/0x230
 mlx5e_build_rq_param+0x25c/0x1050 [mlx5_core]
 mlx5e_build_channel_param+0x60/0x6d0 [mlx5_core]
 mlx5e_open_channels+0xc8/0x1400 [mlx5_core]
 mlx5e_safe_switch_params+0xe0/0x1c0 [mlx5_core]
 mlx5e_change_mtu+0x13c/0x390 [mlx5_core]
 mlx5e_change_nic_mtu+0x38/0x60 [mlx5_core]
 dev_set_mtu_ext+0x12c/0x270
 dev_set_mtu+0x6c/0xf0
 dev_ifsioc+0x6d0/0x740
 dev_ioctl+0x54c/0x770
 sock_ioctl+0x368/0x4e0
 sys_ioctl+0x610/0xec0
 do_entSys+0xbc/0x1d0
 entSys+0x12c/0x130

I plan to remove this WARN_ON() in the v2 patch, as it becomes obsolete once
DEFAULT_FRAG_SIZE is made relative to the page size.
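
For reference, the page-sharing argument from the patch description can be
checked with a small user-space model. This is only a rough sketch, not the
driver code: it assumes fragments of a fixed stride are packed back to back and
that a new page is started whenever the next fragment would not fit. The
function names partition() and shares_page_with_next() are made up for this
illustration, and the WQE indexing in the model may not match the driver's.

```python
# Simplified model of packing fixed-stride RX fragments into pages.
# Illustration only: the packing rule and every name below are assumptions
# made for this sketch, not taken from the mlx5e sources.

def partition(num_wqes, frags_per_wqe, frag_stride, page_size):
    """Return, for each WQE, the list of page indices its fragments use."""
    page, offset = 0, 0
    layout = []
    for _ in range(num_wqes):
        pages = []
        for _ in range(frags_per_wqe):
            # Start a new page when the next fragment would not fit.
            if offset + frag_stride > page_size:
                page += 1
                offset = 0
            pages.append(page)
            offset += frag_stride
        layout.append(pages)
    return layout


def shares_page_with_next(layout):
    """Entry i is True if WQE i and WQE i+1 have at least one page in common."""
    return [bool(set(a) & set(b)) for a, b in zip(layout, layout[1:])]


if __name__ == "__main__":
    # (page size, fragment stride): 4K/2048, 8K/2048, 8K with half-page stride
    for page_size, stride in ((4096, 2048), (8192, 2048), (8192, 4096)):
        layout = partition(8, 3, stride, page_size)
        print(page_size, stride, shares_page_with_next(layout))
```

In this model, with 4K pages and a 2048-byte stride, sharing with the next WQE
strictly alternates, so a mask of 1 on the WQE index is enough to pick bulk
boundaries that never split a page. With 8K pages and the same stride, three out
of every four consecutive boundaries fall inside a shared page, so that
alternation no longer holds. Using half of the page size as the stride (4096 on
8K pages) gives the same alternating pattern as the 4K case.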

Thanks,
Mingrui Cui
