linux-kernel - Re: [PATCH] net/mlx5e: Make DEFAULT_FRAG

Open Source and information security mailing list archives

Message-ID: <gns4qcq7gz24conarxktc5hl3hzgwiltqqotg2675ra2uz7awv@rszzlmr7kztr>
Date: Mon, 15 Sep 2025 13:28:11 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Mingrui Cui <mingruic@...look.com>
Cc: andrew+netdev@...n.ch, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, leon@...nel.org, linux-kernel@...r.kernel.org, 
	linux-rdma@...r.kernel.org, mbloch@...dia.com, netdev@...r.kernel.org, pabeni@...hat.com, 
	saeedm@...dia.com, tariqt@...dia.com
Subject: Re: [PATCH] net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page size

On Mon, Sep 08, 2025 at 02:25:48PM +0000, Dragos Tatulea wrote:
> On Mon, Sep 08, 2025 at 09:35:32PM +0800, Mingrui Cui wrote:
> > > On Tue, Sep 02, 2025 at 09:00:16PM +0800, Mingrui Cui wrote:
> > > > When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
> > > > fragments per WQE, odd-indexed WQEs always share the same page with
> > > > their subsequent WQE. However, this relationship does not hold for page
> > > > sizes larger than 8K. In this case, wqe_index_mask cannot guarantee that
> > > > newly allocated WQEs won't share the same page with old WQEs.
> > > > 
> > > > If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
> > > > page with its subsequent WQE, allocating a page for that WQE will
> > > > overwrite mlx5e_frag_page, preventing the original page from being
> > > > recycled. When the next WQE is processed, the newly allocated page will
> > > > be immediately recycled.
> > > > 
> > > > In the next round, if these two WQEs are handled in the same bulk,
> > > > page_pool_defrag_page() will be called again on the page, causing
> > > > pp_frag_count to become negative.
> > > > 
> > > > Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
> > > > size.
> > > >
> > > Did you actually encounter an issue, or is this a code clarity fix?
> > > 
> > > For 64K page size, linear mode will be used so the constant will not be
> > > used for calculating the frag size.
> > > 
> > > Thanks,
> > > Dragos
> > 
> > Yes, this was an actual issue we encountered that caused a kernel crash.
> > 
> > We found it on a server with a DEC Alpha-like processor, which uses an 8KB page
> > size and runs a custom-built kernel. When using a ConnectX-4 Lx MT27710
> > (MCX4121A-ACA_Ax) NIC with the MTU set to 7657 or higher, the kernel would crash
> > under heavy traffic (e.g., an iperf test). Here's the kernel log:
> > 
Tariq and I had a closer look at mlx5e_build_rq_frags_info() and noticed
that for the given MTU (7657) you should have seen the WARN_ON() from
[1], unless you are running XDP or a higher MTU, in which case
frag_size_max is reset to PAGE_SIZE [2]. Did you observe this warning?

[1] https://elixir.bootlin.com/linux/v6.17-rc5/source/drivers/net/ethernet/mellanox/mlx5/core/en/params.c#L762
[2] https://elixir.bootlin.com/linux/v6.17-rc5/source/drivers/net/ethernet/mellanox/mlx5/core/en/params.c#L710

Thanks,
Dragos