Message-ID: <42ba9a0c-846b-4a9a-a434-9d53c770f948@gmail.com>
Date: Mon, 10 Nov 2025 16:00:44 +0200
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Mingrui Cui <mingruic@...look.com>, Dragos Tatulea <dtatulea@...dia.com>
Cc: Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
 Tariq Toukan <tariqt@...dia.com>, Mark Bloch <mbloch@...dia.com>,
 Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
 <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
 linux-kernel@...r.kernel.org, Jacob Keller <jacob.e.keller@...el.com>
Subject: Re: [PATCH net v3] net/mlx5e: Make DEFAULT_FRAG_SIZE relative to page
 size



On 05/11/2025 17:33, Mingrui Cui wrote:
> When page size is 4K, DEFAULT_FRAG_SIZE of 2048 ensures that with 3
> fragments per WQE, odd-indexed WQEs always share the same page with
> their subsequent WQE, while WQEs consisting of 4 fragments do not.
> However, this relationship does not hold for page sizes larger than 8K.
> In this case, wqe_index_mask cannot guarantee that newly allocated WQEs
> won't share the same page with old WQEs.
> 
> If the last WQE in a bulk processed by mlx5e_post_rx_wqes() shares a
> page with its subsequent WQE, allocating a page for that WQE will
> overwrite mlx5e_frag_page, preventing the original page from being
> recycled. When the next WQE is processed, the newly allocated page will
> be immediately recycled. In the next round, if these two WQEs are
> handled in the same bulk, page_pool_defrag_page() will be called again
> on the page, causing pp_frag_count to become negative[1].
> 
> Moreover, this can also lead to memory corruption, as the page may have
> already been returned to the page pool and re-allocated to another WQE.
> And since skb_shared_info is stored at the end of the first fragment,
> its frags->bv_page pointer can be overwritten, leading to an invalid
> memory access when processing the skb[2].
> 
> For example, on 8K page size systems (e.g. DEC Alpha) with a ConnectX-4
> Lx MT27710 (MCX4121A-ACA_Ax) NIC and the MTU set to 7657 or higher,
> heavy network loads (e.g. iperf) will first trigger a series of
> WARNINGs[1] and eventually a crash[2].
> 
> Fix this by making DEFAULT_FRAG_SIZE always equal to half of the page
> size.
> 
> [1]
> WARNING: CPU: 9 PID: 0 at include/net/page_pool/helpers.h:130
> mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
> CPU: 9 PID: 0 Comm: swapper/9 Tainted: G        W          6.6.0
>   walk_stackframe+0x0/0x190
>   show_stack+0x70/0x94
>   dump_stack_lvl+0x98/0xd8
>   dump_stack+0x2c/0x48
>   __warn+0x1c8/0x220
>   warn_slowpath_fmt+0x20c/0x230
>   mlx5e_page_release_fragmented.isra.0+0xdc/0xf0 [mlx5_core]
>   mlx5e_free_rx_wqes+0xcc/0x120 [mlx5_core]
>   mlx5e_post_rx_wqes+0x1f4/0x4e0 [mlx5_core]
>   mlx5e_napi_poll+0x1c0/0x8d0 [mlx5_core]
>   __napi_poll+0x58/0x2e0
>   net_rx_action+0x1a8/0x340
>   __do_softirq+0x2b8/0x480
>   [...]
> 
> [2]
> Unable to handle kernel paging request at virtual address 393837363534333a
> Oops [#1]
> CPU: 72 PID: 0 Comm: swapper/72 Tainted: G        W          6.6.0
> Trace:
>   walk_stackframe+0x0/0x190
>   show_stack+0x70/0x94
>   die+0x1d4/0x350
>   do_page_fault+0x630/0x690
>   entMM+0x120/0x130
>   napi_pp_put_page+0x30/0x160
>   skb_release_data+0x164/0x250
>   kfree_skb_list_reason+0xd0/0x2f0
>   skb_release_data+0x1f0/0x250
>   napi_consume_skb+0xa0/0x220
>   net_rx_action+0x158/0x340
>   __do_softirq+0x2b8/0x480
>   irq_exit+0xd4/0x120
>   do_entInt+0x164/0x520
>   entInt+0x114/0x120
>   [...]
> 
> Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
> Signed-off-by: Mingrui Cui <mingruic@...look.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@...el.com>
> ---
> Changes in v3:
>    - Add a warning for page sizes above 8K as suggested.
> 
> Changes in v2:
>    - Add Fixes tag and more details to commit message.
>    - Target 'net' branch.
>    - Remove the obsolete WARN_ON() and update related comments.
> Link to v2: https://lore.kernel.org/all/MN6PR16MB5450C5EC9A1B2E2E78E8B241B71AA@MN6PR16MB5450.namprd16.prod.outlook.com/
> 
>   .../net/ethernet/mellanox/mlx5/core/en/params.c  | 16 +++++++---------
>   1 file changed, 7 insertions(+), 9 deletions(-)
> 

Reviewed-by: Tariq Toukan <tariqt@...dia.com>
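
For readers following the page-sharing argument in the commit message, here
is a rough sketch of the arithmetic. It is a simplified model, not the driver
code: it assumes every fragment uses the same stride, that fragments are
packed back to back and never straddle a page, and that WQEs are refilled in
aligned groups of (wqe_index_mask + 1). The function names are hypothetical.

```python
def wqe_pages(page_size, frag_stride, frags_per_wqe, num_wqes):
    """Return, for each WQE, the set of page indices its fragments occupy.

    Simplified model: uniform stride, fragments packed consecutively,
    a fragment that does not fit in the current page starts a new page.
    """
    pages = []
    page, offset = 0, 0
    for _ in range(num_wqes):
        used = set()
        for _ in range(frags_per_wqe):
            if offset + frag_stride > page_size:
                page += 1
                offset = 0
            used.add(page)
            offset += frag_stride
        pages.append(used)
    return pages


def shared_pairs(pages):
    """Indices i such that WQE i and WQE i+1 have a page in common."""
    return [i for i in range(len(pages) - 1) if pages[i] & pages[i + 1]]


def crossing_pairs(pairs, wqe_index_mask):
    """Sharing pairs whose two WQEs fall into different refill groups.

    Assumption: groups are aligned blocks of (wqe_index_mask + 1) WQEs,
    so WQE i is the last of its group when (i & mask) == mask.
    """
    return [i for i in pairs if (i & wqe_index_mask) == wqe_index_mask]


if __name__ == "__main__":
    # 3 fragments per WQE, groups of 2 WQEs (mask = 1), 8 WQEs.
    for page_size, stride in ((4096, 2048), (8192, 2048), (8192, 4096)):
        pairs = shared_pairs(wqe_pages(page_size, stride, 3, 8))
        print(page_size, stride, pairs, crossing_pairs(pairs, 1))
```

Under these assumptions, a 2048-byte stride on 4K pages only produces sharing
inside a group of two WQEs, while the same stride on 8K pages also produces
sharing across group boundaries (WQE 1 with WQE 2, WQE 5 with WQE 6). Setting
the stride to half the page size on 8K pages restores the 4K pattern.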

