netdev - Re: [BUG] mlx5_core memory management issue

Message-ID: <jlvrzm6q7dnai6nf5v3ifhtwqlnvvrdg5driqomnl5q4lzfxmk@tmwaadjob5yd>
Date: Thu, 24 Jul 2025 17:01:16 +0000
From: Dragos Tatulea <dtatulea@...dia.com>
To: Chris Arges <carges@...udflare.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, 
	kernel-team <kernel-team@...udflare.com>, Jesper Dangaard Brouer <hawk@...nel.org>, tariqt@...dia.com, 
	saeedm@...dia.com, Leon Romanovsky <leon@...nel.org>, 
	Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, 
	Simon Horman <horms@...nel.org>, Andrew Rzeznik <arzeznik@...udflare.com>, 
	Yan Zhai <yan@...udflare.com>
Subject: Re: [BUG] mlx5_core memory management issue

On Wed, Jul 23, 2025 at 01:48:07PM -0500, Chris Arges wrote:
> 
> Ok, we can reproduce this problem!
> 
> I tried to simplify this reproducer, but it seems like what's needed is:
> - xdp program attached to mlx5 NIC
> - cpumap redirect
> - device redirect (map or just bpf_redirect)
> - frame gets turned into an skb
> Then from another machine send many flows of UDP traffic to trigger the problem.
> 
> I've put together a program that reproduces the issue here:
> - https://github.com/arges/xdp-redirector
>
Much appreciated! Initially I could not get traffic to the xdp_devmap
stage. Further debugging showed that GRO needs to be enabled on the
veth devices for the XDP redirect to reach the xdp_devmap. With that
enabled, I managed to reproduce your issue.

Now I can start looking into it.

> In general the failure manifests with many different WARNs such as:
> include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0xf7/0x150 [mlx5_core]
> Then the machine crashes.
> 
> I was able to get a crashdump which shows:
> ```
> PID: 0        TASK: ffff8c0910134380  CPU: 76   COMMAND: "swapper/76"
>  #0 [fffffe10906d3ea8] crash_nmi_callback at ffffffffadc5c4fd
>  #1 [fffffe10906d3eb0] default_do_nmi at ffffffffae9524f0
>  #2 [fffffe10906d3ed0] exc_nmi at ffffffffae952733
>  #3 [fffffe10906d3ef0] end_repeat_nmi at ffffffffaea01bfd
>     [exception RIP: io_serial_in+25]
>     RIP: ffffffffae4cd489  RSP: ffffb3c60d6049e8  RFLAGS: 00000002
>     RAX: ffffffffae4cd400  RBX: 00000000000025d8  RCX: 0000000000000000
>     RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffffb10a9cb0
>     RBP: 0000000000000000   R8: 2d2d2d2d2d2d2d2d   R9: 656820747563205b
>     R10: 000000002d2d2d2d  R11: 000000002d2d2d2d  R12: ffffffffb0fa5610
>     R13: 0000000000000000  R14: 0000000000000000  R15: ffffffffb10a9cb0
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>  #4 [ffffb3c60d6049e8] io_serial_in at ffffffffae4cd489
>  #5 [ffffb3c60d6049e8] serial8250_console_write at ffffffffae4d2fcf
>  #6 [ffffb3c60d604a80] console_flush_all at ffffffffadd1cf26
>  #7 [ffffb3c60d604b00] console_unlock at ffffffffadd1d1df
>  #8 [ffffb3c60d604b48] vprintk_emit at ffffffffadd1dda1
>  #9 [ffffb3c60d604b98] _printk at ffffffffae90250c
> #10 [ffffb3c60d604bf8] report_bug.cold at ffffffffae95001d
> #11 [ffffb3c60d604c38] handle_bug at ffffffffae950e91
> #12 [ffffb3c60d604c58] exc_invalid_op at ffffffffae9512b7
> #13 [ffffb3c60d604c70] asm_exc_invalid_op at ffffffffaea0123a
>     [exception RIP: mlx5e_page_release_fragmented+85]
>     RIP: ffffffffc25f75c5  RSP: ffffb3c60d604d20  RFLAGS: 00010293
>     RAX: 000000000000003f  RBX: ffff8bfa8f059fd0  RCX: ffffe3bf1992a180
>     RDX: 000000000000003d  RSI: ffffe3bf1992a180  RDI: ffff8bf9b0784000
>     RBP: 0000000000000040   R8: 00000000000001d2   R9: 0000000000000006
>     R10: ffff8c06de22f380  R11: ffff8bfcfe6cd680  R12: 00000000000001d2
>     R13: 000000000000002b  R14: ffff8bf9b0784000  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> #14 [ffffb3c60d604d20] mlx5e_free_rx_wqes at ffffffffc25f7e2f [mlx5_core]
> #15 [ffffb3c60d604d58] mlx5e_post_rx_wqes at ffffffffc25f877c [mlx5_core]
> #16 [ffffb3c60d604dc0] mlx5e_napi_poll at ffffffffc25fdd27 [mlx5_core]
> #17 [ffffb3c60d604e20] __napi_poll at ffffffffae6a8ddb
> #18 [ffffb3c60d604e90] __napi_poll at ffffffffae6a8db5
> #19 [ffffb3c60d604e98] net_rx_action at ffffffffae6a95f1
> #20 [ffffb3c60d604f98] handle_softirqs at ffffffffadc9d4bf
> #21 [ffffb3c60d604fe8] irq_exit_rcu at ffffffffadc9e057
> #22 [ffffb3c60d604ff0] common_interrupt at ffffffffae952015
> --- <IRQ stack> ---
> #23 [ffffb3c60c837de8] asm_common_interrupt at ffffffffaea01466
>     [exception RIP: cpuidle_enter_state+184]
>     RIP: ffffffffae955c38  RSP: ffffb3c60c837e98  RFLAGS: 00000202
>     RAX: ffff8c0cffc00000  RBX: ffff8c0911002400  RCX: 0000000000000000
>     RDX: 00003c630b2d073a  RSI: ffffffe519600d10  RDI: 0000000000000000
>     RBP: 0000000000000001   R8: 0000000000000002   R9: 0000000000000001
>     R10: ffff8c0cffc330c4  R11: 071c71c71c71c71c  R12: ffffffffb05ff820
>     R13: 00003c630b2d073a  R14: 0000000000000001  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> #24 [ffffb3c60c837ed0] cpuidle_enter at ffffffffae64b4ad
> #25 [ffffb3c60c837ef0] do_idle at ffffffffadcfa7c6
> #26 [ffffb3c60c837f30] cpu_startup_entry at ffffffffadcfaa09
> #27 [ffffb3c60c837f40] start_secondary at ffffffffadc5ec77
> #28 [ffffb3c60c837f50] common_startup_64 at ffffffffadc24d5d
> ```
> 
> Assuming (this is x86_64):
> RDI=ffff8bf9b0784000 (rq)
> RSI=ffffe3bf1992a180 (frag_page)
> 
> ```
> static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
>                                           struct mlx5e_frag_page *frag_page)
> {
>         u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
>         struct page *page = frag_page->page;
> 
>         if (page_pool_unref_page(page, drain_count) == 0)
>                 page_pool_put_unrefed_page(rq->page_pool, page, -1, true);
> }
> ```
> 
> crash> struct mlx5e_frag_page ffffe3bf1992a180
> struct mlx5e_frag_page {
>   page = 0x26ffff800000000,
>   frags = 49856
> }
>
Most incorrect fragment counting issues tend to show up here, at page
release time.

Thanks,
Dragos