Message-ID: <8f1cba95-bedb-4f96-958d-c4f28982bdf2@blackwall.org>
Date: Fri, 28 Feb 2025 14:49:09 +0200
From: Nikolay Aleksandrov <razor@...ckwall.org>
To: Ian Kumlien <ian.kumlien@...il.com>
Cc: netdev@...r.kernel.org, Ajit Khaparde <ajit.khaparde@...adcom.com>,
Sriharsha Basavapatna <sriharsha.basavapatna@...adcom.com>,
Somnath Kotur <somnath.kotur@...adcom.com>,
Andrew Lunn <andrew+netdev@...n.ch>, davem@...emloft.net,
edumazet@...gle.com, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net] be2net: fix sleeping while atomic bugs in
be_ndo_bridge_getlink
On 2/28/25 14:46, Ian Kumlien wrote:
> Actually, while you might already have realized this, I didn't quite
> understand how important this fix seems to be....
>
You mean the be2net card would send broken packets to this other machine with the mlx5 card?
Or did I misunderstand you?
> On another machine I found this:
> [lör feb 22 23:46:32 2025] mlx5_core 0000:02:00.1 enp2s0f1np1: hw csum failure
> [lör feb 22 23:46:32 2025] skb len=2488 headroom=78 headlen=1480 tailroom=0
> mac=(64,14) mac_len=14 net=(78,20) trans=98
> shinfo(txflags=0 nr_frags=0 gso(size=1452
> type=393216 segs=2))
> csum(0x2baef95d start=63837 offset=11182
> ip_summed=2 complete_sw=0 valid=0 level=0)
> hash(0xb9a84019 sw=0 l4=1) proto=0x0800
> pkttype=0 iif=8
> priority=0x0 mark=0x0 alloc_cpu=1 vlan_all=0x0
> encapsulation=0 inner(proto=0x0000, mac=0,
> net=0, trans=0)
> [lör feb 22 23:46:32 2025] dev name=enp2s0f1np1 feat=0x0e12a1c21cd14ba9
>
> And:
> [lör feb 22 23:46:33 2025] skb fraglist:
> [lör feb 22 23:46:33 2025] skb len=1008 headroom=106 headlen=1008 tailroom=38
> mac=(64,14) mac_len=14 net=(78,20) trans=98
> shinfo(txflags=0 nr_frags=0 gso(size=0
> type=0 segs=0))
> csum(0x86f9 start=34553 offset=0
> ip_summed=2 complete_sw=0 valid=0 level=0)
> hash(0xb9a84019 sw=0 l4=1) proto=0x0800
> pkttype=0 iif=0
> priority=0x0 mark=0x0 alloc_cpu=1 vlan_all=0x0
> encapsulation=0 inner(proto=0x0000, mac=0,
> net=0, trans=0)
> [lör feb 22 23:46:33 2025] dev name=enp2s0f1np1 feat=0x0e12a1c21cd14ba9
>
> Including:
> [lör feb 22 23:46:34 2025] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not
> tainted 6.13.4 #449
> [lör feb 22 23:46:34 2025] Hardware name: Supermicro Super
> Server/A2SDi-12C-HLN4F, BIOS 1.9a 12/25/2023
> [lör feb 22 23:46:34 2025] Call Trace:
> [lör feb 22 23:46:34 2025] <IRQ>
> [lör feb 22 23:46:34 2025] dump_stack_lvl+0x47/0x70
> [lör feb 22 23:46:34 2025] __skb_checksum_complete+0xda/0xf0
> [lör feb 22 23:46:34 2025] ? __pfx_csum_partial_ext+0x10/0x10
> [lör feb 22 23:46:34 2025] ? __pfx_csum_block_add_ext+0x10/0x10
> [lör feb 22 23:46:34 2025] nf_conntrack_udp_packet+0x171/0x260
> [lör feb 22 23:46:34 2025] nf_conntrack_in+0x391/0x590
> [lör feb 22 23:46:34 2025] nf_hook_slow+0x3c/0xf0
> [lör feb 22 23:46:34 2025] nf_hook_slow_list+0x70/0xf0
> [lör feb 22 23:46:34 2025] ip_sublist_rcv+0x1ee/0x200
> [lör feb 22 23:46:34 2025] ? __pfx_ip_rcv_finish+0x10/0x10
> [lör feb 22 23:46:34 2025] ip_list_rcv+0xf8/0x130
> [lör feb 22 23:46:34 2025] __netif_receive_skb_list_core+0x24c/0x270
> [lör feb 22 23:46:34 2025] netif_receive_skb_list_internal+0x18f/0x2b0
> [lör feb 22 23:46:34 2025] ? mlx5e_handle_rx_cqe_mpwrq+0x116/0x210
> [lör feb 22 23:46:34 2025] napi_complete_done+0x65/0x260
> [lör feb 22 23:46:34 2025] mlx5e_napi_poll+0x172/0x760
> [lör feb 22 23:46:34 2025] __napi_poll+0x26/0x160
> [lör feb 22 23:46:34 2025] net_rx_action+0x173/0x300
> [lör feb 22 23:46:34 2025] ? notifier_call_chain+0x54/0xc0
> [lör feb 22 23:46:34 2025] ? atomic_notifier_call_chain+0x30/0x40
> [lör feb 22 23:46:34 2025] handle_softirqs+0xcd/0x270
> [lör feb 22 23:46:34 2025] irq_exit_rcu+0x85/0xa0
> [lör feb 22 23:46:34 2025] common_interrupt+0x81/0xa0
> [lör feb 22 23:46:34 2025] </IRQ>
> [lör feb 22 23:46:34 2025] <TASK>
> [lör feb 22 23:46:34 2025] asm_common_interrupt+0x22/0x40
> [lör feb 22 23:46:34 2025] RIP: 0010:cpuidle_enter_state+0xbc/0x430
> [lör feb 22 23:46:34 2025] Code: 77 02 00 00 e8 65 31 ec fe e8 60 f8
> ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 a1 68 eb fe 45 84 ff 0f 85 49
> 02 00 00 fb 45 85 f6 <0f> 88 8d 01 00 00 49 63 ce 4c 8b 14 24 48 8d 04
> 49 48 8d 14 81 48
> [lör feb 22 23:46:34 2025] RSP: 0018:ffffb504000b7e88 EFLAGS: 00000202
> [lör feb 22 23:46:34 2025] RAX: ffff9c0a2fa40000 RBX: ffff9c0a2fa76e60
> RCX: 0000000000000000
> [lör feb 22 23:46:34 2025] RDX: 0000252e1dcfee30 RSI: fffffff3c1a65ecc
> RDI: 0000000000000000
> [lör feb 22 23:46:34 2025] RBP: 0000000000000002 R08: 0000000000000000
> R09: 00000000000001f6
> [lör feb 22 23:46:34 2025] R10: 0000000000000018 R11: ffff9c0a2fa6c3ac
> R12: ffffffffaac2de60
> [lör feb 22 23:46:34 2025] R13: 0000252e1dcfee30 R14: 0000000000000002
> R15: 0000000000000000
> [lör feb 22 23:46:34 2025] ? cpuidle_enter_state+0xaf/0x430
> [lör feb 22 23:46:34 2025] cpuidle_enter+0x24/0x40
> [lör feb 22 23:46:34 2025] do_idle+0x16e/0x1b0
> [lör feb 22 23:46:34 2025] cpu_startup_entry+0x20/0x30
> [lör feb 22 23:46:34 2025] start_secondary+0xf3/0x100
> [lör feb 22 23:46:34 2025] common_startup_64+0x13e/0x148
> [lör feb 22 23:46:34 2025] </TASK>
> ---
>
> Asking Gemini for help identified the machine in the basement as the
> culprit, so it seems like it could send corrupt data. I haven't had
> a closer look, though.
>
Interesting. :)
> On Thu, Feb 27, 2025 at 5:41 PM Nikolay Aleksandrov <razor@...ckwall.org> wrote:
>>