Message-Id: <8E92BAA8-0FC6-4D29-BB4D-B6B60047A1D2@gmail.com>
Date: Thu, 7 Dec 2023 00:26:38 +0200
From: Martin Zaharinov <micron10@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev <netdev@...r.kernel.org>,
Paolo Abeni <pabeni@...hat.com>,
patchwork-bot+netdevbpf@...nel.org,
Jakub Kicinski <kuba@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>,
kuba+netdrv@...nel.org,
dsahern@...il.com
Subject: Re: Urgent Bug Report Kernel crash 6.5.2
Hi all,
Strangely, the same problem still occurs on 6.6.4, with the same debug log,
on different hardware, with a different number of users, and so on.
The debug log points to the same place: lib/rcuref.c
The code at that line is:
	/*
	 * If the reference count was already in the dead zone, then this
	 * put() operation is imbalanced. Warn, put the reference count back to
	 * DEAD and tell the caller to not deconstruct the object.
	 */
	if (WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")) {
		atomic_set(&ref->refcnt, RCUREF_DEAD);
		return false;
	}
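
To make clear what this check catches, here is a minimal single-threaded userspace sketch of the put() path. It is only a model under stated assumptions, not the kernel implementation: the names model_rcuref, model_init and model_put are hypothetical, the atomics and memory ordering are left out, and the `warned` flag stands in for WARN_ONCE. The zone constants are the ones from the kernel's rcuref header (RCUREF_DEAD is the 0xe0000000 visible in the disassembly).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Zone boundaries as defined for rcuref in the kernel headers. */
#define RCUREF_MAXREF    0x7FFFFFFFU
#define RCUREF_SATURATED 0xA0000000U
#define RCUREF_RELEASED  0xC0000000U
#define RCUREF_DEAD      0xE0000000U
#define RCUREF_NOREF     0xFFFFFFFFU

/* Hypothetical, non-atomic stand-in for struct rcuref. */
struct model_rcuref {
	uint32_t refcnt; /* stored as "references - 1" */
	bool warned;     /* stands in for WARN_ONCE() having fired */
};

void model_init(struct model_rcuref *ref, uint32_t cnt)
{
	assert(ref != NULL);
	ref->refcnt = cnt - 1;
	ref->warned = false;
}

/* Returns true only when the caller dropped the last reference. */
bool model_put(struct model_rcuref *ref)
{
	uint32_t cnt;

	ref->refcnt -= 1;
	cnt = ref->refcnt;

	/* Fast path: the counter is still in the valid zone. */
	if (cnt <= RCUREF_MAXREF)
		return false;

	/* The last reference went away: mark the object dead. */
	if (cnt == RCUREF_NOREF) {
		ref->refcnt = RCUREF_DEAD;
		return true;
	}

	/* put() on an already released or dead object: imbalanced. */
	if (cnt >= RCUREF_RELEASED) {
		ref->warned = true;
		ref->refcnt = RCUREF_DEAD;
		return false;
	}

	/* Otherwise the counter overflowed: pin it as saturated. */
	ref->refcnt = RCUREF_SATURATED;
	return false;
}
```

In this model, initializing with one reference and calling model_put() twice sets `warned` on the second call. That corresponds to the "imbalanced put()" warning in the trace below: some path calls dst_release() on a dst whose last reference has already been dropped.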
[529520.875413] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G O 6.6.3 #1
[529520.875533] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.3 10/28/2020
[529520.875653] RIP: 0010:rcuref_put_slowpath+0x5f/0x70
[529520.875748] Code: 31 c0 eb e2 80 3d 9e d1 e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 d9 96 e3 8f c6 05 84 d1 e6 00 01 e8 41 9d c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
[529520.875908] RSP: 0018:ffffa823c052cde8 EFLAGS: 00010296
[529520.876003] RAX: 0000000000000019 RBX: ffffa0f049053180 RCX: 00000000fff7ffff
[529520.876122] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
[529520.876244] RBP: ffffa0f0a8fffec0 R08: 0000000000000000 R09: 00000000fff7ffff
[529520.876364] R10: ffffa0f79ae00000 R11: 0000000000000003 R12: ffffa0f04655f000
[529520.876482] R13: 0000000000000258 R14: ffffa0f16ade1000 R15: ffffa0f79f964bd0
[529520.876601] FS: 0000000000000000(0000) GS:ffffa0f79f940000(0000) knlGS:0000000000000000
[529520.876723] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[529520.876822] CR2: 00007fa9bd56b3c8 CR3: 000000016e43e002 CR4: 00000000003706e0
[529520.877043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[529520.877164] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[529520.877287] Call Trace:
[529520.877382] <IRQ>
[529520.877472] ? __warn+0x6c/0x130
[529520.877566] ? report_bug+0x1b8/0x200
[529520.877661] ? handle_bug+0x36/0x70
[529520.877753] ? exc_invalid_op+0x17/0x1a0
[529520.877849] ? asm_exc_invalid_op+0x16/0x20
[529520.877947] ? rcuref_put_slowpath+0x5f/0x70
[529520.878043] ? rcuref_put_slowpath+0x5f/0x70
[529520.878136] dst_release+0x1c/0x40
[529520.878229] __dev_queue_xmit+0x594/0xcd0
[529520.878324] ? eth_header+0x25/0xc0
[529520.878417] ip_finish_output2+0x1a0/0x530
[529520.878514] process_backlog+0x107/0x210
[529520.878610] __napi_poll+0x20/0x180
[529520.878702] net_rx_action+0x29f/0x380
[529520.878935] __do_softirq+0xd0/0x202
[529520.879033] do_softirq+0x3a/0x50
[529520.879127] </IRQ>
[529520.879217] <TASK>
[529520.879306] flush_smp_call_function_queue+0x3f/0x50
[529520.879407] do_idle+0x14d/0x210
[529520.879500] cpu_startup_entry+0x21/0x30
[529520.879597] start_secondary+0xe1/0xf0
[529520.879693] secondary_startup_64_no_verify+0x166/0x16b
[529520.879793] </TASK>
[529520.879884] ---[ end trace 0000000000000000 ]---
m.
> On 16 Nov 2023, at 16:17, Martin Zaharinov <micron10@...il.com> wrote:
>
> Hi All
>
> I am reporting the same problem with kernel 6.6.1. I think the problem is in RCU, but I am not sure. If possible, please add people from the RCU side here.
>
> See the report:
>
>
>
> [141229.505339] ------------[ cut here ]------------
> [141229.505492] rcuref - imbalanced put()
> [141229.505504] WARNING: CPU: 8 PID: 0 at lib/rcuref.c:267 rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [141229.505821] Modules linked in: xsk_diag unix_diag iptable_filter xt_TCPMSS iptable_mangle xt_addrtype xt_nat xt_MASQUERADE iptable_nat ip_tables netconsole coretemp e1000 ixgbe mdio pppoe pppox sha1_ssse3 sha1_generic ppp_mppe libarc4 ppp_generic slhc nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
> [141229.506349] CPU: 8 PID: 0 Comm: swapper/8 Tainted: G O 6.6.1 #1
> [141229.506527] Hardware name: Persy Super Server/X11DDW-L, BIOS 4.0 07/11/2023
> [141229.506701] RIP: 0010:rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [141229.506843] Code: 31 c0 eb e2 80 3d ef 4e e6 00 00 74 0a c7 03 00 00 00 e0 31 c0 eb cf 48 c7 c7 07 99 e3 97 c6 05 d5 4e e6 00 01 e8 d1 1f c7 ff <0f> 0b eb df cc cc cc cc cc cc cc cc cc cc cc cc cc 48 89 fa 83 e2
> All code
> ========
> 0: 31 c0 xor %eax,%eax
> 2: eb e2 jmp 0xffffffffffffffe6
> 4: 80 3d ef 4e e6 00 00 cmpb $0x0,0xe64eef(%rip) # 0xe64efa
> b: 74 0a je 0x17
> d: c7 03 00 00 00 e0 movl $0xe0000000,(%rbx)
> 13: 31 c0 xor %eax,%eax
> 15: eb cf jmp 0xffffffffffffffe6
> 17: 48 c7 c7 07 99 e3 97 mov $0xffffffff97e39907,%rdi
> 1e: c6 05 d5 4e e6 00 01 movb $0x1,0xe64ed5(%rip) # 0xe64efa
> 25: e8 d1 1f c7 ff call 0xffffffffffc71ffb
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: eb df jmp 0xd
> 2e: cc int3
> 2f: cc int3
> 30: cc int3
> 31: cc int3
> 32: cc int3
> 33: cc int3
> 34: cc int3
> 35: cc int3
> 36: cc int3
> 37: cc int3
> 38: cc int3
> 39: cc int3
> 3a: cc int3
> 3b: 48 89 fa mov %rdi,%rdx
> 3e: 83 .byte 0x83
> 3f: e2 .byte 0xe2
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: eb df jmp 0xffffffffffffffe3
> 4: cc int3
> 5: cc int3
> 6: cc int3
> 7: cc int3
> 8: cc int3
> 9: cc int3
> a: cc int3
> b: cc int3
> c: cc int3
> d: cc int3
> e: cc int3
> f: cc int3
> 10: cc int3
> 11: 48 89 fa mov %rdi,%rdx
> 14: 83 .byte 0x83
> 15: e2 .byte 0xe2
> [141229.507086] RSP: 0018:ffffa444449e0978 EFLAGS: 00010296
> [141229.507229] RAX: 0000000000000019 RBX: ffff9b54866a4100 RCX: 00000000fff7ffff
> [141229.507404] RDX: 00000000fff7ffff RSI: 0000000000000001 RDI: 00000000ffffffea
> [141229.507577] RBP: ffff9b53e57b1ec0 R08: 0000000000000000 R09: 00000000fff7ffff
> [141229.507751] R10: ffff9b62db200000 R11: 0000000000000003 R12: ffff9b5b0595c000
> [141229.507929] R13: ffff9b5b09c32200 R14: ffff9b5b09e29a00 R15: ffff9b5b0557e080
> [141229.508101] FS: 0000000000000000(0000) GS:ffff9b62dfa00000(0000) knlGS:0000000000000000
> [141229.508279] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [141229.508425] CR2: 00007fbadced6a80 CR3: 000000096f014002 CR4: 00000000003706e0
> [141229.508599] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [141229.508773] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [141229.508947] Call Trace:
> [141229.509079] <IRQ>
> [141229.509206] ? __warn (kernel/panic.c:235 kernel/panic.c:673)
> [141229.509342] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [141229.509482] ? handle_bug (arch/x86/kernel/traps.c:237)
> [141229.509617] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
> [141229.509751] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
> [141229.509892] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [141229.510028] ? rcuref_put_slowpath (lib/rcuref.c:267 (discriminator 1))
> [141229.510164] dst_release (./arch/x86/include/asm/preempt.h:95 ./include/linux/rcuref.h:151 net/core/dst.c:166)
> [141229.510302] __dev_queue_xmit (./include/net/dst.h:283 net/core/dev.c:4324)
> [141229.510441] vlan_dev_hard_start_xmit (net/8021q/vlan_dev.c:130)
> [141229.510584] dev_hard_start_xmit (./include/linux/netdevice.h:4904 net/core/dev.c:3573 net/core/dev.c:3589)
> [141229.510722] __dev_queue_xmit (./include/linux/netdevice.h:3278 (discriminator 25) net/core/dev.c:4370 (discriminator 25))
> [141229.510862] ? eth_header (net/ethernet/eth.c:85)
> [141229.510998] ip_finish_output2 (./include/net/neighbour.h:542 (discriminator 2) net/ipv4/ip_output.c:233 (discriminator 2))
> [141229.511135] ip_sabotage_in (net/bridge/br_netfilter_hooks.c:881 net/bridge/br_netfilter_hooks.c:866)
> [141229.511269] nf_hook_slow (./include/linux/netfilter.h:144 net/netfilter/core.c:626)
> [141229.511406] ip_rcv (./include/linux/netfilter.h:259 ./include/linux/netfilter.h:302 net/ipv4/ip_input.c:569)
> [141229.511540] ? ip_rcv_core.constprop.0 (net/ipv4/ip_input.c:436)
> [141229.511678] netif_receive_skb (net/core/dev.c:5552 net/core/dev.c:5666 net/core/dev.c:5752 net/core/dev.c:5811)
> [141229.511814] br_handle_frame_finish (net/bridge/br_input.c:216)
> [141229.511954] ? br_pass_frame_up (net/bridge/br_input.c:75)
> [141229.512092] br_nf_hook_thresh (net/bridge/br_netfilter_hooks.c:1051)
> [141229.512227] ? br_pass_frame_up (net/bridge/br_input.c:75)
> [141229.512363] br_nf_pre_routing_finish (net/bridge/br_netfilter_hooks.c:427)
> [141229.512501] ? br_pass_frame_up (net/bridge/br_input.c:75)
> [141229.512644] ? nf_nat_ipv4_pre_routing (net/netfilter/nf_nat_proto.c:656) nf_nat
> [141229.512792] br_nf_pre_routing (net/bridge/br_netfilter_hooks.c:538)
> [141229.512928] ? br_nf_hook_thresh (net/bridge/br_netfilter_hooks.c:354)
> [141229.513061] br_handle_frame (./include/linux/netfilter.h:144 net/bridge/br_input.c:272 net/bridge/br_input.c:417)
> [141229.513196] ? br_pass_frame_up (net/bridge/br_input.c:75)
> [141229.513333] __netif_receive_skb_core.constprop.0 (net/core/dev.c:5446 (discriminator 1))
> [141229.513475] ? ip_finish_output2 (net/ipv4/ip_output.c:243)
> [141229.513613] process_backlog (net/core/dev.c:5551 net/core/dev.c:5666 net/core/dev.c:5994)
> [141229.513749] __napi_poll (net/core/dev.c:6556)
> [141229.513887] net_rx_action (net/core/dev.c:6625 net/core/dev.c:6756)
> [141229.514023] __do_softirq (./arch/x86/include/asm/preempt.h:27 kernel/softirq.c:564)
> [141229.514158] do_softirq (kernel/softirq.c:463 (discriminator 32) kernel/softirq.c:450 (discriminator 32))
> [141229.514292] </IRQ>
> [141229.514420] <TASK>
> [141229.514548] flush_smp_call_function_queue (./arch/x86/include/asm/irqflags.h:134 (discriminator 1) kernel/smp.c:579 (discriminator 1))
> [141229.514688] do_idle (kernel/sched/idle.c:314)
> [141229.514822] cpu_startup_entry (kernel/sched/idle.c:379)
> [141229.516148] start_secondary (arch/x86/kernel/smpboot.c:326)
> [141229.516291] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
> [141229.516435] </TASK>
> [141229.516562] ---[ end trace 0000000000000000 ]---
>
>
> Best regards,
> Martin
>
>
>
>> On 15 Sep 2023, at 9:45, Eric Dumazet <edumazet@...gle.com> wrote:
>>
>> scripts/decode_stacktrace.sh
>
>