Date: Tue, 16 Jan 2024 20:17:22 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Matthieu Baerts <matttbe@...nel.org>
Cc: Netdev <netdev@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: Kernel panic in netif_rx_internal after v6 pings between netns

On Tue, Jan 16, 2024 at 7:36 PM Matthieu Baerts <matttbe@...nel.org> wrote:
>
> Hello,
>
> Our MPTCP CIs recently hit some kernel panics when validating the -net
> tree + 2 pending MPTCP patches. This is on top of e327b2372bc0 ("net:
> ravb: Fix dma_addr_t truncation in error case").
>
> It looks like these panics are not related to MPTCP. That's why I'm
> sharing that here:

Indeed, this looks like an x86 issue to me (jump labels?). Are all the
stack traces pointing to the same issue?

Let's CC LKML in case this rings a bell.

>
> > # INFO: validating network environment with pings
> > [   45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
> > [   45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G                 N 6.7.0-g244ee3389ffe #1
> > [   45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > [   45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
> > All code
> > ========
> >    0: 0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
> >    7: 00
> >    8: 0f 1f 40 00             nopl   0x0(%rax)
> >    c: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >   11: 55                      push   %rbp
> >   12: 48 89 fd                mov    %rdi,%rbp
> >   15: 48 83 ec 20             sub    $0x20,%rsp
> >   19: 65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
> >   20: 00 00
> >   22: 48 89 44 24 18          mov    %rax,0x18(%rsp)
> >   27: 31 c0                   xor    %eax,%eax
> >   29:*        e9 c9 00 00 00          jmp    0xf7             <-- trapping instruction
> >   2e: 66 90                   xchg   %ax,%ax
> >   30: 66 90                   xchg   %ax,%ax
> >   32: 48 8d 54 24 10          lea    0x10(%rsp),%rdx
> >   37: 48 89 ef                mov    %rbp,%rdi
> >   3a: 65                      gs
> >   3b: 8b                      .byte 0x8b
> >   3c: 35                      .byte 0x35
> >   3d: 17                      (bad)
> >   3e: 9d                      popf
> >   3f: 11                      .byte 0x11
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: c9                      leave
> >    1: 00 00                   add    %al,(%rax)
> >    3: 00 66 90                add    %ah,-0x70(%rsi)
> >    6: 66 90                   xchg   %ax,%ax
> >    8: 48 8d 54 24 10          lea    0x10(%rsp),%rdx
> >    d: 48 89 ef                mov    %rbp,%rdi
> >   10: 65                      gs
> >   11: 8b                      .byte 0x8b
> >   12: 35                      .byte 0x35
> >   13: 17                      (bad)
> >   14: 9d                      popf
> >   15: 11                      .byte 0x11
> > [   45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
> > [   45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
> > [   45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
> > [   45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
> > [   45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
> > [   45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
> > [   45.505547] FS:  00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
> > [   45.505547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
> > [   45.505547] Call Trace:
> > [   45.505547]  <IRQ>
> > [   45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
> > [   45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
> > [   45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
> > [   45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [   45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [   45.505547] __netif_rx (net/core/dev.c:5084)
> > [   45.505547] veth_xmit (drivers/net/veth.c:321)
> > [   45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
> > [   45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
> > [   45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
> > [   45.505547] ? eth_header (net/ethernet/eth.c:85)
> > [   45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
> > [   45.505547] ? ip6_output (include/linux/netfilter.h:301)
> > [   45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
> > [   45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
> > [   45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
> > [   45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
> > [   45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
> > [   45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
> > [   45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
> > [   45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
> > [   45.505547] process_backlog (include/linux/rcupdate.h:779)
> > [   45.505547] __napi_poll (net/core/dev.c:6576)
> > [   45.505547] net_rx_action (net/core/dev.c:6647)
> > [   45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
> > [   45.505547] do_softirq (kernel/softirq.c:454)
> > [   45.505547]  </IRQ>
> > [   45.505547]  <TASK>
> > [   45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
> > [   45.505547] __dev_queue_xmit (net/core/dev.c:4379)
> > [   45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
> > [   45.505547] ? ip6_output (include/linux/netfilter.h:301)
> > [   45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
> > [   45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
> > [   45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
> > [   45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
> > [   45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
> > [   45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
> > [   45.505547] ? set_pte_range (mm/memory.c:4529)
> > [   45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
> > [   45.505547] ? __sock_sendmsg (net/socket.c:733)
> > [   45.505547] __sock_sendmsg (net/socket.c:733)
> > [   45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
> > [   45.505547] __sys_sendto (net/socket.c:2191)
> > [   45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
> > [   45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
> > [   45.505547] __x64_sys_sendto (net/socket.c:2203)
> > [   45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
> > [   45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
> > [   45.505547] RIP: 0033:0x7fa1d099ca0a
> > [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
> > All code
> > ========
> >    0: d8 64 89 02             fsubs  0x2(%rcx,%rcx,4)
> >    4: 48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
> >    b: eb b8                   jmp    0xffffffffffffffc5
> >    d: 0f 1f 00                nopl   (%rax)
> >   10: f3 0f 1e fa             endbr64
> >   14: 41 89 ca                mov    %ecx,%r10d
> >   17: 64 8b 04 25 18 00 00    mov    %fs:0x18,%eax
> >   1e: 00
> >   1f: 85 c0                   test   %eax,%eax
> >   21: 75 15                   jne    0x38
> >   23: b8 2c 00 00 00          mov    $0x2c,%eax
> >   28: 0f 05                   syscall
> >   2a:*        48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
> >   30: 77 7e                   ja     0xb0
> >   32: c3                      ret
> >   33: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >   38: 41 54                   push   %r12
> >   3a: 48 83 ec 30             sub    $0x30,%rsp
> >   3e: 44                      rex.R
> >   3f: 89                      .byte 0x89
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: 48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
> >    6: 77 7e                   ja     0x86
> >    8: c3                      ret
> >    9: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >    e: 41 54                   push   %r12
> >   10: 48 83 ec 30             sub    $0x30,%rsp
> >   14: 44                      rex.R
> >   15: 89                      .byte 0x89
> > [   45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> > [   45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
> > [   45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
> > [   45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
> > [   45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
> > [   45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
> > [   45.505547]  </TASK>
> > [   45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
> > [   45.505547] ---[ end trace 0000000000000000 ]---
> > [   45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
> > [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
> > All code
> > ========
> >    0: 0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
> >    7: 00
> >    8: 0f 1f 40 00             nopl   0x0(%rax)
> >    c: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> >   11: 55                      push   %rbp
> >   12: 48 89 fd                mov    %rdi,%rbp
> >   15: 48 83 ec 20             sub    $0x20,%rsp
> >   19: 65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
> >   20: 00 00
> >   22: 48 89 44 24 18          mov    %rax,0x18(%rsp)
> >   27: 31 c0                   xor    %eax,%eax
> >   29:*        e9 c9 00 00 00          jmp    0xf7             <-- trapping instruction
> >   2e: 66 90                   xchg   %ax,%ax
> >   30: 66 90                   xchg   %ax,%ax
> >   32: 48 8d 54 24 10          lea    0x10(%rsp),%rdx
> >   37: 48 89 ef                mov    %rbp,%rdi
> >   3a: 65                      gs
> >   3b: 8b                      .byte 0x8b
> >   3c: 35                      .byte 0x35
> >   3d: 17                      (bad)
> >   3e: 9d                      popf
> >   3f: 11                      .byte 0x11
> >
> > Code starting with the faulting instruction
> > ===========================================
> >    0: c9                      leave
> >    1: 00 00                   add    %al,(%rax)
> >    3: 00 66 90                add    %ah,-0x70(%rsi)
> >    6: 66 90                   xchg   %ax,%ax
> >    8: 48 8d 54 24 10          lea    0x10(%rsp),%rdx
> >    d: 48 89 ef                mov    %rbp,%rdi
> >   10: 65                      gs
> >   11: 8b                      .byte 0x8b
> >   12: 35                      .byte 0x35
> >   13: 17                      (bad)
> >   14: 9d                      popf
> >   15: 11                      .byte 0x11
> > [   45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
> > [   45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
> > [   45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
> > [   45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
> > [   45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
> > [   45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
> > [   45.505547] FS:  00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
> > [   45.505547] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
> > [   45.505547] Kernel panic - not syncing: Fatal exception in interrupt
> > [   45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
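
As a cross-check of the jump-label suspicion, the `Code:` line of the oops can be decoded by hand. The sketch below is only an illustration, not part of the original report: it assumes the usual oops convention that the byte in angle brackets is the byte at the reported RIP, and that an int3 trap reports the address just after the 1-byte `0xcc` opcode. The helper names (`parse_code_line`, `rel32_jmp_target`) are hypothetical.

```python
# Bytes copied from the "Code:" line of the oops; <c9> marks the byte at RIP.
CODE_LINE = (
    "0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec "
    "20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 "
    "66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11"
)


def parse_code_line(line):
    """Return (raw bytes, index of the byte wrapped in <..>)."""
    tokens = line.split()
    rip_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = bytes(int(t.strip("<>"), 16) for t in tokens)
    return data, rip_index


def rel32_jmp_target(data, insn_offset):
    """Target of a 5-byte `e9 rel32` jmp, as an offset into the dump."""
    assert data[insn_offset] == 0xE9, "not a jmp rel32"
    rel = int.from_bytes(data[insn_offset + 1:insn_offset + 5], "little", signed=True)
    return insn_offset + 5 + rel


data, rip_index = parse_code_line(CODE_LINE)
# Assumption: the trap was an int3, so the trapping byte sits one before RIP.
insn_offset = rip_index - 1
print(hex(insn_offset), hex(data[insn_offset]), hex(rel32_jmp_target(data, insn_offset)))
# prints: 0x29 0xe9 0xf7
```

The trapping byte is at offset 0x29, which in the dump already holds the first byte of a complete `jmp 0xf7`. One possible reading, not a confirmed diagnosis, is that the CPU fetched a transient `0xcc` while this jump-label site was being patched and the patching had finished by the time the bytes were dumped.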
>
> When hitting the panic, the MPTCP selftest was doing some pings -- v6
> according to the call trace -- between different netns: client, server,
> 2 routers in between with some TC config. See [1] for more details. In
> other words, that's before creating MPTCP connections.
>
> These panics are not easy to reproduce. In fact, we only saw the issue 2
> (maybe 3) times, and only when running on GitHub Actions (without KVM). I
> didn't manage to reproduce it locally.
>
> It is only recently that we have started to use GitHub Actions to do
> some validations, so I cannot confirm that it is a very recent issue. I
> think the CI hit the same issue a few days ago, on top of bec161add35b
> ("amt: do not use overwrapped cb area"), but there was another issue and
> the debug info was not stored.
>
> For reference, I originally added info in a GitHub issue [2]. If the CI
> hits the same bug again, I will add the stack trace there. Please tell me
> if I should CC someone.
>
> If you have any idea what is causing such a panic, please tell me. I can
> also add test patches in the MPTCP tree if needed.
>
>
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/tools/testing/selftests/net/mptcp/mptcp_connect.sh?id=e327b2372bc0#n171
>
> [2]
> https://github.com/multipath-tcp/mptcp_net-next/issues/471#issuecomment-1894061756
>
>
> Cheers,
> Matt
> --
> Sponsored by the NGI0 Core fund.
