Message-ID: <20180109090651.wuxcdju5ynlkq25l@arbeitstier>
Date: Tue, 9 Jan 2018 10:06:51 +0100
From: Tobias Hommel <netdev-list@...oetigt.de>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: netdev@...r.kernel.org
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
On Tue, Jan 09, 2018 at 09:19:39AM +0100, Steffen Klassert wrote:
> On Mon, Jan 08, 2018 at 02:53:48PM +0100, Tobias Hommel wrote:
>
> ...
>
> > [ 439.095554] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> > [ 439.103664] IP: xfrm_lookup+0x2a/0x7d0
> > [ 439.107551] PGD 0 P4D 0
> > [ 439.110144] Oops: 0000 [#1] SMP PTI
> > [ 439.113653] Modules linked in:
> > [ 439.116774] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
> > [ 439.122900] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> > [ 439.130769] task: ffff8cf33b0ea280 task.stack: ffff9492c0090000
> > [ 439.136726] RIP: 0010:xfrm_lookup+0x2a/0x7d0
> > [ 439.141005] RSP: 0018:ffff8cf33fd83bd0 EFLAGS: 00010246
> > [ 439.146315] RAX: 0000000000000000 RBX: ffffffff87074080 RCX: 0000000000000000
> > [ 439.153537] RDX: ffff8cf33fd83c48 RSI: 0000000000000000 RDI: ffffffff87074080
> > [ 439.160780] RBP: ffffffff87074080 R08: 0000000000000002 R09: 0000000000000000
> > [ 439.167958] R10: 0000000000000020 R11: 0000000000000020 R12: ffff8cf33fd83c48
> > [ 439.175115] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8cf33b240078
> > [ 439.182337] FS: 0000000000000000(0000) GS:ffff8cf33fd80000(0000) knlGS:0000000000000000
> > [ 439.190456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 439.196227] CR2: 0000000000000020 CR3: 000000013200a000 CR4: 00000000001006e0
> > [ 439.203386] Call Trace:
> > [ 439.205869] <IRQ>
> > [ 439.207886] __xfrm_route_forward+0xa4/0x110
> > [ 439.212195] ip_forward+0x3da/0x450
> > [ 439.215696] ? ip_rcv_finish+0x61/0x390
> > [ 439.219542] ip_rcv+0x2b5/0x380
> > [ 439.222716] ? inet_del_offload+0x30/0x30
> > [ 439.226736] __netif_receive_skb_core+0x751/0xb00
> > [ 439.231469] ? netif_receive_skb_internal+0x47/0xf0
> > [ 439.236391] netif_receive_skb_internal+0x47/0xf0
> > [ 439.241150] napi_gro_flush+0x50/0x70
> > [ 439.244831] napi_complete_done+0x90/0xd0
> > [ 439.248872] igb_poll+0x8fd/0xe80
> > [ 439.252190] net_rx_action+0x1fc/0x310
> > [ 439.255978] __do_softirq+0xd5/0x1cf
> > [ 439.259584] irq_exit+0xa3/0xb0
> > [ 439.262763] do_IRQ+0x45/0xc0
> > [ 439.265772] common_interrupt+0x95/0x95
> > [ 439.269609] </IRQ>
> > [ 439.271733] RIP: 0010:cpuidle_enter_state+0x120/0x200
> > [ 439.276810] RSP: 0018:ffff9492c0093eb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff5d
> > [ 439.284436] RAX: ffff8cf33fd9ea80 RBX: 0000000000000002 RCX: 000000663c21ea0f
> > [ 439.291604] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
> > [ 439.298772] RBP: ffff8cf33fda71e8 R08: 0000000000000003 R09: 0000000000000018
> > [ 439.305930] R10: 00000000ffffffff R11: 000000000000057c R12: 000000663c21ea0f
> > [ 439.313089] R13: 000000663c1c6c33 R14: 0000000000000002 R15: 0000000000000000
> > [ 439.320259] ? cpuidle_enter_state+0x11c/0x200
> > [ 439.324740] do_idle+0xd6/0x170
> > [ 439.327885] cpu_startup_entry+0x67/0x70
> > [ 439.331837] start_secondary+0x167/0x190
> > [ 439.335788] secondary_startup_64+0xa5/0xb0
> > [ 439.340001] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> > [ 439.358988] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff8cf33fd83bd0
> > [ 439.364759] CR2: 0000000000000020
> > [ 439.368105] ---[ end trace c6b298b556ea7769 ]---
> > [ 439.372752] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 439.379255] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [ 439.390029] Rebooting in 10 seconds..
>
> ...
>
> > 0000000000004230 <xfrm_lookup>:
> > 4230: 41 57 push %r15
> > 4232: 41 56 push %r14
> > 4234: 45 89 c6 mov %r8d,%r14d
> > 4237: 41 55 push %r13
> > 4239: 41 54 push %r12
> > 423b: 49 89 f5 mov %rsi,%r13
> > 423e: 55 push %rbp
> > 423f: 53 push %rbx
> > 4240: 49 89 d4 mov %rdx,%r12
> > 4243: 48 89 fb mov %rdi,%rbx
> > 4246: 48 83 ec 40 sub $0x40,%rsp
> > 424a: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
> > 4251: 00 00
> > 4253: 48 89 44 24 38 mov %rax,0x38(%rsp)
> > 4258: 31 c0 xor %eax,%eax
> > 425a: 48 8b 46 20 mov 0x20(%rsi),%rax
>
>
> The above is the failing instruction, RSI holds the second argument
> of the called function which is a NULL pointer. The second argument
> of xfrm_lookup() is dst_orig, so it is as I thought. Now let's find
> out why. I don't see anything obvious, so we need to narrow it down.
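For anyone following along, the reasoning above can be cross-checked against the "Code:" line of the oops. What follows is only a minimal sketch: the helper names (faulting_bytes, decode_mov_load) are made up for illustration, and the decoder handles just the one encoding seen here (REX.W 8b /r with an 8-bit displacement), so it is not a general disassembler. The byte string and the RSI value are copied from the trace above.

```python
# Bytes from the "Code:" line of the oops; <48> marks the byte at RIP.
CODE = ("00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 "
        "48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 "
        "38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 "
        "00 00 0f 84")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Split the dump into the bytes before and from the <..> marker."""
    tokens = code_line.split()
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    before = [int(t, 16) for t in tokens[:idx]]
    after = [int(t.strip("<>"), 16) for t in tokens[idx:]]
    return before, after


def decode_mov_load(insn):
    """Decode 'mov disp8(base),dest' (REX.W 8b /r, mod=01, no SIB) only."""
    rex, opcode, modrm, disp8 = insn[:4]
    if rex != 0x48 or opcode != 0x8B or (modrm >> 6) != 0b01:
        raise ValueError("not the simple 64-bit load this sketch handles")
    if (modrm & 7) == 4:
        raise ValueError("SIB byte follows; not handled here")
    dest = REG64[(modrm >> 3) & 7]
    base = REG64[modrm & 7]
    disp = disp8 - 256 if disp8 >= 0x80 else disp8  # sign-extend disp8
    return dest, base, disp


before, after = faulting_bytes(CODE)
dest, base, disp = decode_mov_load(after)

RSI = 0x0  # from the register dump in the oops
fault_address = RSI + disp

print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_address:#x}")
```

The decoded load is relative to RSI with displacement 0x20, and with RSI being zero the computed address is the value reported in CR2.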
>
> > CONFIG_INET_ESP=y
> > CONFIG_INET_ESP_OFFLOAD=y
>
> You have CONFIG_INET_ESP_OFFLOAD enabled. This is new, so maybe it
> still has some problems. You should not hit an offload codepath,
> because all your SAs are configured with UDP encapsulation, which
> is still not supported with offload.
>
> Please try to disable GRO on both interfaces and see what happens:
>
> ethtool -K eth0 gro off
> ethtool -K eth1 gro off
I had already tried that with GRO off on eth1 only. To verify, I have now
turned it off on both interfaces. Same problem: see attached panic.gro_off.log
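In case it helps others reproduce the setup, here is a rough sketch of how the GRO state could be double-checked from the output of "ethtool -k <iface>". The function name parse_features and the SAMPLE text are hypothetical; the sample only mimics the usual "name: value" layout and is not output captured from the affected box.

```python
def parse_features(ethtool_k_output):
    """Map feature name -> 'on'/'off' from 'ethtool -k' style text."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for ...:" header and blank lines
        # Keep only the first word so "off [fixed]" becomes "off".
        features[name.strip()] = value.split()[0]
    return features


# Hypothetical sample, for illustration only.
SAMPLE = """Features for eth0:
rx-checksumming: on
generic-receive-offload: off
large-receive-offload: off [fixed]
"""

features = parse_features(SAMPLE)
print("GRO is", features["generic-receive-offload"])
```

On a live system the text could come from subprocess.run(["ethtool", "-k", "eth0"], capture_output=True, text=True).stdout instead of the sample string.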
>
> Then disable CONFIG_INET_ESP_OFFLOAD and try again.
Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem: see attached
panic.esp_offload_disabled.log
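To rule out a stale build, the state of the option can be read back from the .config that produced the kernel. A minimal sketch, assuming the standard kconfig text format; config_state is a hypothetical helper and SAMPLE_CONFIG is an illustrative fragment, not my actual configuration file.

```python
def config_state(config_text, symbol):
    """Return the symbol's value ('y', 'm', ...), 'n' if explicitly
    marked as not set, or None if the symbol does not appear at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        # Match "SYMBOL=" exactly so CONFIG_INET_ESP does not
        # accidentally match CONFIG_INET_ESP_OFFLOAD.
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


# Illustrative fragment only.
SAMPLE_CONFIG = """CONFIG_INET_ESP=y
# CONFIG_INET_ESP_OFFLOAD is not set
"""

print(config_state(SAMPLE_CONFIG, "CONFIG_INET_ESP_OFFLOAD"))
```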
>
> This should show us if this feature is responsible for the bug.
>
I will try narrowing down the problem by trying out some older kernels for now.
View attachment "panic.esp_offload_disabled.log" of type "text/plain" (3344 bytes)
View attachment "panic.gro_off.log" of type "text/plain" (2712 bytes)