Message-ID: <20180109081939.qs2nrkuvmi3lw2dl@gauss3.secunet.de>
Date: Tue, 9 Jan 2018 09:19:39 +0100
From: Steffen Klassert <steffen.klassert@...unet.com>
To: Tobias Hommel <netdev-list@...oetigt.de>
CC: <netdev@...r.kernel.org>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
On Mon, Jan 08, 2018 at 02:53:48PM +0100, Tobias Hommel wrote:
...
> [ 439.095554] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 439.103664] IP: xfrm_lookup+0x2a/0x7d0
> [ 439.107551] PGD 0 P4D 0
> [ 439.110144] Oops: 0000 [#1] SMP PTI
> [ 439.113653] Modules linked in:
> [ 439.116774] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.14.12 #1
> [ 439.122900] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> [ 439.130769] task: ffff8cf33b0ea280 task.stack: ffff9492c0090000
> [ 439.136726] RIP: 0010:xfrm_lookup+0x2a/0x7d0
> [ 439.141005] RSP: 0018:ffff8cf33fd83bd0 EFLAGS: 00010246
> [ 439.146315] RAX: 0000000000000000 RBX: ffffffff87074080 RCX: 0000000000000000
> [ 439.153537] RDX: ffff8cf33fd83c48 RSI: 0000000000000000 RDI: ffffffff87074080
> [ 439.160780] RBP: ffffffff87074080 R08: 0000000000000002 R09: 0000000000000000
> [ 439.167958] R10: 0000000000000020 R11: 0000000000000020 R12: ffff8cf33fd83c48
> [ 439.175115] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8cf33b240078
> [ 439.182337] FS: 0000000000000000(0000) GS:ffff8cf33fd80000(0000) knlGS:0000000000000000
> [ 439.190456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 439.196227] CR2: 0000000000000020 CR3: 000000013200a000 CR4: 00000000001006e0
> [ 439.203386] Call Trace:
> [ 439.205869] <IRQ>
> [ 439.207886] __xfrm_route_forward+0xa4/0x110
> [ 439.212195] ip_forward+0x3da/0x450
> [ 439.215696] ? ip_rcv_finish+0x61/0x390
> [ 439.219542] ip_rcv+0x2b5/0x380
> [ 439.222716] ? inet_del_offload+0x30/0x30
> [ 439.226736] __netif_receive_skb_core+0x751/0xb00
> [ 439.231469] ? netif_receive_skb_internal+0x47/0xf0
> [ 439.236391] netif_receive_skb_internal+0x47/0xf0
> [ 439.241150] napi_gro_flush+0x50/0x70
> [ 439.244831] napi_complete_done+0x90/0xd0
> [ 439.248872] igb_poll+0x8fd/0xe80
> [ 439.252190] net_rx_action+0x1fc/0x310
> [ 439.255978] __do_softirq+0xd5/0x1cf
> [ 439.259584] irq_exit+0xa3/0xb0
> [ 439.262763] do_IRQ+0x45/0xc0
> [ 439.265772] common_interrupt+0x95/0x95
> [ 439.269609] </IRQ>
> [ 439.271733] RIP: 0010:cpuidle_enter_state+0x120/0x200
> [ 439.276810] RSP: 0018:ffff9492c0093eb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff5d
> [ 439.284436] RAX: ffff8cf33fd9ea80 RBX: 0000000000000002 RCX: 000000663c21ea0f
> [ 439.291604] RDX: 0000000000000000 RSI: 00000000355556ca RDI: 0000000000000000
> [ 439.298772] RBP: ffff8cf33fda71e8 R08: 0000000000000003 R09: 0000000000000018
> [ 439.305930] R10: 00000000ffffffff R11: 000000000000057c R12: 000000663c21ea0f
> [ 439.313089] R13: 000000663c1c6c33 R14: 0000000000000002 R15: 0000000000000000
> [ 439.320259] ? cpuidle_enter_state+0x11c/0x200
> [ 439.324740] do_idle+0xd6/0x170
> [ 439.327885] cpu_startup_entry+0x67/0x70
> [ 439.331837] start_secondary+0x167/0x190
> [ 439.335788] secondary_startup_64+0xa5/0xb0
> [ 439.340001] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> [ 439.358988] RIP: xfrm_lookup+0x2a/0x7d0 RSP: ffff8cf33fd83bd0
> [ 439.364759] CR2: 0000000000000020
> [ 439.368105] ---[ end trace c6b298b556ea7769 ]---
> [ 439.372752] Kernel panic - not syncing: Fatal exception in interrupt
> [ 439.379255] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 439.390029] Rebooting in 10 seconds..
...
> 0000000000004230 <xfrm_lookup>:
> 4230: 41 57 push %r15
> 4232: 41 56 push %r14
> 4234: 45 89 c6 mov %r8d,%r14d
> 4237: 41 55 push %r13
> 4239: 41 54 push %r12
> 423b: 49 89 f5 mov %rsi,%r13
> 423e: 55 push %rbp
> 423f: 53 push %rbx
> 4240: 49 89 d4 mov %rdx,%r12
> 4243: 48 89 fb mov %rdi,%rbx
> 4246: 48 83 ec 40 sub $0x40,%rsp
> 424a: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
> 4251: 00 00
> 4253: 48 89 44 24 38 mov %rax,0x38(%rsp)
> 4258: 31 c0 xor %eax,%eax
> 425a: 48 8b 46 20 mov 0x20(%rsi),%rax
The instruction above is the one that fails. RSI holds the second
argument of the called function, and it is a NULL pointer, so the load
from 0x20(%rsi) matches the fault address 0x20 reported in CR2. The
second argument of xfrm_lookup() is dst_orig, so it is as I thought.
Now let's find out why. I don't see anything obvious, so we need to
narrow it down.
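The arithmetic behind this reading of the oops can be checked with a short Python sketch. It is only an illustration: the helper name marked_byte_index is hypothetical, all values are copied from the trace and the objdump listing quoted above, and it assumes the byte in angle brackets on the Code: line is the byte at RIP.

```python
import re

# Code: line copied from the oops quoted above.
CODE_LINE = ("00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 "
             "48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 "
             "38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 "
             "00 00 00 0f 84")

def marked_byte_index(code_line):
    """Return (index, raw) where index is the position of the <xx> byte."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-f]{2}>", tok):
            raw = bytes(int(t.strip("<>"), 16) for t in tokens)
            return i, raw
    raise ValueError("no marked byte found")

index, raw = marked_byte_index(CODE_LINE)

# Values taken from the oops and the disassembly quoted above.
FUNC_START = 0x4230   # address of xfrm_lookup in the objdump listing
RIP_OFFSET = 0x2a     # from "RIP: xfrm_lookup+0x2a/0x7d0"
RSI = 0x0             # register dump: RSI is NULL
CR2 = 0x20            # faulting address

fault_insn_addr = FUNC_START + RIP_OFFSET   # address of the faulting instruction
insn = raw[index:index + 4]                 # REX.W, opcode, ModRM, disp8

# Split the ModRM byte into its mod / reg / rm fields.
modrm = insn[2]
mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
disp8 = insn[3]

print(hex(fault_insn_addr), insn.hex(), (mod, reg, rm), hex(RSI + disp8))
```

With mod=1 (8-bit displacement), reg=0 (rax) and rm=6 (rsi), the bytes decode to the load through RSI shown at 425a, and RSI plus the displacement gives the address in CR2.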
> CONFIG_INET_ESP=y
> CONFIG_INET_ESP_OFFLOAD=y
You have CONFIG_INET_ESP_OFFLOAD enabled. This feature is new, so it
may still have some problems. You should not hit an offload codepath,
though, because all your SAs are configured with UDP encapsulation,
which is not yet supported with offload.
Please try to disable GRO on both interfaces and see what happens:
ethtool -K eth0 gro off
ethtool -K eth1 gro off
Then disable CONFIG_INET_ESP_OFFLOAD and try again.
This should show us if this feature is responsible for the bug.
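Before re-running each test, it may help to confirm that the change actually took effect. The following is a minimal Python sketch, assuming that "ethtool -k <iface>" prints a line of the form "generic-receive-offload: off" and that the kernel config text is at hand (for example the .config of the build tree). The helper names and the sample strings are made up for illustration.

```python
def gro_enabled(ethtool_k_output):
    """Return True/False for GRO from `ethtool -k <iface>` text, None if absent."""
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "generic-receive-offload":
            # The value may carry a trailing " [fixed]" marker.
            parts = value.split()
            return bool(parts) and parts[0] == "on"
    return None

def config_enabled(config_text, option):
    """Return True if option is set to y or m in kernel .config text."""
    for line in config_text.splitlines():
        if line.strip() in (f"{option}=y", f"{option}=m"):
            return True
    return False

# Hypothetical sample inputs, not taken from the reporter's system.
sample_ethtool = """Features for eth0:
rx-checksumming: on
generic-receive-offload: off
"""
sample_config = """CONFIG_INET_ESP=y
# CONFIG_INET_ESP_OFFLOAD is not set
"""

print(gro_enabled(sample_ethtool),
      config_enabled(sample_config, "CONFIG_INET_ESP_OFFLOAD"))
```

Changing one thing per run (GRO first, then the config option) keeps the two experiments separable, which is the point of the narrowing-down above.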