Message-Id: <1041011515187631@web26g.yandex.ru>
Date: Sat, 06 Jan 2018 00:27:11 +0300
From: Ozgur <ozgur@...sey.org>
To: Tobias Hommel <netdev-list@...oetigt.de>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
06.01.2018, 00:20, "Tobias Hommel" <netdev-list@...oetigt.de>:
> Hi,
Hi Tobias,
> I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> 4.14.11 (see kernel log below). I tried 4.14.3 initially, which did not work
> either.
> Does anyone have an idea what is happening here?
>
> The affected machine has 2 active ethernet interfaces (igb driver) and acts as
> a VPN gateway running strongswan. There are several hundreds of IPSec
> roadwarriors connecting to eth1. eth0 connects to an infrastructure running an
> HTTP server.
> During my tests these roadwarriors connect to the gateway, sometimes download a
> large file from the HTTP server, disconnect and after a random delay repeat
> these steps.
>
> Some observations I made:
> * SMP affinity for the IRQs of the NICs' Rx/Tx queues (/proc/irq/$IRQ/smp_affinity):
>   * leaving all affinities at the default ff is broken
>   * setting the affinity for all queues of both interfaces to the same CPU seems
>     to work fine (running stable for more than 1 day now)
>   * setting the affinity of eth0 queues to CPU 1 and of eth1 queues to CPU 2 is
>     broken and seems to always trigger the bug on CPU 1
> * the top 6 entries of the call trace are the same every time the system
> crashes, the other entries differ sometimes
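For reference, a minimal sketch (not part of the original report) of how the affinity layouts above translate into the hex bitmasks written to /proc/irq/$IRQ/smp_affinity: bit n selects CPU n, so the default ff covers CPUs 0 to 7, CPU 1 alone is 2 and CPU 2 alone is 4. The IRQ numbers in ETH0_IRQS and ETH1_IRQS are hypothetical placeholders; the real queue IRQs are listed in /proc/interrupts. Writing the file needs root, so set_irq_affinity() only returns the equivalent shell command unless dry_run=False.

```python
from pathlib import Path


def cpu_mask(cpus):
    """Return the hex bitmask string for a set of CPU numbers (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus, dry_run=True, proc_root="/proc/irq"):
    """Pin one IRQ to the given CPUs via its smp_affinity file.

    With dry_run=True (the default) nothing is written and the equivalent
    shell command is returned instead. Writing for real requires root.
    """
    path = Path(proc_root) / str(irq) / "smp_affinity"
    mask = cpu_mask(cpus)
    if dry_run:
        return f"echo {mask} > {path}"
    path.write_text(mask + "\n")
    return mask


# Hypothetical IRQ numbers for the Rx/Tx queues of each interface;
# look up the real ones in /proc/interrupts.
ETH0_IRQS = [40, 41]
ETH1_IRQS = [42, 43]

if __name__ == "__main__":
    # The split layout reported as broken: eth0 queues on CPU 1, eth1 queues on CPU 2.
    for irq in ETH0_IRQS:
        print(set_irq_affinity(irq, [1]))
    for irq in ETH1_IRQS:
        print(set_irq_affinity(irq, [2]))
```

Passing the same single CPU for both interfaces' IRQs instead gives the layout that was reported as stable.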
>
> The bug is 100% reproducible on the Intel Atom machine from the log below and
> also on an HP ProLiant Gen6 (also igb driver).
> I can, of course, provide further information (CPU, NIC, kernel config, more
> traces, etc.) if required.
> If helpful, I could also run tests on an HP ProLiant Gen9, which has different NICs
> (tg3).
>
> [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0
> [ 7998.500759] PGD 0 P4D 0
> [ 7998.503316] Oops: 0000 [#1] SMP PTI
> [ 7998.506835] Modules linked in:
> [ 7998.509929] CPU: 2 PID: 22 Comm: ksoftirqd/2 Not tainted 4.14.11 #3
> [ 7998.516244] Hardware name: To be filled by O.E.M. CAR-2051/CAR, BIOS 1.01 07/11/2016
> [ 7998.524039] task: ffff8826bb118000 task.stack: ffff947ac00f0000
> [ 7998.530004] RIP: 0010:xfrm_lookup+0x2a/0x7e0
> [ 7998.534298] RSP: 0018:ffff947ac00f3b60 EFLAGS: 00010246
> [ 7998.539550] RAX: 0000000000000000 RBX: ffffffff93074040 RCX: 0000000000000000
> [ 7998.546709] RDX: ffff947ac00f3bd8 RSI: 0000000000000000 RDI: ffffffff93074040
> [ 7998.553868] RBP: ffffffff93074040 R08: 0000000000000002 R09: 0000000000000001
> [ 7998.561026] R10: 0000000000000032 R11: 0000000000000000 R12: ffff947ac00f3bd8
> [ 7998.568212] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8826b69a8078
> [ 7998.575395] FS: 0000000000000000(0000) GS:ffff8826bfc80000(0000) knlGS:0000000000000000
> [ 7998.583550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7998.589324] CR2: 0000000000000020 CR3: 00000001781da000 CR4: 00000000001006e0
> [ 7998.596482] Call Trace:
> [ 7998.598959] __xfrm_route_forward+0xa4/0x110
> [ 7998.603263] ip_forward+0x3e0/0x450
> [ 7998.606778] ? ip_rcv_finish+0x61/0x3a0
> [ 7998.610645] ip_rcv+0x2c4/0x390
> [ 7998.613818] ? inet_del_offload+0x30/0x30
> [ 7998.617857] __netif_receive_skb_core+0x751/0xb00
> [ 7998.622562] ? skb_send_sock+0x40/0x40
> [ 7998.626356] ? netif_receive_skb_internal+0x47/0xf0
> [ 7998.631252] netif_receive_skb_internal+0x47/0xf0
> [ 7998.635987] napi_gro_receive+0x70/0x90
> [ 7998.639835] gro_cell_poll+0x53/0x90
> [ 7998.643439] net_rx_action+0x1fc/0x310
> [ 7998.647210] ? rebalance_domains+0x101/0x2b0
> [ 7998.651500] __do_softirq+0xd5/0x1cf
> [ 7998.655105] run_ksoftirqd+0x14/0x30
> [ 7998.658712] smpboot_thread_fn+0xf9/0x150
> [ 7998.662723] kthread+0xef/0x130
> [ 7998.665893] ? sort_range+0x20/0x20
> [ 7998.669404] ? kthread_park+0x60/0x60
> [ 7998.673098] ret_from_fork+0x1f/0x30
> [ 7998.676674] Code: 00 41 57 41 56 45 89 c6 41 55 41 54 49 89 f5 55 53 49 89 d4 48 89 fb 48 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9 44 0f b7 38 c7 44 24 0c 00 00 00 00 0f 84
> [ 7998.695681] RIP: xfrm_lookup+0x2a/0x7e0 RSP: ffff947ac00f3b60
> [ 7998.701479] CR2: 0000000000000020
> [ 7998.704799] ---[ end trace 0544b1946919baad ]---
> [ 7998.709442] Kernel panic - not syncing: Fatal exception in interrupt
> [ 7998.715918] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
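One way to read the oops above: in the Code: line the byte in angle brackets marks the instruction at RIP, and `48 8b 46 20` is REX.W MOV r64, r/m64 with ModRM 0x46 (mod=01, reg=RAX, rm=RSI) plus an 8-bit displacement of 0x20, i.e. `mov 0x20(%rsi),%rax`. RSI is 0 in the register dump, so the load address is 0x20, the same value as CR2. The preceding bytes only save registers and set up the stack frame (including `49 89 f5`, a copy of RSI into R13, which is also 0), so under the x86-64 System V calling convention RSI most likely still holds the second argument passed to xfrm_lookup(). A minimal decoder for just this one instruction form, written for illustration rather than taken from any kernel tool:

```python
# Decodes only the instruction form seen in the oops: REX.W + 8B /r with a
# ModRM byte using mod=01 ([base + signed 8-bit displacement]). Not a general decoder.

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load(code):
    """Decode REX.W 8B /r with mod=01; return (dest, base, disp)."""
    rex, opcode, modrm, disp8 = code[:4]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("only REX.W MOV r64, r/m64 is handled")
    mod = modrm >> 6
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [base + disp8] without a SIB byte is handled")
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # disp8 is signed
    return REG64[reg], REG64[rm], disp


def fault_address(code_line, regs):
    """Decode the <..>-marked instruction of an oops Code: line and compute its load address."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    raw = [int(t.strip("<>"), 16) for t in tokens[start:start + 4]]
    dest, base, disp = decode_mov_load(raw)
    return dest, base, disp, regs[base] + disp


if __name__ == "__main__":
    # Excerpt of the Code: line and the RSI value from the oops above.
    code = "48 89 44 24 38 31 c0 <48> 8b 46 20 48 85 c9"
    regs = {"rsi": 0x0000000000000000}
    dest, base, disp, addr = fault_address(code, regs)
    print(f"mov {disp:#x}(%{base}),%{dest} -> loads from {addr:#018x}")
```

The printed load address, 0x0000000000000020, matches CR2 in the log.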
This error doesn't look like a problem with the latest kernel version itself; I think the problem is in the NIC driver.
Which Ethernet card model are you using?
And which driver version are you using?
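The driver name and version can be read with `ethtool -i <iface>`, which prints `key: value` lines (driver, version, firmware-version, bus-info, ...), and `lspci -nn` identifies the NIC model. A minimal sketch that collects the `ethtool -i` fields for both interfaces, assuming ethtool is installed and the interfaces are named eth0 and eth1 as in the report:

```python
import subprocess


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" per line) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI bus-info
        # address ("0000:01:00.0") stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i IFACE` and return its fields; raises if ethtool fails."""
    result = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    )
    return parse_ethtool_info(result.stdout)


if __name__ == "__main__":
    for iface in ("eth0", "eth1"):
        try:
            info = driver_info(iface)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"{iface}: could not query driver ({exc})")
            continue
        print(iface, info.get("driver"), info.get("version"), info.get("firmware-version"))
```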
> Best regards,
>
> Tobias Hommel
Ozgur