Message-ID: <20180108123608.tkqe2bhjzcw3ya2y@gauss3.secunet.de>
Date: Mon, 8 Jan 2018 13:36:08 +0100
From: Steffen Klassert <steffen.klassert@...unet.com>
To: Tobias Hommel <netdev-list@...oetigt.de>
CC: <netdev@...r.kernel.org>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
On Fri, Jan 05, 2018 at 10:13:23PM +0100, Tobias Hommel wrote:
> Hi,
>
> I'm running into a NULL pointer dereference after updating from Linux 4.1.6 to
> 4.14.11 (see kernel log below). I tried 4.14.3 initially which did not work
> either.
> Does anyone have an idea what is happening here?
>
> The affected machine has 2 active ethernet interfaces (igb driver) and acts as
> a VPN gateway running strongswan. There are several hundreds of IPSec
> roadwarriors connecting to eth1. eth0 connects to an infrastructure running an
> HTTP server.
> During my tests these roadwarriors connect to the gateway, sometimes download a
> large file from the HTTP server, disconnect and after a random delay repeat
> these steps.
>
> Some observations I made:
> * SMP affinity for the IRQs of the NICs' Rx/Tx queues (/proc/irq/$IRQ/smp_affinity):
>   * all affinities set to the default ff is broken
>   * setting the affinity for all queues of both interfaces to the same CPU
>     seems to work fine (running stable for more than 1 day now)
>   * setting the affinity of eth0 queues to CPU 1 and the affinity of eth1
>     queues to CPU 2 is broken and seems to always trigger the bug on CPU 1
> * the top 6 entries of the call trace are the same every time the system
> crashes, the other entries differ sometimes
>
> The bug is 100% reproducible on the Intel Atom machine from the log below and
> also on a HP ProLiant Gen6 (also igb driver).
> I can, of course, provide further information (CPU, NIC, kernel config, more
> traces, etc.) if required.
> If helpful I could also run tests on HP ProLiant Gen9 which has different NICs
> (tg3).
>
> [ 7998.489094] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
> [ 7998.496993] IP: xfrm_lookup+0x2a/0x7e0
xfrm_lookup+0x2a is at the very beginning of xfrm_lookup(); here we
find:

	u16 family = dst_orig->ops->family;

ops is at an offset of 32 bytes (0x20) in dst_orig, and the faulting
address is 0000000000000020, so it looks like dst_orig is NULL.
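To illustrate the offset arithmetic: dereferencing dst_orig->ops loads from
(address of dst_orig) + offsetof(ops), so with dst_orig == NULL the load hits
address 0x20. The sketch below uses hypothetical mock types (mock_dst_entry,
mock_rcu_head, mock_dst_ops), not the kernel's real struct dst_entry; the
field order is an assumption chosen only to reproduce the 32-byte offset
stated above on a 64-bit build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock types; NOT the kernel's real definitions. */
struct mock_dst_ops {
	uint16_t family;
};

/* Stand-in for an rcu_head: two pointer-sized members. */
struct mock_rcu_head {
	struct mock_rcu_head *next;
	void (*func)(struct mock_rcu_head *head);
};

/* Assumed layout: four pointer-sized words precede ops, so on a
 * 64-bit build ops sits at byte offset 32 (0x20). */
struct mock_dst_entry {
	void *dev;
	struct mock_rcu_head rcu_head;
	struct mock_dst_entry *child;
	const struct mock_dst_ops *ops;
};

/* Reading dst->ops loads from (base address) + this offset;
 * with a NULL base the CPU faults at address 0 + offset. */
size_t mock_ops_offset(void)
{
	return offsetof(struct mock_dst_entry, ops);
}

/* Compile-time check of the assumed layout. */
_Static_assert(offsetof(struct mock_dst_entry, ops) == 4 * sizeof(void *),
	       "ops follows four pointer-sized words");
```

On a typical x86_64 build, mock_ops_offset() returns 32, which matches the
"NULL pointer dereference at 0000000000000020" line in the oops.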
In the forwarding case, we get dst_orig from the skb and dst_orig
can't be NULL here unless the skb itself is already fishy.
Can you provide the following information:
- Your kernel config
- The output of 'ip x p' and 'ip x s' (ip xfrm policy / ip xfrm state)
- An object dump of xfrm_policy.o, if possible: 'objdump -d -S net/xfrm/xfrm_policy.o'
  (The path to xfrm_policy.o depends on how you build your kernels.)