netdev - Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm


Message-ID: <20180612183900.7fp4np57kggfhwpi@delI>
Date:   Tue, 12 Jun 2018 20:39:00 +0200
From:   Tobias Hommel <netdev-list@...oetigt.de>
To:     Kristian Evensen <kristian.evensen@...il.com>
Cc:     Steffen Klassert <steffen.klassert@...unet.com>,
        Markus Berner <Markus.Berner@....ch>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in
 xfrm_lookup

On Fri, Jun 08, 2018 at 10:41:37AM +0200, Kristian Evensen wrote:
> Hi,
> 
> On Wed, Jun 6, 2018 at 6:03 PM, Tobias Hommel <netdev-list@...oetigt.de> wrote:
> > Sorry, no progress so far; I currently do not have time to take a deeper look
> > into that. We're back to 4.1.6 right now.
> 
> Thanks for letting me know. In the project I am currently involved in,
> we unfortunately don't have the option of reverting the kernel, so we
> are finding ways to live with the error. We have been looking into the
> error a bit more, and have made the following observations:
> 
> * First of all, as discussed earlier in the thread, the error is
> triggered by dst_orig being NULL. Our current work-around is simply to
> return from xfrm_lookup if dst_orig is NULL. This seems to work
> fine; the error doesn't happen that often (in our use cases at least).
> * The machine we use for testing (and where we first saw the error) is
> used as initiator.
The machine where I encountered the bug is a "roadwarrior gateway", so it only
serves as a responder.

> * When we compare the logs from Strongswan with the ones from the
> kernel, it seems that the error is typically triggered when a tunnel
> is torn down or about to come up. We need quite a lot of tunnels for
> the error to trigger, usually around 30+. I guess this might point to
> some race or some condition not being met when packets are
> sent/received.
> * We see the error much more frequently when hardware encryption is enabled.
> * Yesterday, we upgraded the kernel from 4.14.34 to 4.14.48, and the
> error happens much less frequently. I see that 4.14.48 includes
> several IPsec fixes (for example the previously mentioned ("xfrm: Fix
> a race in the xdst pcpu cache.")).
> 
> BR,
> Kristian
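
For illustration only, the work-around quoted above (return early when
dst_orig is NULL) could be sketched roughly as below. This is a userspace
model, not the actual kernel change: struct dst_entry is a stub, and
lookup_body, guarded_lookup and the -EINVAL error code are hypothetical names
and choices made for this sketch. The real xfrm_lookup has a different
signature and a different error convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stub standing in for the kernel's struct dst_entry (assumption). */
struct dst_entry {
    int flags;
};

/*
 * Hypothetical stand-in for the part of the lookup that uses dst_orig.
 * The NULL dereference from the bug report is modelled as an assertion
 * failure so the sketch stays well-defined in userspace.
 */
static struct dst_entry *lookup_body(struct dst_entry *dst_orig)
{
    assert(dst_orig != NULL);
    dst_orig->flags |= 1;
    return dst_orig;
}

/*
 * Guarded variant: bail out before dst_orig is ever touched.
 * On NULL input it returns NULL and stores -EINVAL in *err; the error
 * code is an assumption of this sketch, not taken from the thread.
 */
static struct dst_entry *guarded_lookup(struct dst_entry *dst_orig, int *err)
{
    if (dst_orig == NULL) {
        *err = -EINVAL;
        return NULL;
    }
    *err = 0;
    return lookup_body(dst_orig);
}
```

A guard like this only hides the symptom. It does not explain why dst_orig
ends up NULL in the first place, which is the open question in this thread.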