netdev - Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm

Date:   Thu, 14 Jun 2018 10:38:01 +0200
From:   Kristian Evensen <kristian.evensen@...il.com>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Tobias Hommel <netdev-list@...oetigt.de>,
        Markus Berner <Markus.Berner@....ch>,
        Network Development <netdev@...r.kernel.org>,
        Florian Westphal <fw@...len.de>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

Hello,

On Tue, Jun 12, 2018 at 10:29 AM, Kristian Evensen
<kristian.evensen@...il.com> wrote:
> Thanks for spending time on this. I will see what I can manage in
> terms of a bisect. Our last good kernel was 4.9, so at least it
> narrows the scope down a bit compared to 4.4 or 4.1.

I hope we might have got somewhere. While looking more into IPsec on
4.14, we noticed large performance regressions (around 20%) on some
low-powered devices we also use. We quickly identified the
removal of the flow cache as the "culprit", and the performance
regression is discussed in the netdev thread for the removal of the
cache ("xfrm: remove flow cache"). For the time being, and in order to
restore performance, we have reverted the patch series removing
the flow cache. When running our tests (on the APU) after the revert,
we no longer see the crash. Before the revert, the APU would always
crash within a few hours; after the revert, our tests have been running
for more than 24 hours. Our test is quite basic: we establish 1, 2, 3, ..., 50
tunnels and then run iperf on all tunnels in parallel. The tunnels are
torn down between iterations.

We are still running the test and will keep doing so, but I thought I
should share this finding in case it helps in fixing the error. I
will report back if we find out anything more, and please let me
know if you have any suggestions for things I can test. For example,
I don't know whether it is safe to revert the flow cache commits one
at a time, to narrow down the cause of the crash further.

BR,
Kristian