Message-ID: <CAKfDRXjMYtAFKkv9+S-=_Rug3w1AiKeu0eRyEDBhXvau_91UaQ@mail.gmail.com>
Date:   Thu, 14 Jun 2018 10:38:01 +0200
From:   Kristian Evensen <kristian.evensen@...il.com>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Tobias Hommel <netdev-list@...oetigt.de>,
        Markus Berner <Markus.Berner@....ch>,
        Network Development <netdev@...r.kernel.org>,
        Florian Westphal <fw@...len.de>
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup

Hello,

On Tue, Jun 12, 2018 at 10:29 AM, Kristian Evensen
<kristian.evensen@...il.com> wrote:
> Thanks for spending time on this. I will see what I can manage in
> terms of a bisect. Our last good kernel was 4.9, so at least it
> narrows the scope down a bit compared to 4.4 or 4.1.

I think we may have gotten somewhere. While looking further into IPsec
on 4.14, we noticed a large performance regression (roughly -20%) on
some low-powered devices we are also using. We quickly identified the
removal of the flow cache as the "culprit"; the regression is discussed
in the netdev thread for the removal of the cache ("xfrm: remove flow
cache"). For the time being, and in order to restore performance, we
have reverted the patch series that removed the flow cache. When
running our tests (on the APU) after the revert, we no longer see the
crash. Before the revert, the APU would always crash within a few
hours. After the revert, our tests have been running for more than 24
hours. Our test is quite basic: we establish 1, 2, 3, ..., 50 tunnels
and then run iperf on all tunnels in parallel. The tunnels are torn
down between each iteration.
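
For reference, here is a rough sketch of that test loop. The helper
scripts setup_tunnel.sh / teardown_tunnels.sh and the 10.0.<i>.1 peer
addresses are placeholders for however the tunnels are provisioned and
addressed in our setup, not actual scripts from our test rig:

#!/usr/bin/env python3
# Sketch of the test: bring up N tunnels, run iperf on all of them in
# parallel, then tear everything down before the next iteration.
import subprocess

MAX_TUNNELS = 50
IPERF_SECONDS = 60

for n in range(1, MAX_TUNNELS + 1):
    # Bring up tunnels 1..n (hypothetical helper script).
    for i in range(1, n + 1):
        subprocess.run(["./setup_tunnel.sh", str(i)], check=True)

    # One iperf client per tunnel, all running in parallel; 10.0.<i>.1
    # stands in for the remote endpoint reachable through tunnel i.
    clients = [
        subprocess.Popen(["iperf", "-c", f"10.0.{i}.1",
                          "-t", str(IPERF_SECONDS)])
        for i in range(1, n + 1)
    ]
    for c in clients:
        c.wait()

    # Tear all tunnels down between iterations (hypothetical helper).
    subprocess.run(["./teardown_tunnels.sh"], check=True)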

We are still running the test and will keep doing so, but I thought I
should share this finding in case it helps in fixing the error. I will
report back if we find out anything more, and please let me know if
you have any suggestions for things I can test. I don't know, for
example, whether it is safe to revert the flow cache commits one by
one in order to pin down the crash more precisely.

BR,
Kristian
