Date:   Tue, 9 Jan 2018 15:49:21 +0100
From:   Tobias Hommel <netdev-list@...oetigt.de>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     netdev@...r.kernel.org
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in
 xfrm_lookup

On Tue, Jan 09, 2018 at 10:26:24AM +0100, Steffen Klassert wrote:
> On Tue, Jan 09, 2018 at 10:06:51AM +0100, Tobias Hommel wrote:
> > > 
> > > You have CONFIG_INET_ESP_OFFLOAD enabled, this is new maybe it
> > > still has some problems. You should not hit an offload codepath
> > > because all your SAs are configured with UDP encapsulation which
> > > is still not supported with offload.
> > > 
> > > Please try to disable GRO on both interfaces and see what happens:
> > > 
> > > ethtool -K eth0 gro off
> > > ethtool -K eth1 gro off
> > I had actually already tried that with only eth1 off; to verify, I have now
> > turned GRO off for both interfaces. Same problem: see attached panic.gro_off.log
> > 
> > > 
> > > Then disable CONFIG_INET_ESP_OFFLOAD and try again.
> > Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled, same problem: see attached
> > panic.esp_offload_disabled.log
> 
> So ESP offload is not the problem. Next thing that comes to my mind
> is the flowcache removal, this was introduced with v4.14.
> 
> > 
> > > 
> > > This should show us if this feature is responsible for the bug.
> > > 
> > 
> > I will try narrowing down the problem by trying out some older kernels for now.
> 
> Thanks!
> 
> Let me know about the results.

I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran
`make olddefconfig` and built a new kernel.
The kernel config is attached as kernel-4.13.16.config.
The panic*.log files are kernel logs from different crashes of this 4.13.16
kernel, but all from the same scenario as before.
I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be
happy to provide them.
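
The rebuild steps described above can be sketched roughly as follows. This is
only an illustration: the function names and any tree paths passed to them are
placeholders, not the ones actually used, and enabling CONFIG_DEBUG_INFO via
scripts/config is an assumption about one way to do it.

```shell
# copy_config OLD_TREE NEW_TREE
# Carry the .config of one kernel source tree over to another.
# (Hypothetical helper name, for illustration only.)
copy_config() {
    cp "$1/.config" "$2/.config"
}

# rebuild_kernel TREE
# Adapt the copied config to this kernel version and build.
# (Hypothetical helper name, for illustration only.)
rebuild_kernel() {
    (
        cd "$1" || exit 1
        # Set options that are new or changed in this version to their
        # defaults without prompting.
        make olddefconfig &&
        # Assumption: debug info is switched on with the in-tree helper so
        # that crash addresses can be mapped back to source lines.
        ./scripts/config --enable DEBUG_INFO &&
        # Re-run so that dependencies of the changed option are resolved.
        make olddefconfig &&
        make -j"$(nproc)"
    )
}
```

With placeholder paths, this would be invoked as
`copy_config ~/src/linux-4.14.12 ~/src/linux-4.13.16 && rebuild_kernel ~/src/linux-4.13.16`.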

So, the system still crashes, but the traces are completely different from
those with 4.14.12. This time, WARNINGs and "refcnt: -1" messages sometimes
appear before the actual panic, so I am not sure whether there is some other
problem as well. Still, the crashes all seem to be related to IP routing somehow.

View attachment "kernel-4.13.16.config" of type "text/plain" (102142 bytes)

View attachment "panic1.log" of type "text/plain" (10411 bytes)

View attachment "panic2.log" of type "text/plain" (10456 bytes)

View attachment "panic3.log" of type "text/plain" (6143 bytes)

View attachment "panic4.log" of type "text/plain" (3988 bytes)

View attachment "panic5.log" of type "text/plain" (3776 bytes)

View attachment "panic6.log" of type "text/plain" (16884 bytes)
