Message-ID: <20180109144921.pcrbsgvlrmdhreqv@arbeitstier>
Date: Tue, 9 Jan 2018 15:49:21 +0100
From: Tobias Hommel <netdev-list@...oetigt.de>
To: Steffen Klassert <steffen.klassert@...unet.com>
Cc: netdev@...r.kernel.org
Subject: Re: BUG: 4.14.11 unable to handle kernel NULL pointer dereference in xfrm_lookup
On Tue, Jan 09, 2018 at 10:26:24AM +0100, Steffen Klassert wrote:
> On Tue, Jan 09, 2018 at 10:06:51AM +0100, Tobias Hommel wrote:
> > >
> > > You have CONFIG_INET_ESP_OFFLOAD enabled; this is new, so maybe it
> > > still has some problems. You should not hit an offload codepath,
> > > because all your SAs are configured with UDP encapsulation, which
> > > is not yet supported with offload.
> > >
> > > Please try to disable GRO on both interfaces and see what happens:
> > >
> > > ethtool -K eth0 gro off
> > > ethtool -K eth1 gro off
> > I had actually already tried that with GRO off on eth1 only; to verify, I have
> > now turned offloading off on both interfaces. Same problem: see attached panic.gro_off.log
> >
> > >
> > > Then disable CONFIG_INET_ESP_OFFLOAD and try again.
> > Rebuilt with CONFIG_INET_ESP_OFFLOAD disabled; same problem: see attached
> > panic.esp_offload_disabled.log
>
> So ESP offload is not the problem. The next thing that comes to my mind
> is the flowcache removal, which was introduced with v4.14.
>
> >
> > >
> > > This should show us if this feature is responsible for the bug.
> > >
> >
> > For now, I will try to narrow down the problem by testing some older kernels.
>
> Thanks!
>
> Let me know about the results.
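To confirm that the `ethtool -K eth0 gro off` / `ethtool -K eth1 gro off` commands quoted above actually took effect, the current state can be read back with `ethtool -k <iface>` (lowercase `-k` only queries features). The helper below is a small sketch, not part of the original thread; the function name `gro_state` is hypothetical, and it assumes the usual `ethtool -k` output, where GRO appears as a top-level `generic-receive-offload: on|off` line, sometimes followed by ` [fixed]`.

```shell
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and print
# the GRO state ("on" or "off"). Some drivers append " [fixed]" when a
# feature cannot be changed, so only the first word after ": " is printed.
gro_state() {
    awk -F': ' '$1 == "generic-receive-offload" { split($2, w, " "); print w[1] }'
}
```

For example, `ethtool -k eth0 | gro_state` and `ethtool -k eth1 | gro_state` should both print `off` after the two `-K` commands; a ` [fixed]` marker would mean the driver does not allow toggling GRO on that interface.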
I copied the config from my 4.14.12 sources to a fresh 4.13.16 source tree, ran
`make olddefconfig` and built a new kernel.
The kernel config is attached as kernel-4.13.16.config.
The panic*.log files are kernel logs from different crashes of this 4.13.16
kernel, but all from the same scenario as before.
I also enabled CONFIG_DEBUG_INFO, so if any disassemblies are required, I'd be
happy to provide them.
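When carrying a newer .config into an older tree, `make olddefconfig` silently drops options the older kernel does not know and fills in defaults for new ones, so it can be worth checking how a given option ended up. The function below is a minimal sketch for bool/tristate options only; the name `config_state` is hypothetical and not a kernel script. It relies on the standard .config convention that an enabled option appears as `CONFIG_FOO=y` or `CONFIG_FOO=m`, while a disabled one is either absent or written as `# CONFIG_FOO is not set`.

```shell
# Hypothetical helper: print how CONFIG_<name> ended up in a kernel .config
# file: "y" (built in), "m" (module), or "n" (disabled or absent).
# Intended for bool/tristate options; string/int values also report "n".
config_state() {
    local name="CONFIG_$1"
    local file="$2"
    local value
    # Strip the "CONFIG_<name>=" prefix and keep whatever value follows.
    value=$(sed -n "s/^${name}=//p" "$file")
    case "$value" in
        y|m) echo "$value" ;;
        *)   echo "n" ;;
    esac
}
```

For example, after `make olddefconfig` in the 4.13.16 tree, `config_state DEBUG_INFO .config` should print `y`, and `config_state INET_ESP_OFFLOAD .config` shows whether ESP offload carried over. In the kernel tree, `scripts/config --disable INET_ESP_OFFLOAD` followed by another `make olddefconfig` switches that option off, as in the earlier test.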
So, the system still crashes, but the traces are completely different from
those with 4.14.12. This time there are also sometimes WARNINGs and "refcnt: -1"
messages before the actual panic, so I am not sure whether there is some other
problem as well. Still, all the crashes seem to be related to IP routing somehow.
View attachment "kernel-4.13.16.config" of type "text/plain" (102142 bytes)
View attachment "panic1.log" of type "text/plain" (10411 bytes)
View attachment "panic2.log" of type "text/plain" (10456 bytes)
View attachment "panic3.log" of type "text/plain" (6143 bytes)
View attachment "panic4.log" of type "text/plain" (3988 bytes)
View attachment "panic5.log" of type "text/plain" (3776 bytes)
View attachment "panic6.log" of type "text/plain" (16884 bytes)