Message-ID: <20180912151823.z2wk7hnex4zxly3e@arbeitstier>
Date:   Wed, 12 Sep 2018 17:18:23 +0200
From:   Tobias Hommel <netdev-list@...oetigt.de>
To:     Steffen Klassert <steffen.klassert@...unet.com>
Cc:     Wolfgang Walter <linux@...m.de>,
        Kristian Evensen <kristian.evensen@...il.com>,
        Network Development <netdev@...r.kernel.org>,
        weiwan@...gle.com, edumazet@...gle.com
Subject: Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to
 b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the
 operation of dst_free()

On Wed, Sep 12, 2018 at 10:50:46AM +0200, Steffen Klassert wrote:
> On Tue, Sep 11, 2018 at 09:02:48PM +0200, Tobias Hommel wrote:
> > > > Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> > > > clears the dst_entry.
> > > > 
> > > > Since commit 222d7dbd258d ("net: prevent dst uses after free")
> > > > skb_dst_force() might clear the dst_entry attached to the skb.
> > > > The xfrm code doesn't expect this to happen, so we crash with
> > > > a NULL pointer dereference in this case. Fix it by checking
> > > > skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> > > > in case the dst_entry was cleared.
> > > > 
> > > > Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> > > > Reported-by: Tobias Hommel <netdev-list@...oetigt.de>
> > > > Reported-by: Kristian Evensen <kristian.evensen@...il.com>
> > > > Reported-by: Wolfgang Walter <linux@...m.de>
> > > > Signed-off-by: Steffen Klassert <steffen.klassert@...unet.com>
> > > > ---
> > > 
> > > This patch fixes the problem here.
> > > 
> > > XfrmFwdHdrError gets to around 80 at the very beginning and remains so. Probably
> > > this happens when some routes are changed/set then.
> > > 
> > > Regards and thanks,
> > 
> > Same here, we're now running stable for ~6 hours, XfrmFwdHdrError is at 220.
> > This is less than 1 lost packet per minute, which seems to be okay for now.
> 
> Thanks a lot for testing! This is now applied to the ipsec tree.

After running for about 24 hours, I now encountered another panic. This time it
is caused by an out-of-memory situation. Although the trace shows activity in the
filesystem code, I'm posting it here because I cannot isolate the error, and
maybe it is caused by our NULL pointer bug or by the new fix.
I do not have a serial console attached, so I could only attach a screenshot of
the panic to this mail.

I am running v4.19-rc3 from git with the above mentioned patch applied.
After 19 hours everything still looked fine, XfrmFwdHdrError value was at ~950.
Overall memory usage shown by htop was at 1.2G/15.6G.
I had htop running via ssh, so I was able to see at least some status post
mortem. The last values it showed were:
Uptime: 23:50:57
Overall memory usage: 10.2G/15.6G
User processes were just using the usual amount of memory, so it looks like
the kernel was eating up at least 9G of RAM.

Maybe this information is not very helpful for debugging, but it is at least a
warning that something might still be wrong.

I'll try to gather some more information and keep you updated.
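
One way to gather that information without a serial console is to log the
relevant counters periodically, so the last values survive in an ssh session
or a log file. The following is only a minimal sketch, assuming the usual
"Name value" layout of /proc/net/xfrm_stat and /proc/meminfo; the function
names (parse_counters, snapshot, log_forever) are made up for illustration,
and the choice of Slab/SUnreclaim as indicators of kernel memory growth is an
assumption, not something established in this thread.

```python
import time


def parse_counters(text):
    """Parse 'Name value [unit]' lines (as found in /proc/net/xfrm_stat
    and /proc/meminfo) into a dict mapping name -> integer value."""
    counters = {}
    for line in text.splitlines():
        # /proc/meminfo uses "Name:  value kB", xfrm_stat uses "Name \tvalue".
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def snapshot(xfrm_text, meminfo_text):
    """Pick the counters discussed in this thread out of both files.
    Missing counters are reported as 0."""
    xfrm = parse_counters(xfrm_text)
    mem = parse_counters(meminfo_text)
    return {
        "XfrmFwdHdrError": xfrm.get("XfrmFwdHdrError", 0),
        "MemAvailable_kB": mem.get("MemAvailable", 0),
        "Slab_kB": mem.get("Slab", 0),              # assumed indicator
        "SUnreclaim_kB": mem.get("SUnreclaim", 0),  # assumed indicator
    }


def read_file(path):
    with open(path) as f:
        return f.read()


def log_forever(interval=60):
    """Print one timestamped snapshot every `interval` seconds."""
    while True:
        snap = snapshot(read_file("/proc/net/xfrm_stat"),
                        read_file("/proc/meminfo"))
        print(time.strftime("%Y-%m-%d %H:%M:%S"), snap, flush=True)
        time.sleep(interval)
```

Calling log_forever() from an ssh session (or redirecting its output to a
file on another host) would show whether kernel-side memory grows together
with XfrmFwdHdrError or independently of it.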

Download attachment "oom_panic.png" of type "image/png" (56627 bytes)
