netdev - Re: Occasional oops with IPSec and IPv6.


Message-ID: <4EC6A38E.6060404@iki.fi>
Date:	Fri, 18 Nov 2011 20:27:26 +0200
From:	Timo Teräs <timo.teras@....fi>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	Nick Bowler <nbowler@...iptictech.com>, netdev@...r.kernel.org,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: Occasional oops with IPSec and IPv6.

On 11/18/2011 06:39 PM, Eric Dumazet wrote:
> On Friday, 18 November 2011 at 11:27 -0500, Nick Bowler wrote:
>> On 2011-11-17 14:09 -0500, Nick Bowler wrote:
>>> One of the tests we do with IPsec involves sending and receiving UDP
>>> datagrams of all sizes from 1 to N bytes, where N is much larger than
>>> the MTU.  In this particular instance, the MTU is 1500 bytes and N is
>>> 10000 bytes.  This test works fine with IPv4, but I'm getting an
>>> occasional oops on Linus' master with IPv6 (output at end of email).  We
>>> also run the same test where N is less than the MTU, and it does not
>>> trigger this issue.  The resulting fallout seems to eventually lock up
>>> the box (although it continues to work for a little while afterwards).
>>>
>>> The issue appears timing related, and it doesn't always occur.  This
>>> probably also explains why I've not seen this issue before now, as we
>>> recently upgraded all our lab systems to machines from this century
>>> (with newfangled dual core processors).  This also makes it somewhat
>>> hard to reproduce, but I can trigger it pretty reliably by running 'yes'
>>> in an ssh session (which doesn't use IPsec) while running the test:
>>> it'll usually trigger in 2 or 3 runs.  The choice of cipher suite
>>> appears to be irrelevant.
>>>
>>> I built a relatively old kernel (2.6.34) and could not reproduce the
>>> issue there, so I ran a git bisect.  It pointed to the following, which
>>> (unsurprisingly) no longer reverts cleanly.
>>>
>>> Let me know if you need any more info.  I'll see if I can reproduce the
>>> issue with a smaller test case...
>>
>> OK, here's a somewhat straightforward way to reproduce it that I've
>> found.  It uses a short test program called "udp_burst" which simply
>> transmits a bunch of UDP datagrams at all sizes between 1 and 10000,
>> included at the end of this mail.
>>[snip]
> 
> Please note commit 80c802f307 added a known bug, fixed in commit
> 0b150932197b (xfrm: avoid possible oopse in xfrm_alloc_dst)
> 
> Given the complexity of commit 80c802f307, we can assume other bugs are
> to be fixed as well.
> 
> Unfortunately, Timo seems unresponsive.

This looks quite different, and I've been trying to figure out what
causes it. The OOPS happens in ip6_fragment(), which indicates that
not enough headroom was allocated (skb underrun). My initial thought is
that this is an IPv6 bug that just got uncovered by my commit, especially
since the IPv4 side is happy. But I haven't yet been able to figure this
one out.
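
To make "skb underrun" concrete, here is a minimal userspace sketch of
the head/data pointer bookkeeping. It is a toy model, not kernel code:
struct toy_buf and the toy_* helpers are hypothetical names, and the
only behaviour borrowed from the kernel is that pushing a header moves
the data pointer backwards and must not go below the start of the
allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an sk_buff (hypothetical, not the kernel struct):
 * head is the start of the allocation, data is where the packet
 * currently begins. The gap between them is the headroom. */
struct toy_buf {
    unsigned char storage[256];
    unsigned char *head;
    unsigned char *data;
};

/* Reserve `reserve` bytes of headroom in an empty buffer. */
static void toy_init(struct toy_buf *b, size_t reserve)
{
    assert(reserve <= sizeof(b->storage));
    b->head = b->storage;
    b->data = b->head + reserve;
}

/* Bytes available in front of the current packet start. */
static size_t toy_headroom(const struct toy_buf *b)
{
    return (size_t)(b->data - b->head);
}

/* Prepend `len` bytes of header space. Returns the new packet start,
 * or NULL where the real skb_push() would detect data < head and
 * panic with an skb underrun. */
static unsigned char *toy_push(struct toy_buf *b, size_t len)
{
    if (len > toy_headroom(b))
        return NULL;
    b->data -= len;
    return b->data;
}
```

With 40 bytes reserved, pushing a 40-byte IPv6 header succeeds and
leaves no headroom, so a further 8-byte push (the size of an IPv6
fragment header) returns NULL. Whether the oops here comes from exactly
that push is an assumption for illustration only.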

Could you also try Herbert's latest patch set:
  [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment

This changes how the headroom is calculated, and *might* fix this issue
too if it's caused by the same SMP race condition which got uncovered by
my other commit earlier.
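
A rough sketch of the headroom arithmetic involved, as a toy userspace
model. The two helpers mirror what the LL_RESERVED_SPACE and
LL_ALLOCATED_SPACE macros in netdevice.h looked like around this time,
to the best of my recollection, so treat their exact shape as an
assumption. struct toy_dev and the toy_* names are hypothetical.

```c
#include <assert.h>

/* Alignment unit for link-layer header space (assumed to be 16,
 * matching HH_DATA_MOD in the kernel headers of that era). */
#define TOY_HH_DATA_MOD 16u

/* Hypothetical stand-in for the three net_device fields involved. */
struct toy_dev {
    unsigned int hard_header_len;
    unsigned int needed_headroom;
    unsigned int needed_tailroom;
};

/* Head room that gets reserved: header + needed_headroom, aligned. */
static unsigned int toy_reserved_space(const struct toy_dev *d)
{
    return ((d->hard_header_len + d->needed_headroom)
            & ~(TOY_HH_DATA_MOD - 1u)) + TOY_HH_DATA_MOD;
}

/* Link-layer space that gets allocated: the alignment is applied to
 * the sum of head and tail requirements. */
static unsigned int toy_allocated_space(const struct toy_dev *d)
{
    return ((d->hard_header_len + d->needed_headroom + d->needed_tailroom)
            & ~(TOY_HH_DATA_MOD - 1u)) + TOY_HH_DATA_MOD;
}

/* Link-layer space left for the tail once the head room is reserved. */
static unsigned int toy_tail_slack(const struct toy_dev *d)
{
    unsigned int alloc = toy_allocated_space(d);
    unsigned int resv = toy_reserved_space(d);

    assert(alloc >= resv);
    return alloc - resv;
}
```

For a device with hard_header_len 16, needed_headroom 0 and
needed_tailroom 15, both helpers return 32, so nothing is left over for
the 15 bytes of tail room. The two helpers also read needed_headroom
separately, so if that value grows between allocating and reserving,
the reserved amount can exceed what was allocated.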

- Timo