Open Source and information security mailing list archives
Date:	Mon, 23 Jun 2014 09:57:59 -0700
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	dormando <dormando@...ia.net>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Alexey Preobrazhensky <preobr@...gle.com>,
	Steffen Klassert <steffen.klassert@...unet.com>,
	David Miller <davem@...emloft.net>, paulmck@...ux.vnet.ibm.com,
	netdev@...r.kernel.org, Kostya Serebryany <kcc@...gle.com>,
	Lars Bull <larsbull@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Bruce Curtis <brutus@...gle.com>,
	Maciej Żenczykowski <maze@...gle.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [PATCH] ipv4: fix a race in ip4_datagram_release_cb()

On Mon, Jun 23, 2014 at 1:55 AM, dormando <dormando@...ia.net> wrote:
> On Mon, 23 Jun 2014, Eric Dumazet wrote:
>
>> On Sun, 2014-06-22 at 12:07 -0700, dormando wrote:
>>
>> > Update on testing:
>> >
>> > I only have two machines that crash on their own frequently (more like
>> > one, even). Unfortunately something happened to the datacenter it's in and
>> > it was offline for a week. The machine normally crashes after 1.5-4d,
>> > averaging 2d.
>> >
>> > It's done about three days of total time without a new crash. I also have
>> > the kernel running in another datacenter for ~10 days... but it takes
>> > 30-150 days to crash in that one.
>> >
>> > So, inconclusive, but still promising. If the machine survives the week it
>> > probably means it's fixed, or at least greatly reduced.
>> >
>> > I saw that one of your patches got queued for stable, but all three were
>> > necessary to fix udpkill. What's your plan for cleanup/upstreaming?
>> >
>> > Did you folks end up running udpkill under the tester thing?
>>
>> I did not test udpkill, as the known problem is the DST_NOCACHE flag.
>>
>> We end up calling sk_dst_set(sk, dst) with a dst having this flag set.
>>
>> So maybe DST_NOCACHE should be renamed, if we _can_ cache a dst like
>> this. Its real meaning is that dst_release() has to track when the
>> refcount reaches 0 so that the last owner frees the dst, while still
>> respecting an RCU grace period.
>>
>> Fixing sk_dst_set() as I did is not enough, as it only reduces the
>> race window.
>
> Hrm. I'll have to spend more time trying to understand how to test this
> (beyond just putting a kernel into production and seeing if it crashes).
> Outside of udpkill it's been slightly hard to reproduce, and udpkill ran
> for over five hours with your previous patches.
>
> Do you or other folks have any methods for testing this?


Well, running with KASAN should reduce the time to crash by an order of magnitude.
Alexey, have we tried running udpkill with kasan?
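For context on Dmitry's suggestion: KASAN (the Kernel Address Sanitizer) was still carried out of tree by Google at the time of this thread; it was merged into mainline in v4.0. On kernels that have it, a debug build is enabled with Kconfig options along these lines (a sketch, not a complete debug config):

```
CONFIG_KASAN=y
# Inline instrumentation is faster at runtime but produces a larger kernel
# than outline mode:
CONFIG_KASAN_INLINE=y
```

With this enabled, use-after-free of a dst entry shows up as an immediate KASAN report instead of a crash days later, which is why it shortens reproduction time for races like this one.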
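Eric's description of the DST_NOCACHE semantics above can be sketched as a tiny user-space model. The structure and function names here are invented for illustration; the real kernel code lives in net/core/dst.c and defers the final free with call_rcu():

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DST_NOCACHE 0x1

/* Illustrative stand-in for the kernel's struct dst_entry. */
struct dst_model {
	atomic_int refcnt;
	int flags;
	bool freed;         /* stands in for an immediate dst_destroy() */
	bool rcu_deferred;  /* stands in for a call_rcu() deferred free */
};

/* Model of dst_release(): the last owner must free the entry, but when
 * DST_NOCACHE is set the free has to wait for an RCU grace period so
 * that lockless readers still holding the pointer never touch freed
 * memory. */
static void dst_release_model(struct dst_model *d)
{
	/* atomic_fetch_sub() returns the previous value, so 1 means this
	 * caller dropped the last reference. */
	if (atomic_fetch_sub(&d->refcnt, 1) == 1) {
		if (d->flags & DST_NOCACHE)
			d->rcu_deferred = true; /* free after grace period */
		else
			d->freed = true;        /* free immediately */
	}
}
```

The race in the thread follows from this: if sk_dst_set() publishes such a dst while another CPU is concurrently dropping the last reference, a reader can be left holding a pointer whose grace period has already expired, hence the use-after-free that udpkill provokes.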