Date:	Tue, 8 Jul 2014 00:01:01 -0700 (PDT)
From:	dormando <dormando@...ia.net>
To:	Eric Dumazet <eric.dumazet@...il.com>
cc:	Alexey Preobrazhensky <preobr@...gle.com>,
	Steffen Klassert <steffen.klassert@...unet.com>,
	David Miller <davem@...emloft.net>, paulmck@...ux.vnet.ibm.com,
	netdev@...r.kernel.org, Kostya Serebryany <kcc@...gle.com>,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Lars Bull <larsbull@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Bruce Curtis <brutus@...gle.com>,
	Maciej Żenczykowski <maze@...gle.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: [PATCH] ipv4: fix a race in ip4_datagram_release_cb()

On Tue, 8 Jul 2014, Eric Dumazet wrote:

> On Mon, 2014-07-07 at 18:41 -0700, dormando wrote:
>
> > Mostly there, but I think we hit what might be a new bug.. The machines
> > which crashed every few days previously have been stable for weeks.
> >
> > however I had one machine running the new kernel in a larger cluster
> > elsewhere; we had a network event and the one machine on the new kernel
> > panic'ed in ipv4_dst_destroy, but what looks like a new path. Sadly I've
> > had to halt the rollout :( All of the older unfixed kernels survived this
> > particular network event.
> >
> > Unfortunately this is still on 3.10, due to a bad softirq regression in
> > 3.14 I've not had time to track down. I applied all of your patches for
> > what wasn't already in 3.10. The only other change I made was to un-revert
> > 62713c4b6bc10c2d082ee1540e11b01a2b2162ab - which I'd been keeping reverted
> > as it was making crashes much more frequent.
>
> Hmm, always give patch title or a valid sha1 commit, this one is not in
> David trees, so its hard to tell.
>

Damn, sorry. I thought it was valid:
Author: Alexei Starovoitov <ast@...mgrid.com>
Date:   Tue Nov 19 19:12:34 2013 -0800

    ipv4: fix race in concurrent ip_route_input_slow()

    [ Upstream commit dcdfdf56b4a6c9437fc37dbc9cee94a788f9b0c4 ]

It's the patch that uses the DST_NOCACHE flag. I can re-add the revert to
my own tree, but it should probably be reviewed again, I guess?

We had another thread about it a while ago. I'd upgraded between stable
revisions of 3.10 (when this patch was added) and machines in one
datacenter started crashing every few hours. Thread never went anywhere.

I tried dropping the revert, since your recent patches should've fixed the
underlying problem.

I have no idea if this patch is the problem or not though, just adding the
information for completeness. We had no luck at all reproducing this
latest crash.
