Date:	Tue, 09 Feb 2016 16:31:25 +0100
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Michal Hocko <mhocko@...nel.org>, Jiri Slaby <jslaby@...e.cz>,
	Thomas Gleixner <tglx@...utronix.de>,
	Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
	Ben Hutchings <ben@...adent.org.uk>,
	Sasha Levin <sasha.levin@...cle.com>, Shaohua Li <shli@...com>,
	LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
	Daniel Bilik <daniel.bilik@...system.cz>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Crashes with 874bbfe600a6 in 3.18.25

On Fri, 2016-02-05 at 16:06 -0500, Tejun Heo wrote:
> On Fri, Feb 05, 2016 at 09:59:49PM +0100, Mike Galbraith wrote:
> > On Fri, 2016-02-05 at 15:54 -0500, Tejun Heo wrote:
> > 
> > > What are you suggesting?
> > 
> > That 874bbfe6 should die.
> 
> Yeah, it's gonna be killed.  The commit is there because the behavior
> change broke things.  We don't want to guarantee it but have been and
> can't change it right away just because we don't like it when things
> may break from it.  The plan is to implement a debug option to force
> workqueue to always execute these work items on a foreign cpu to weed
> out breakages.

A niggling question remaining is when is it gonna be killed?

1. Meanwhile, 874bbfe6 was sent to 2.6.31+, meaning that every stable
tree where it landed which did not ALSO receive 22b886dd has become
destabilized.  We have two 3.12-stability reports, one the hotplug
explosion that you provided a workaround for, one the corruption, and
one corruption report for 3.18.  Both breakage types would be sort of
fixed up if 22b886dd and your hotplug workaround (which does
_not_ guarantee survival) were applied everywhere, however...

2. We also have a report for the 3.18 corruption victim that adding
22b886dd did NOT restore the stable status quo, rather it replaced the
corruption that 874bbfe6 caused with a performance regression.

3. 874bbfe6 + 22b886dd also inflicts a NO_HZ_FULL regression.
Admittedly not a huge deal, but another regression nonetheless.

The only evidence I've seen that anything at all was broken by the
changes that triggered the inception of 874bbfe6 in the first place was
the b0rked vmstat thing that Linus had already fixed with 176bed1d.  So
where is the breakage you mention that makes keeping 874bbfe6 the
prudent thing to do vs just reverting 874bbfe6 immediately, perhaps
22b886dd as well given it is fallout thereof, and getting that sent off
to stable?

It looks for all the world as if the sole excuse for either to exist is
to prevent any other stupid mistakes like the vmstat thing from being
exposed for what they are by actively hiding them, when in fact, that
hiding doesn't survive a hotplug event (as we saw in the crash analysis
I showed you).  Surely there's a better reason to keep that commit than
hiding bugs that can only remain hidden until they meet hotplug.  What
is it?

	-Mike
