lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140619041900.GD4669@linux.vnet.ibm.com>
Date:	Wed, 18 Jun 2014 21:19:00 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Andi Kleen <ak@...ux.intel.com>
Cc:	Dave Hansen <dave.hansen@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Josh Triplett <josh@...htriplett.org>,
	"Chen, Tim C" <tim.c.chen@...el.com>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [bisected] pre-3.16 regression on open() scalability

On Wed, Jun 18, 2014 at 08:38:16PM -0700, Andi Kleen wrote:
> On Wed, Jun 18, 2014 at 07:13:37PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 18, 2014 at 06:42:00PM -0700, Andi Kleen wrote:
> > > 
> > > I still think it's totally the wrong direction to pollute so 
> > > many fast paths with this obscure debugging check workaround
> > > unconditionally.
> > 
> > OOM prevention should count for something, I would hope.
> 
> OOM in what scenario? This is getting bizarre.

On the bizarre part, at least we agree on something.  ;-)

CONFIG_NO_HZ_FULL booted with at least one nohz_full CPU.  Said CPU
gets into the kernel and stays there, not necessarily generating RCU
callbacks.  The other CPUs are very likely generating RCU callbacks.
Because the nohz_full CPU is in the kernel, and because there are no
scheduling-clock interrupts on that CPU, grace periods do not complete.
Eventually, the callbacks from the other CPUs (and perhaps also some
from the nohz_full CPU, for that matter) OOM the machine.

Now this scenario constitutes an abuse of CONFIG_NO_HZ_FULL, because it
is intended for CPUs that execute either in userspace (in which case
those CPUs are in extended quiescent states so that RCU can happily
ignore them) or for real-time workloads with low CPU untilization (in
which case RCU sees them go idle, which is also a quiescent state).
But that won't stop people from abusing their kernels and complaining
when things break.

This same thing can also happen without CONFIG_NO_HZ full, though
the system has to work a bit harder.  In this case, the CPU looping
in the kernel has scheduling-clock interrupts, but if all it does
is cond_resched(), RCU is never informed of any quiescent states.
The whole point of this patch is to make those cond_resched() calls,
which are quiescent states, visible to RCU.

> If something keeps looping forever in the kernel creating 
> RCU callbacks without any real quiescent states it's simply broken.

I could get behind that.  But by that definition, there is a lot of
breakage in the current kernel, especially as we move to larger CPU
counts.

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ