linux-kernel - Re: [PATCH tip/core/rcu 03/13] rcu: Stop treating in-kernel CPU-bound workloads as errors

Message-ID: <20160225172019.GR3522@linux.vnet.ibm.com>
Date:	Thu, 25 Feb 2016 09:20:19 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	jiangshanlai@...il.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
	dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
	fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com
Subject: Re: [PATCH tip/core/rcu 03/13] rcu: Stop treating in-kernel
 CPU-bound workloads as errors

On Thu, Feb 25, 2016 at 10:43:17AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 23, 2016 at 09:12:40PM -0800, Paul E. McKenney wrote:
> > Commit 4a81e8328d379 ("Reduce overhead of cond_resched() checks for RCU")
> > handles the error case where a nohz_full CPU loops indefinitely in the kernel
> > with the scheduling-clock interrupt disabled.  However, this handling
> > includes IPIing the CPU running the offending loop, which is not what
> > we want for real-time workloads.  And there are starting to be real-time
> > CPU-bound in-kernel workloads, and these must be handled without IPIing
> > the CPU, at least not in the common case.  Therefore, this situation can
> > no longer be dismissed as an error case.
> 
> Do explain. Doing "for (;;) ;" in a kernel RT thread is just as bad for
> general system health as is doing the same in userspace.

The use case is instead something like this:

	for (;;) {
		do_something();
		cond_resched_rcu_qs();
	}

If you instead do something like this:

	for (;;)
		do_something();

where do_something() doesn't invoke cond_resched_rcu_qs() often enough,
then your kernel is broken and the warranty says that you get to keep
the pieces.
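
The difference between the two loops above can be illustrated with a rough userspace model. This is only a sketch, not kernel code: cond_resched_rcu_qs_stub(), qs_count, run_loop(), grace_period_can_end(), and model_demo() are hypothetical names invented for this illustration, and the real cond_resched_rcu_qs() does considerably more than bump a counter. The model assumes that a grace period can end only after the looping CPU has reported at least one quiescent state since the grace period's snapshot was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one CPU's quiescent-state bookkeeping. */
static unsigned long qs_count;

/* Userspace stub (assumption): models only "report a quiescent state". */
static void cond_resched_rcu_qs_stub(void)
{
	qs_count++;
}

/* Placeholder for one bounded chunk of CPU-bound work. */
static void do_something(void)
{
}

/* Run 'iters' passes; report quiescent states only when 'polite' is set. */
static void run_loop(int iters, bool polite)
{
	for (int i = 0; i < iters; i++) {
		do_something();
		if (polite)
			cond_resched_rcu_qs_stub();
	}
}

/* In this model, a grace period may end once the counter has moved. */
static bool grace_period_can_end(unsigned long snap)
{
	return qs_count != snap;
}

/* Returns 0 if both loops behave as the model expects. */
static int model_demo(void)
{
	unsigned long snap;

	snap = qs_count;
	run_loop(1000, true);	/* first loop: reports quiescent states */
	assert(grace_period_can_end(snap));

	snap = qs_count;
	run_loop(1000, false);	/* second loop: never reports one */
	assert(!grace_period_can_end(snap));

	return 0;
}
```

In this model the first loop lets the waiting grace period complete without outside intervention, while the second leaves it stuck until something else (in the real kernel, eventually an IPI or a stall warning) steps in.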

> Also, who runs his RT workload in-kernel?

That would be me, actually.

I use something very much like this in rcutorture and in rcuperf (the
latter currently exists only in -rcu, although 0day has been helpfully
finding various problems with it).  In rcutorture, the problem never
arises given default kernel-boot-parameter settings.  However, you
could easily set various timing parameters to exceed the RCU CPU stall
warning timeout.

In rcuperf, this sort of thing happens by default under heavy load.

So why bother if the use case is this obscure?

Because I have been getting beaten up repeatedly over the past few years
about RCU sending IPIs, so I figured that this time I should at least
-try- to get ahead of the game!  ;-)

							Thanx, Paul