Message-Id: <20170118021006.GK5238@linux.vnet.ibm.com>
Date:   Tue, 17 Jan 2017 18:10:06 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, mingo@...nel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, dvhart@...ux.intel.com,
        fweisbec@...il.com, oleg@...hat.com, bobby.prani@...il.com
Subject: Re: [PATCH tip/core/rcu 11/20] sched,rcu: Make cond_resched()
 provide RCU quiescent state

On Tue, Jan 17, 2017 at 01:11:46PM +0100, Michal Hocko wrote:
> On Tue 17-01-17 04:05:13, Paul E. McKenney wrote:
> > On Tue, Jan 17, 2017 at 11:51:41AM +0100, Michal Hocko wrote:
> > > On Mon 16-01-17 16:54:03, Paul E. McKenney wrote:
> > > > On Mon, Jan 16, 2017 at 06:11:30PM +0100, Peter Zijlstra wrote:
> > > > > On Sat, Jan 14, 2017 at 01:13:12AM -0800, Paul E. McKenney wrote:
> > > > > > There is some confusion as to which of cond_resched() or
> > > > > > cond_resched_rcu_qs() should be added to long in-kernel loops.
> > > > > > This commit therefore eliminates the decision by adding RCU
> > > > > > quiescent states to cond_resched().
> > > > > 
> > > > > Which would make: rcu_read_lock(); cond_resched(); rcu_read_unlock();
> > > > > invalid under preemptible RCU. Is it already?
> > > > 
> > > > In theory, yes.  In practice, I just tested it with preemption and
> > > > lockdep enabled, and it didn't complain.  If further testing finds
> > > > complaints, we can either fix those uses (preferred) or revert
> > > > this patch.
> > > > 
> > > > > > Warning: This is a prototype.  For example, it does not correctly
> > > > > > handle Tasks RCU.  Which is OK for the moment, given that no one
> > > > > > actually uses Tasks RCU yet.
> > > > > 
> > > > > > --- a/kernel/sched/core.c
> > > > > > +++ b/kernel/sched/core.c
> > > > > > @@ -4907,6 +4907,7 @@ int __sched _cond_resched(void)
> > > > > >  		preempt_schedule_common();
> > > > > >  		return 1;
> > > > > >  	}
> > > > > > +	rcu_all_qs();
> > > > > >  	return 0;
> > > > > >  }
> > > > > 
> > > > > Still not a real fan of this, it does make cond_resched() touch a bunch
> > > > > more cachelines, also, I suppose that if we're going to do this, we
> > > > > should make __cond_resched_lock() and __cond_resched_softirq() act
> > > > > similarly.
> > > > 
> > > > Michal (now CCed) argues that having to distinguish between cond_resched()
> > > > and cond_resched_rcu_qs() is overly burdensome.  Michal?
> > > 
> > > Yes, it is really not clear which one is meant to be in which context. I
> > > really do not see which cond_resched should be turned into
> > > cond_resched_rcu_qs.
> > > 
> > > > Any thoughts on how we might remove this burden without the additional
> > > > cache misses?  I will take another look as well to see what could make
> > > > it lower cost.  There are probably ways...  Would it make sense to
> > > > have RCU maintain a need-rcu_all_qs() flag in the same cacheline as
> > > > the __preempt_count?  Perhaps throttling the writes to this flag from
> > > > the RCU grace-period kthreads to once per 100 milliseconds or so?
> > > 
> > > Can the stall detector simply request rescheduling when it gets
> > > dangerously close to the timeout?
> > 
> > It is quite possible that half of the stall timeout would be a better
> > choice than my 100 milliseconds, but either way, there would be need
> > for a flag or some such.
> 
> E.g. set_tsk_need_resched() on the task currently running on a cpu which
> is preventing the rcu grace period for too long?
> 
> That would only require change to the stall detector and the cond_resched
> could be left alone completely.

Thank you!!!

The other complication is that under CONFIG_PREEMPT=y, _cond_resched()
is an empty function.  That would be one reason why use of cond_resched()
wasn't always giving RCU the quiescent states that it needs.  And that
is a problem with this patch, which I therefore need to defer to 4.12.
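
To make that concrete, here is a minimal user-space sketch.  It is a model,
not kernel code: struct preempt_model, model_rcu_all_qs(),
model_cond_resched(), and long_loop() are all hypothetical names, and the
config_preempt field merely stands in for the compile-time CONFIG_PREEMPT
choice.  It assumes that no reschedule is ever pending, so the only thing it
shows is whether the added rcu_all_qs() call is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct preempt_model {
	bool config_preempt;       /* stands in for CONFIG_PREEMPT=y */
	unsigned long qs_reported; /* quiescent states seen by the model */
};

/* Stand-in for rcu_all_qs(): just count the report. */
static void model_rcu_all_qs(struct preempt_model *m)
{
	m->qs_reported++;
}

/* Stand-in for _cond_resched() with the patch applied. */
static int model_cond_resched(struct preempt_model *m)
{
	assert(m != NULL);
	if (m->config_preempt)
		return 0; /* models the empty stub: the patched body never runs */
	/* Assumption: no reschedule is pending, so fall through to the new call. */
	model_rcu_all_qs(m);
	return 0;
}

/* A long in-kernel-style loop that calls the model once per iteration. */
static unsigned long long_loop(struct preempt_model *m, unsigned long iterations)
{
	for (unsigned long i = 0; i < iterations; i++)
		model_cond_resched(m);
	return m->qs_reported;
}
```

With config_preempt clear, every iteration of long_loop() reports a quiescent
state.  With it set, the count stays at zero no matter how long the loop
runs, which is the gap described above.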

That aside, the reason I am reluctant to use the need-resched approach
except as an emergency measure is that the way I have to set that bit
remotely involves IPIs.

But don't get me wrong, it is extremely useful as an emergency measure.
I am just trying to get cond_resched() to help on a non-emergency basis.
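
As a rough illustration of the non-emergency route, the need-rcu_all_qs()
flag idea quoted earlier in this thread might be modeled in user space as
below.  Everything here is an assumption made for the sketch: struct
flag_cpu, gp_request_qs(), flag_cond_resched(), and the millisecond clock
passed in by the caller are hypothetical, and real cacheline placement
next to __preempt_count is not modeled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Throttle taken from the "once per 100 milliseconds or so" suggestion. */
#define QS_REQUEST_INTERVAL_MS 100UL

/* Hypothetical per-CPU state; not a kernel data structure. */
struct flag_cpu {
	atomic_bool need_qs;           /* set by the grace-period side */
	unsigned long last_request_ms; /* time of the last flag write */
	unsigned long qs_reported;     /* quiescent states reported so far */
};

/* Grace-period side: write the flag at most once per interval. */
static bool gp_request_qs(struct flag_cpu *cpu, unsigned long now_ms)
{
	assert(cpu != NULL);
	if (now_ms - cpu->last_request_ms < QS_REQUEST_INTERVAL_MS)
		return false; /* throttled: no write, so no extra cache traffic */
	cpu->last_request_ms = now_ms;
	atomic_store_explicit(&cpu->need_qs, true, memory_order_release);
	return true;
}

/* Loop side: one cheap load in the common case, a report only on request. */
static bool flag_cond_resched(struct flag_cpu *cpu)
{
	assert(cpu != NULL);
	if (!atomic_load_explicit(&cpu->need_qs, memory_order_relaxed))
		return false; /* common case: nothing requested */
	atomic_store_explicit(&cpu->need_qs, false, memory_order_relaxed);
	cpu->qs_reported++; /* stands in for the rcu_all_qs() call */
	return true;
}
```

The point of the split is that the loop side only reads a flag, with no
remote write and no IPI, while the throttle bounds how often the
grace-period side dirties that cacheline.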

							Thanx, Paul