Message-ID: <20060731143850.GA1293@us.ibm.com>
Date:	Mon, 31 Jul 2006 07:38:50 -0700
From:	"Paul E. McKenney" <paulmck@...ibm.com>
To:	Bill Huey <billh@...ppy.monkey.org>
Cc:	Esben Nielsen <nielsen.esben@...glemail.com>,
	linux-kernel@...r.kernel.org, tglx@...utronix.de,
	rostedt@...dmis.org, dipankar@...ibm.com, mingo@...e.hu,
	tytso@...ibm.com, dvhltc@...ibm.com
Subject: Re: [RFC, PATCH, -rt] Early prototype RCU priority-boost patch

On Fri, Jul 28, 2006 at 07:50:37PM -0700, Bill Huey wrote:
> On Fri, Jul 28, 2006 at 07:18:29PM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 28, 2006 at 03:27:16PM -0700, Bill Huey wrote:
> > > What is that?  Like randomly boosting without tracking which thread is
> > > inside an RCU critical section?
> > 
> > Perhaps a better way to put it would be that a thread preempted in
> > an RCU read-side critical section boosts itself, and tracks the fact
> > that it boosted itself in its tasks structure.
> > 
> > The second boost would be from some other task, but if the task had 
> > already boosted itself, the de-boosting would already be taken care of
> > at the next rcu_read_unlock() -- but as mentioned earlier in this
> > thread, you only boost someone else if they are not currently running.
> 
> The problem here is that I can't see how it's going to boost the thread
> if the things doing the RCU sync can't track the list of readers.  It
> might be recorded in the task struct, but now what?

The first boost is performed by the task itself the first time there
is a preemption attempt (or the first time it blocks on a mutex), so
no need to track the list of readers in that case.  The trick is that
there is no benefit to boosting someone who is already running -- we
only need to boost the first time they are considering blocking.
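
To make sure we are talking about the same thing, here is a minimal
userspace sketch of that bookkeeping (all names, such as the task fields,
RCU_BOOST_PRIO, and the helper functions, are invented for illustration
and are not the actual -rt code): the task saves its original priority and
boosts itself on the first preemption attempt inside a read-side critical
section, and the outermost rcu_read_unlock() restores the original priority.

#include <stdio.h>

struct task {
        int prio;               /* current priority (lower value = higher priority) */
        int normal_prio;        /* priority before any RCU boost */
        int rcu_nesting;        /* depth of rcu_read_lock() nesting */
        int rcu_boosted;        /* nonzero once this task has boosted itself */
};

#define RCU_BOOST_PRIO 10       /* example boosted priority */

/* Called on the first preemption attempt (or first block on a mutex). */
static void rcu_self_boost(struct task *t)
{
        if (t->rcu_nesting > 0 && !t->rcu_boosted) {
                t->normal_prio = t->prio;
                t->prio = RCU_BOOST_PRIO;
                t->rcu_boosted = 1;     /* remembered in the task structure */
        }
}

/* The extra comparison rcu_read_unlock() must make: deboost if boosted. */
static void rcu_deboost(struct task *t)
{
        if (t->rcu_boosted) {
                t->prio = t->normal_prio;
                t->rcu_boosted = 0;
        }
}

int main(void)
{
        struct task t = { .prio = 120, .rcu_nesting = 1 };

        rcu_self_boost(&t);     /* preemption attempted inside the read side */
        printf("boosted prio:  %d\n", t.prio);
        rcu_deboost(&t);        /* outermost rcu_read_unlock() */
        printf("restored prio: %d\n", t.prio);
        return 0;
}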

If there is a need for a second "boost to the sky" in case of excessively
delayed grace period (or to provide deterministic synchronize_rcu()
latency), then we need a list only of those RCU readers who have attempted
to block at least once thus far in their current RCU read-side critical
section.  But I was putting this off until I get the simple case right.
Cowardly of me, I know!  ;-)
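
For what it is worth, here is a sketch of what such a list might look like
(again, all names are invented for illustration): a reader adds itself the
first time it blocks in its current read-side critical section, and an
excessively delayed grace period can then walk the list and boost everything
on it.

#include <stddef.h>

struct blocked_reader {
        int prio;                       /* lower value = higher priority */
        struct blocked_reader *next;
};

static struct blocked_reader *blocked_readers;  /* readers that have blocked */

/* A reader enqueues itself the first time it blocks in its current
   RCU read-side critical section. */
static void note_reader_blocked(struct blocked_reader *r)
{
        r->next = blocked_readers;
        blocked_readers = r;
}

/* "Boost to the sky": if the grace period is excessively delayed,
   raise every reader on the list to the boost priority. */
static void rcu_boost_blocked_readers(int boost_prio)
{
        struct blocked_reader *r;

        for (r = blocked_readers; r != NULL; r = r->next)
                if (r->prio > boost_prio)
                        r->prio = boost_prio;
}

int main(void)
{
        struct blocked_reader a = { .prio = 120 }, b = { .prio = 90 };

        note_reader_blocked(&a);
        note_reader_blocked(&b);
        rcu_boost_blocked_readers(10); /* both readers now at priority 10 */
        return 0;
}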

I finally found Steve Rostedt's PI document (in 2.6.18-rc2), which is very
helpful (though I suppose I should reserve judgement until after I get this
working...).

> > > > 2.	RCU reader boosting a lock holder.  This ends up being a
> > > > 	combination of #1 (because the act of blocking on a lock implies
> > > > 	an "out of nowhere" priority boost) and normal lock boosting.
> > > 
> > > Lock holder as in mutex held below and RCU critical section ?
> > 
> > Lock holder as in task 0 holds the lock, perhaps in an RCU read-side
> > critical section and perhaps not.  Task 1 is in an RCU read-side
> > critical section and attempts to acquire the lock.  Task 1 must block,
> > because task 0 still holds the lock.  Task 1 must boost itself before
> > blocking, and must donate its boosted priority to task 0.
> 
> Ok (thinking...)
> 
> > > > 3.	A call_rcu() or synchronize_rcu() boosting all readers.  I am
> > > > 	not sure we really need this, but in case we do...  One would
> > > > 	need an additional prio_booster for each task to be boosted,
> > > > 	right?  This would seem to require an additional prio_booster
> > > > 	struct in each task structure.
> > >  
> > > This needs a notion of RCU read side ownership to boost those preempted
> > > threads.
> > 
> > I am getting the impression that #3 is something to leave aside for now.
> 
> ... 
> 
> > The idea is that none of this stuff ever happens except in cases where
> > the RCU read-side critical section blocks, in which case all this is
> 
> ...
> 
> > in the noise compared to the context switch.  The sole exception to
> > this is that rcu_read_unlock() must check to see if it has been boosted,
> > and deboost itself if so.  I don't particularly like the additional
> > comparison, but it should not be too expensive.
> 
> Oh, blocks as in gets shoved into a wait queue for a PI-enabled lock.

Yep.

> > > Don't know what to think about it other than some kind of tracking or
> > > boosting logic in the per CPU run queue or the task struct itself during
> > > the boost operation. But you're still stuck with the problem of what
> > > to boost and how to find that out on the RCU sync side.  It's still
> > > an ownership problem unless Esben can think of another way of getting
> > > around that problem.
> > 
> > One idea is to put tasks that block in RCU read-side critical sections
> > on a list -- again, the hope is that the overhead is in the noise compared
> > to the context switch.
> 
> Only way to find out is to try it.

Or to try without it and see what happens.

> > > That's why I suggested a priority ceiling or per-CPU priority threshold
> > > tracking (+ CPU binding) of the priority of the irq-threads and stuff.  It's
> > > a simple hack to restore the cheesy preempt count stuff without having
> > > to revert to invasive ownership tracking for each reader.
> > > 
> > > It's just an idea. Maybe it'll be useful to you.
> > 
> > Let me make sure I understand what you are suggesting -- sounds to me
> > like a check in preempt_schedule().  If the task to be preempted is
> > higher priority than the ceiling, the preemption request is refused.
> > 
> > Or am I missing part of your proposal?
> 
> Something like an additional check, at all preemption points (cond_resched()
> and friends, the scheduler tick), against a value in a thread's own CPU run
> queue struct to see if it should permit the preemption or not.  I'm thinking
> about ways to avoid taking an expensive run queue lock during a task's
> priority manipulation and instead have some other kind of logic orthogonal
> to that so that it can bypass this overhead.  A value in a run queue that can
> be checked against in order to prevent a preemption from happening might be
> able to sidestep the need for taking a full run queue lock to reorder a
> task's priority ranking.
> 
> If a ceiling or threshold were defined for RCU (another somewhat complicated
> topic), it could prevent the RCU critical section from being preempted other
> than by other SCHED_FIFO tasks at and above that priority, if you choose
> a threshold at that priority.  That'll be a part of the runtime configuration
> of the system.  You'd have to cpu_get/put to get that value so that you can
> get at it safely, read or write to it, and maybe save and restore that value
> on entry and exit respectively.  You'd also have to set a field in the task
> struct to prevent it from migrating to another CPU and make sure that's
> modifying the right stuff on the right CPU.
> 
> It's a possible solution to a rather difficult problem.  What do you think?
> Too much of a hack?

I am not sure -- seems to be a dual approach to boosting the RCU reader's
priority in the preemption case.  I suspect that a real priority boost
would still be needed in the case where the RCU reader blocks on a mutex,
since we need the priority inheritance to happen in that case, right?
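
Here is a tiny userspace model of the ceiling check as I understand it
(names invented, and this is emphatically not the real preempt_schedule()
path): a preemption point compares the current task's priority against a
per-CPU ceiling and refuses the preemption if the task is at or above it.

#include <stdio.h>
#include <stdbool.h>

struct cpu_runqueue {
        int rcu_prio_ceiling;   /* tasks at or above this priority are not preempted */
};

/* Lower numeric value means higher priority, as in the kernel. */
static bool may_preempt(const struct cpu_runqueue *rq, int curr_prio)
{
        return curr_prio > rq->rcu_prio_ceiling;
}

int main(void)
{
        struct cpu_runqueue rq = { .rcu_prio_ceiling = 10 };

        printf("prio  5 task preemptible? %s\n", may_preempt(&rq, 5) ? "yes" : "no");
        printf("prio 50 task preemptible? %s\n", may_preempt(&rq, 50) ? "yes" : "no");
        return 0;
}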

> (I'm into -rt development again after a good OLS and I'm trying to get my
> kernel development up and going so that I can help out)

Sounds very good!!!

							Thanx, Paul