lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100416164503.GH2615@linux.vnet.ibm.com>
Date:	Fri, 16 Apr 2010 09:45:03 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Avi Kivity <avi@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Rik van Riel <riel@...hat.com>, Ingo Molnar <mingo@...e.hu>,
	akpm@...ux-foundation.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	David Miller <davem@...emloft.net>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Mel Gorman <mel@....ul.ie>, Nick Piggin <npiggin@...e.de>
Subject: Re: [PATCH 01/13] powerpc: Add rcu_read_lock() to gup_fast()
 implementation

On Fri, Apr 16, 2010 at 05:14:15PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 08:09 -0700, Paul E. McKenney wrote:
> > On Fri, Apr 16, 2010 at 04:56:50PM +0200, Peter Zijlstra wrote:
> > > On Fri, 2010-04-16 at 07:32 -0700, Paul E. McKenney wrote:
> > > > On Fri, Apr 16, 2010 at 04:23:39PM +0200, Peter Zijlstra wrote:
> > > > > On Fri, 2010-04-16 at 07:17 -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > > > > Of course, with call_rcu_sched(), the corresponding RCU read-side critical
> > > > > > > > sections are non-preemptible.  Therefore, in CONFIG_PREEMPT_RT, these
> > > > > > > > read-side critical sections must use raw spinlocks.
> > > > > > > 
> > > > > > > OK, so if we fully remove CONFIG_TREE_PREEMPT_RCU (defaulting to y),
> > > > > > > rename all the {call_rcu, rcu_read_lock, rcu_read_unlock,
> > > > > > > synchronize_rcu} functions to {*}_preempt and then add a new
> > > > > > > CONFIG_PREEMPT_RCU that simply maps {*} to either {*}_sched or
> > > > > > > {*}_preempt, we've basically got what I've been asking for for a while,
> > > > > > > no?
> > > > > > 
> > > > > > What would rcu_read_lock_preempt() do in a !CONFIG_PREEMPT kernel?
> > > > > 
> > > > > Same as for a preempt one, since you'd have to be able to schedule()
> > > > > while holding it to be able to do things like mutex_lock().
> > > > 
> > > > So what you really want is something like rcu_read_lock_sleep() rather
> > > > than rcu_read_lock_preempt(), right?  The point is that you want to do
> > > > more than merely preempt, given that it is legal to do general blocking
> > > > while holding a mutex, correct?
> > > 
> > > Right, but CONFIG_TREE_PREEMPT_RCU=y ends up being that. We could change
> > > the name to _sleep, but we've been calling it preemptible-rcu for a long
> > > while now.
> > 
> > It is actually not permitted to do general blocking in a preemptible RCU
> > read-side critical section.  Otherwise, someone is going to block waiting
> > for a network packet that never comes, thus OOMing the system.
> 
> Sure, something that guarantees progress seems like a sensible
> restriction for any lock, and in particular RCU :-)

Excellent point -- much of the issue really does center around
forward-progress guarantees.  In fact the Linux kernel has a number of
locking primitives that require different degrees of forward-progress
guarantee from the code in their respective critical sections:

o	spin_lock_irqsave(): Critical sections must guarantee forward
	progress against everything except NMI handlers.

o	raw_spin_lock(): Critical sections must guarantee forward
	progress against everything except IRQ (including softirq)
	and NMI handlers.

o	spin_lock(): Critical sections must guarantee forward
	progress against everything except IRQ (again including softirq)
	and NMI handlers and (given CONFIG_PREEMPT_RT) higher-priority
	realtime tasks.

o	mutex_lock(): Critical sections need not guarantee
	forward progress, as general blocking is permitted.

The other issue is the scope of the lock.  The Linux kernel has
the following:

o	BKL: global scope.

o	Everything else: scope defined by the use of the underlying
	lock variable.

One of the many reasons that we are trying to get rid of BKL is because
it combines global scope with relatively weak forward-progress guarantees.

So here is how the various RCU flavors stack up:

o	rcu_read_lock_bh(): critical sections must guarantee forward
	progress against everything except NMI handlers and IRQ handlers,
	but not against softirq handlers.  Global in scope, so that
	violating the forward-progress guarantee risks OOMing the system.

o	rcu_read_lock_sched(): critical sections must guarantee
	forward progress against everything except NMI and IRQ handlers,
	including softirq handlers.  Global in scope, so that violating
	the forward-progress guarantee risks OOMing the system.

o	rcu_read_lock(): critical sections must guarantee forward
	progress against everything except NMI handlers, IRQ handlers,
	softirq handlers, and (in CONFIG_PREEMPT_RT) higher-priority
	realtime tasks.  Global in scope, so that violating the
	forward-progress guarantee risks OOMing the system.

o	srcu_read_lock(): critical sections need not guarantee forward
	progress, as general blocking is permitted.  Scope is controlled
	by the use of the underlying srcu_struct structure.

As you say, one can block in rcu_read_lock() critical sections, but
the only blocking that is really safe is blocking that is subject to
priority inheritance.  This prohibits mutexes, because although the
mutexes themselves are subject to priority inheritance, the mutexes'
critical sections might well not be.

So the easy response is "just use SRCU."  Of course, SRCU has some
disadvantages at the moment:

o	The return value from srcu_read_lock() must be passed to
	srcu_read_unlock().  I believe that I can fix this.

o	There is no call_srcu().  I believe that I can fix this.

o	SRCU uses a flat per-CPU counter scheme that is not particularly
	scalable.  I believe that I can fix this.

o	SRCU's current implementation makes it almost impossible to
	implement priority boosting.  I believe that I can fix this.

o	SRCU requires explicit initialization of the underlying
	srcu_struct.  Unfortunately, I don't see a reasonable way
	around this.  Not yet, anyway.

So, is there anything else that you don't like about SRCU?

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ