Message-ID: <20100622204433.GJ2290@linux.vnet.ibm.com>
Date: Tue, 22 Jun 2010 13:44:33 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org, tglx@...utronix.de,
daniel.blueman@...il.com, lizf@...fujitsu.com,
miles.lane@...il.com, manfred@...orfullife.com
Subject: Re: [GIT PULL rcu/urgent] yet more lockdep-RCU splat fixes

On Thu, Jun 17, 2010 at 10:49:14AM +0200, Peter Zijlstra wrote:
> On Wed, 2010-06-16 at 15:41 -0700, Paul E. McKenney wrote:
>
> > Hello, Peter!
> >
> > Here is the story as I understand it:
> >
> > o wake_affine() calls task_group() and uses the resulting
> > pointer, for example, passing it to effective_load().
> >
> > This pointer is to a struct task_group, which contains
> > a struct rcu_head, which is passed to call_rcu in
> > sched_destroy_group(). So some protection really is
> > needed -- or is it enough that wake_affine seems to be
> > invoked on the current task? If the latter, we would
> > need to add a "task == current" check to task_subsys_state().
> >
> > o task_group() calls task_subsys_state(), returning a pointer to
> > the enclosing task_group structure.
> >
> > o task_subsys_state() returns an rcu_dereference_check()ed
> > pointer. The caller must either be in an RCU read-side
> > critical section, hold the ->alloc_lock, or hold the
> > cgroup lock.
> >
> > Now wake_affine() appears to be doing load calculations, so it does not
> > seem reasonable to acquire the lock. Hence the use of RCU.
> >
> > So, what should we be doing instead? ;-)
>
> Well, start by writing a sane changelog ;-)

As soon as I learn the relevant definition of "sane" for this context. ;-)

> I realise you didn't actually write these patches, but you should push
> back to the people feeding you these things (esp when you get gems like:
>
>     tg = task_group();
>     rcu_read_unlock();
>
> which is obvious utter garbage).

Agreed. If you prefer, I can combine the two patches to avoid the
appearance of insanity. (The second patch of the pair adjusts the
rcu_read_unlock() to cover all uses of the "tg" pointer.)

> There's _two_ task_group() users in wake_affine(), at least one should
> be covered by the rq->lock we're holding. It should then explain why the
> other isn't covered (and which the other is).

I am probably missing something, but I see wake_affine() only called
from select_task_rq_fair(), which is one of the possible values for
->select_task_rq(). This can be called from select_task_rq(), which
claims that it can be called without holding rq->lock. I do not see
any rq->lock acquisition on the path from select_task_rq() to the
call to wake_affine().

(I am looking at 2.6.34, FWIW.)

> It should also explain why using RCU read lock is the right solution,
> and doesn't result in funny races. That is, the current changelog reads
> like: "It whines, this makes it quiet." -- which I totally distrust
> because we already found at least two actual bugs in this area
> (sched-cgroup rcu usage).

The usage appears to be heuristic in nature, so that processing old
data should be non-fatal.

> That said, the two patches together might not be wrong, but it's very
> hard to verify without more information.

Left to myself, I would combine the two patches and use the changelog
shown below. Does this work for you?

Thanx, Paul

rcu: apply RCU protection to wake_affine()

The task_group() function returns a pointer that must be protected
by either RCU, the ->alloc_lock, or the cgroup lock (see the
rcu_dereference_check() in task_subsys_state(), which is invoked by
task_group()). The wake_affine() function currently does none of these,
which means that a concurrent update would be within its rights to free
the structure returned by task_group(). Because wake_affine() uses this
structure only to compute load-balancing heuristics, there is no reason
to acquire either of the two locks.

Therefore, this commit introduces an RCU read-side critical section that
starts before the first call to task_group() and ends after the last use
of the "tg" pointer returned from task_group(). Thanks to Li Zefan for
pointing out the need to extend the RCU read-side critical section from
that proposed by the original patch.

Signed-off-by: Daniel J Blueman <daniel.blueman@...il.com>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
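
To make the difference between the original patch and the combined one
concrete, here is a rough userspace sketch in C. It is not kernel code:
rcu_read_lock(), rcu_read_unlock(), task_group(), and the tg_shares()
helper are hypothetical stand-ins that only track nesting depth, the real
task_group() takes a task pointer, and the two wake_affine_*() functions
are illustrative names that reduce the load calculation to a single field
read. The only point is that the read-side critical section has to extend
to the last use of "tg".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's implementations. */
static int rcu_depth;        /* nesting depth of read-side critical sections */
static int unprotected_uses; /* accesses made outside any critical section */

static void rcu_read_lock(void)
{
	rcu_depth++;
}

static void rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Simplified: a single field stands in for the real structure. */
struct task_group {
	long shares;
};

static struct task_group demo_tg = { .shares = 1024 };

/* Loosely mimics a lockdep-style check: record a fetch made unprotected. */
static struct task_group *task_group(void)
{
	if (rcu_depth == 0)
		unprotected_uses++;
	return &demo_tg;
}

/* Every later use of "tg" goes through here so that it can be audited. */
static long tg_shares(const struct task_group *tg)
{
	if (rcu_depth == 0)
		unprotected_uses++;
	return tg->shares;
}

/* Shape of the original patch: the unlock immediately follows the fetch. */
static long wake_affine_too_short(void)
{
	struct task_group *tg;

	rcu_read_lock();
	tg = task_group();
	rcu_read_unlock();
	return tg_shares(tg);	/* "tg" used after the section has ended */
}

/* Shape of the combined patch: the section covers every use of "tg". */
static long wake_affine_covered(void)
{
	struct task_group *tg;
	long load;

	rcu_read_lock();
	tg = task_group();
	load = tg_shares(tg);
	rcu_read_unlock();
	return load;
}
```

In this sketch both variants return the same value, which is the trouble
with the too-short version: it appears to work until a concurrent update
frees the structure between the unlock and the use. Only the audit counter
tells the two apart.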