Message-ID: <20120106141258.GB19462@redhat.com>
Date: Fri, 6 Jan 2012 15:12:58 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Yasunori Goto <y-goto@...fujitsu.com>, Ingo Molnar <mingo@...e.hu>,
Hiroyuki KAMEZAWA <kamezawa.hiroyu@...fujitsu.com>,
Motohiro Kosaki <kosaki.motohiro@...fujitsu.com>,
Linux Kernel ML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] TASK_DEAD task is able to be woken up in special condition

On 01/06, Peter Zijlstra wrote:
>
> On Fri, 2012-01-06 at 21:01 +0900, Yasunori Goto wrote:
>
> > Do you mean the following patch?
>
> Yes, something like that. At that point ->state should be TASK_RUNNING
> (since we are after all running). The unlock_wait() will synchronize
> against any in-progress ttwu() while its fast path is a non-atomic
> compare. Any ttwu after this will bail since it will either observe
> TASK_RUNNING or TASK_DEAD, neither are a state it will act upon.
>
> Now the only question that remains is if we need the full memory barrier
> or if we can get away with less.
>
> I guess the mb separates the write to ->state (setting TASK_RUNNING)
> from the read of ->pi_lock. The remote CPU must see the TASK_RUNNING,
> and we must see ->pi_lock taken if it is.

Yes, I think we need the full mb() here: this is STORE (->state) vs
LOAD (->pi_lock).

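To make the STORE vs LOAD point concrete, here is a toy userspace model
of the pattern. It is only a sketch and not kernel code: the one-entry
store buffer, the struct sb_model layout and the helper names are all
made up for illustration, and it assumes the waker's side is ordered by
a full barrier as well. It enumerates every interleaving of "store my
flag, then load the other flag" on two CPUs and reports whether both
loads can return the old value.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: two CPUs, each with a one-entry store buffer.
 * CPU 0 (exiting task) stores mem[0] ("->state") and then loads mem[1].
 * CPU 1 (waker) stores mem[1] ("->pi_lock" taken) and then loads mem[0].
 */
struct sb_model {
	int  mem[2];        /* globally visible values, initially 0 (old) */
	bool committed[2];  /* has CPU i's store left its store buffer?   */
	bool loaded[2];     /* has CPU i performed its load?              */
	int  seen[2];       /* value returned by CPU i's load             */
};

/*
 * Depth-first search over every interleaving. Returns true if some
 * interleaving ends with both loads returning the old value 0.
 * With fence == true a CPU may load only after its own store committed.
 */
static bool explore(struct sb_model *m, bool fence)
{
	if (m->loaded[0] && m->loaded[1])
		return m->seen[0] == 0 && m->seen[1] == 0;

	for (int cpu = 0; cpu < 2; cpu++) {
		/* Event: CPU's buffered store becomes globally visible. */
		if (!m->committed[cpu]) {
			m->committed[cpu] = true;
			m->mem[cpu] = 1;
			bool hit = explore(m, fence);
			m->committed[cpu] = false;
			m->mem[cpu] = 0;
			if (hit)
				return true;
		}
		/* Event: CPU loads the other CPU's variable. */
		if (!m->loaded[cpu] && (m->committed[cpu] || !fence)) {
			m->loaded[cpu] = true;
			m->seen[cpu] = m->mem[1 - cpu];
			bool hit = explore(m, fence);
			m->loaded[cpu] = false;
			if (hit)
				return true;
		}
	}
	return false;
}

/* True if both sides can miss each other's store in this model. */
bool stale_reads_possible(bool fence)
{
	struct sb_model m = {0};

	return explore(&m, fence);
}

/* Without the fence both can miss; with it, at least one side sees the other. */
void check_model(void)
{
	assert(stale_reads_possible(false));
	assert(!stale_reads_possible(true));
}
```

Calling check_model() from a test harness exercises both cases. In
this model a write-only or read-only barrier would not forbid the load
from passing the buffered store, which is why the full barrier is the
one that matters for this pattern.
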
> > --- linux-3.2-rc7.orig/kernel/exit.c
> > +++ linux-3.2-rc7/kernel/exit.c
> > @@ -1038,6 +1038,10 @@ NORET_TYPE void do_exit(long code)
> >
> >  	preempt_disable();
> >  	exit_rcu();
> > +
> > +	smp_mb();
> > +	raw_spin_unlock_wait(&tsk->pi_lock);
> > +
> >  	/* causes final put_task_struct in finish_task_switch(). */
> >  	tsk->state = TASK_DEAD;

Interesting. Initially I thought this was wrong and that we should do

	raw_spin_unlock_wait(pi_lock);
	mb();
	tsk->state = TASK_DEAD;

This "obviously" serializes LOAD(pi_lock) and STORE(state).

But when I re-read your explanation above I think you are right:
mb() before unlock_wait() should work too, it just refers to the
state = RUNNING store in the past.

But this makes me worry. We do a lot of things after exit_mm().
In particular we take tasklist_lock in exit_notify(), and then
do_exit() takes task_lock(). But every unlock + lock implies mb().
So how was it possible to hit this bug?

Oleg.