Message-ID: <20140903133640.GA25439@redhat.com>
Date: Wed, 3 Sep 2014 15:36:40 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Kautuk Consul <consul.kautuk@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>,
David Rientjes <rientjes@...gle.com>,
Ionut Alexa <ionut.m.alexa@...il.com>,
Guillaume Morin <guillaume@...infr.org>,
linux-kernel@...r.kernel.org, Kirill Tkhai <tkhai@...dex.ru>
Subject: Re: [PATCH 1/1] do_exit(): Solve possibility of BUG() due to race
with try_to_wake_up()
Peter, sorry for the slow responses.

On 09/02, Peter Zijlstra wrote:
>
> On Tue, Sep 02, 2014 at 06:47:14PM +0200, Oleg Nesterov wrote:
>
> > But since I already wrote v2 yesterday, let me show it anyway. Perhaps
> > you will notice something wrong immediately...
> >
> > So, once again, this patch adds the ugly "goto" into schedule(). OTOH,
> > it removes the ugly spin_unlock_wait(pi_lock).
>
> But schedule() is called _far_ more often than exit(). It would be
> really good not to have to do that.

Yes, sure, performance-wise this is not a win. My point was that this way
the whole "last schedule" logic becomes very simple.

But OK, I buy your nack. I understand that we should not penalize
__schedule() if possible. Let's forget this patch.

> > TASK_DEAD can die. The only valid user is schedule_debug(), trivial to
> > change. The usage of TASK_DEAD in task_numa_fault() is wrong in any case.
> >
> > In fact, I think that the next change can change exit_schedule() to use
> > PREEMPT_ACTIVE, and then we can simply remove the TASK_DEAD check in
> > schedule_debug().
>
> So you worry about concurrent wakeups vs setting TASK_DEAD and thereby
> losing it, right?
>
> Would not something like:
>
> 	spin_lock_irq(&current->pi_lock);
> 	__set_current_state(TASK_DEAD);
> 	spin_unlock_irq(&current->pi_lock);

Sure. This should obviously fix the problem.

And I think another mb() after spin_unlock_wait() should fix it as well.

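To make the lock/unlock variant concrete, here is a single-threaded toy model of the interleaving. It is a sketch only, not kernel code: struct task_model, waker_begin(), waker_finish() and exit_set_dead_locked() are hypothetical names, the state values are illustrative, and pi_lock is reduced to a flag. The "unlocked" scenario shows a store of TASK_DEAD with no synchronisation at all, which is simpler than what do_exit() actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; not taken from any kernel header. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2, TASK_DEAD = 64 };

/* Hypothetical stand-in for the two task fields that matter here. */
struct task_model {
	int  state;
	bool pi_lock_held;
};

/*
 * Waker, step 1: take pi_lock and check the state. Returns true, still
 * holding the lock, if it decided that the task needs a wakeup.
 */
static bool waker_begin(struct task_model *t)
{
	assert(!t->pi_lock_held);
	t->pi_lock_held = true;
	if (t->state != TASK_UNINTERRUPTIBLE) {
		t->pi_lock_held = false;
		return false;
	}
	return true;
}

/* Waker, step 2: store TASK_RUNNING and drop pi_lock. */
static void waker_finish(struct task_model *t)
{
	t->state = TASK_RUNNING;
	t->pi_lock_held = false;
}

/* Exiting task, locked variant: returns false where real code would spin. */
static bool exit_set_dead_locked(struct task_model *t)
{
	if (t->pi_lock_held)
		return false;
	t->state = TASK_DEAD;
	return true;
}

/* TASK_DEAD is stored while the waker is between its two steps. */
static int final_state_unlocked(void)
{
	struct task_model t = { TASK_UNINTERRUPTIBLE, false };
	bool waking = waker_begin(&t);

	t.state = TASK_RUNNING;		/* task saw its condition, never slept */
	t.state = TASK_DEAD;		/* plain store, no lock taken */
	if (waking)
		waker_finish(&t);	/* late store overwrites TASK_DEAD */
	return t.state;
}

/* Same interleaving, but TASK_DEAD is only stored with pi_lock free. */
static int final_state_locked(void)
{
	struct task_model t = { TASK_UNINTERRUPTIBLE, false };
	bool waking = waker_begin(&t);

	t.state = TASK_RUNNING;
	if (!exit_set_dead_locked(&t)) {
		/* Lock is busy, so the waker has to finish first. */
		if (waking)
			waker_finish(&t);
		exit_set_dead_locked(&t);	/* lock is free now */
	}
	return t.state;
}
```

In this model final_state_unlocked() ends in TASK_RUNNING (the wakeup clobbered TASK_DEAD), while final_state_locked() ends in TASK_DEAD because the waker's check and store cannot straddle the exiting task's store.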
> Not be race free and similarly expensive to the smp_mb() we have there
> now?

Ah, I simply do not know what is cheaper, even on x86. Well, we need
to disable/enable irqs, but again I do not really know how much this
costs. I can't even say what (imo) looks better, the lock/unlock above
or

	// Ensure that the previous __set_current_state(RUNNING) can't
	// leak after spin_unlock_wait()
	smp_mb();
	spin_unlock_wait(&current->pi_lock);
	// Another mb to ensure this too can't be reordered with unlock_wait
	set_current_state(TASK_DEAD);

What do you think looks better?
Oleg.