Message-ID: <20140729155309.GA30194@redhat.com>
Date: Tue, 29 Jul 2014 17:53:09 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Sasha Levin <sasha.levin@...cle.com>,
Ingo Molnar <mingo@...nel.org>,
John Stultz <john.stultz@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Frederic Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Dave Jones <davej@...hat.com>,
Andrey Ryabinin <a.ryabinin@...sung.com>
Subject: Re: finish_task_switch && prev_state (Was: sched, timers: use
after free in __lock_task_sighand when exiting a process)
On 07/29, Peter Zijlstra wrote:
>
> On Tue, Jul 29, 2014 at 11:10:18AM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 15, 2014 at 04:25:25PM +0200, Oleg Nesterov wrote:
> >
> > > And probably I missed something again, but it seems that this logic is broken
> > > with __ARCH_WANT_UNLOCKED_CTXSW.
> > >
> > > Of course, even if I am right this is purely theoretical, but smp_wmb() before
> > > "->on_cpu = 0" is not enough and we need a full barrier?
> >
> > (long delay there, forgot about this thread, sorry)
> >
> > Yes, I think I see that.. but now I think the comment is further wrong.
> >
> > Its not rq->lock that is important, remember, a concurrent wakeup onto
> > another CPU does not require our rq->lock at all.
> >
> > It is the ->on_cpu = 0 store that is important (for both the
> > UNLOCKED_CTXSW cases). As soon as that store comes through the task can
> > start running on the remote cpu.
Yes, I came to the same conclusion right after I sent that email.
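
To spell out the window (a simplified sketch, not the actual
kernel/sched/core.c code; names and layout abbreviated):

    /* CPU0: finish_task_switch() for prev */
    prev_state = prev->state;      /* may observe TASK_DEAD */
    smp_wmb();                     /* store-store only: does NOT order
                                    * the load above against the store
                                    * below on a weakly ordered arch */
    prev->on_cpu = 0;              /* publish: prev may run elsewhere */
    if (prev_state == TASK_DEAD)
        put_task_struct(prev);     /* drop the "current" reference */

    /* CPU1: try_to_wake_up() for the same task p == prev */
    while (p->on_cpu)
        cpu_relax();
    /* p runs here, exits, and its own finish_task_switch() also sees
     * TASK_DEAD -> the reference is dropped twice if CPU0's load was
     * in fact satisfied after its on_cpu store. */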
> > Now the below patch 'fixes' this but at the cost of adding a full
> > barrier which is somewhat unfortunate to say the least.
And yes, this is obviously the "fix" I had in mind, but:
> > wmb's are free on x86 and generally cheaper than mbs, so it would be good
> > to find another solution to this problem...
>
> Something like so then?
Hmm, indeed! Unfortunately I didn't find this simple solution myself. Yes,
I think we should check current->state == TASK_DEAD:
> @@ -2304,6 +2293,21 @@ context_switch(struct rq *rq, struct task_struct *prev,
> struct task_struct *next)
> {
> struct mm_struct *mm, *oldmm;
> + /*
> + * A task struct has one reference for the use as "current".
> + * If a task dies, then it sets TASK_DEAD in tsk->state and calls
> + * schedule one last time. The schedule call will never return, and
> + * the scheduled task must drop that reference.
> + *
> + * We must observe prev->state before clearing prev->on_cpu (in
> + * finish_lock_switch), otherwise a concurrent wakeup can get prev
> + * running on another CPU and we could race with its RUNNING -> DEAD
> + * transition, and then the reference would be dropped twice.
> + *
> + * We avoid the race by observing prev->state while it is still
> + * current.
> + */
> + long prev_state = prev->state;
This doesn't really matter, but it would probably be better to do this right
before switch_to(), since prev == current until that point.
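
I.e., something like this (placement only; how prev_state then gets to
finish_task_switch() is a separate detail):

    /* in context_switch(), with prev still current on this CPU: */
    prev_state = prev->state;      /* cannot race with an exit, and the
                                    * load is done long before on_cpu
                                    * is cleared after the switch */
    switch_to(prev, next, prev);
    barrier();
    finish_task_switch(this_rq(), prev);   /* uses prev_state */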
Oleg.