linux-kernel - Re: [patch] sched: fix b5d9d734 blunder in task_new

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1259312264.7747.4.camel@marge.simson.net>
Date:	Fri, 27 Nov 2009 09:57:44 +0100
From:	Mike Galbraith <efault@....de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...e.hu>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch] sched: fix b5d9d734 blunder in task_new_fair()

On Fri, 2009-11-27 at 09:45 +0100, Mike Galbraith wrote:
> Off list

Guess not, mouse went click on the way to save menu.  Oh well, was going
to spare thousands of mailboxes another confused /me message, but too
late now.

> On Thu, 2009-11-26 at 17:26 +0100, Peter Zijlstra wrote:
> > On Sun, 2009-11-22 at 13:07 +0100, Mike Galbraith wrote:
> > > @@ -2589,16 +2588,10 @@ void sched_fork(struct task_struct *p, i
> > >          */
> > >         p->prio = current->normal_prio;
> > >  
> > > -       if (!rt_prio(p->prio))
> > > +       if (!task_has_rt_policy(p))
> > >                 p->sched_class = &fair_sched_class;
> > >  
> > > -#ifdef CONFIG_SMP
> > > -       cpu = p->sched_class->select_task_rq(p, SD_BALANCE_FORK, 0);
> > > -#endif
> > > -       local_irq_save(flags);
> > > -       update_rq_clock(cpu_rq(cpu));
> > > -       set_task_cpu(p, cpu);
> > > -       local_irq_restore(flags);
> > > +       __set_task_cpu(p, cpu); 
> > 
> > OK, so I figured out why it was in sched_fork() and not in
> > wake_up_new_task().
> > 
> > It is because in sched_fork() the new task isn't in the tasklist yet and
> > can therefore not race with any other migration logic.
> 
> All the raciness I'm fretting over probably just doesn't matter much.
> Things aren't exploding.  Maybe min_vruntime is the only thing I should
> be worrying about.  That concern is in-flight deltas of SCHED_IDLE
> magnitude, ie cross cpu "fuzziness" on a very large scale.
> 
> However :-/  (aw sh*t, here i go again.  aaaaOOOOOgah! dive! dive!;)
> 
> WRT affinity, sched_class, nice level fretting, that can all change from
> userland at any instant that you do not hold the task's runqueue lock
> and the tasklist lock is not held by somebody to keep them from getting
> a task reference to start the ball rolling.  As soon as you drop the
> runqueue lock, userland can acquire, and change whatever it likes under
> you, so afaikt, we can call the wrong sched_class method etc etc.
> 
> 3f029d3 agrees fully wrt sched_class at least:
>     In addition, we fix a race condition where we try to access
>     current->sched_class without holding the rq->lock.  This is
>     technically racy, as the sched-class could change out from
>     under us.  Instead, we reference the per-rq post_schedule
>     variable with the runqueue unlocked, but with preemption
>     disabled to see if we need to reacquire the rq->lock.
> 
> The only thing that really changes with the unlocked _rummaging_ is that
> we now can't count on nr_running/load on the task's current runqueue,
> sched_class etc while you're rummaging, ALL state is fuzzy, instead of
> only most.
> 
> However, I don't think we can even count on the task remaining untouched
> while in TASK_WAKING state, and that might be a bigger deal.
> 
> afaikt, userland can migrate the task you're in the process of waking
> while you're off rummaging around looking for a place to put it, like
> so:  Wakee is on the tasklist, can be accessed by userland.  We wouldn't
> be in ttwu either were it not.  We're waking, we set task state to
> TASK_WAKING, release the lock, userland acquires, nobody but ttwu has
> ever heard of a TASK_WAKING, so it happily changes task's affinity,
> migrates the sleeping task to the one and only (pins) correct runqueue,
> sets task cpu etc, releases the lock, and goes home.  We finish
> rummaging, do NOT check for an intervening affinity change, instead, we
> do an unlocked scribble over what userland just wrote, resetting cpu and
> vruntime to a now illegal cpu, and activate.  I'm not seeing any
> inhibitor for this scenario.
> 
> When I moved fork balancing runqueue selection to the much more logical
> wakeup and enqueue time, vs copy and fiddle time, I didn't fix anything,
> I merely duplicated the races that are now in ttwu.
> 
> No matter were we do the selection, we can race with userland if the
> darn thing isn't locked all the while.  With .31 ttwu locking, there is
> no race, because nobody can get to the task struct.  If target cpu
> changes via rq selection, we set cpu, _then_ unlock, at which point
> userland or whomever may override _our_ decision, but we never write
> after re-acquiring, so intervening changes, if any, stay intact.
> 
> With an exec, userland policy/affinity change will deactivate/activate
> or do a migration call.  We don't have the thing locked while we're
> rummaging, so what keeps sched_class from changing after we evaluated,
> so we call the wrong method, and then do our own migration call?
> 
> /me is still pretty befuddled, and haven't even crawled over PI.
> 
> I flat don't see how we can do this race free, unless we put every task
> in some untouchable state while we're rummaging, and teach everything
> having to do with migration about that state.
> 
> 	-Mike
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/