linux-kernel - Re: [patch] sched: fix set_task_cpu() and provide an unlocked runqueue variant

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <1259249564.6465.75.camel@marge.simson.net>
Date:	Thu, 26 Nov 2009 16:32:44 +0100
From:	Mike Galbraith <efault@....de>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...e.hu>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [patch] sched: fix set_task_cpu() and provide an unlocked
 runqueue variant

On Thu, 2009-11-26 at 15:21 +0100, Peter Zijlstra wrote:
> On Thu, 2009-11-26 at 15:09 +0100, Peter Zijlstra wrote:
> > On Thu, 2009-11-26 at 11:16 +0100, Mike Galbraith wrote:
> > > > min_vruntime should only ever be poked at when holding the respective
> > > > rq->lock, even with a barrier a 64bit read on a 32bit machine can go all
> > > > funny.
> > > 
> > > Yeah, but we're looking at an unlocked runqueue.  But never mind...
> > 
> > The patch is also poking at rq->clock without rq->lock held... not very
> > nice.
> > 
> > Gah, I hate that we're doing migration things without holding both rq's,
> > this is making live so very interesting ;-)
> 
> so the problem is this bit in kernel/fork.c, which is ran with
> tasklist_lock held for writing:
> 
> 	/*
> 	 * The task hasn't been attached yet, so its cpus_allowed mask will
> 	 * not be changed, nor will its assigned CPU.
> 	 *
> 	 * The cpus_allowed mask of the parent may have changed after it was
> 	 * copied first time - so re-copy it here, then check the child's CPU
> 	 * to ensure it is on a valid CPU (and if not, just force it back to
> 	 * parent's CPU). This avoids alot of nasty races.
> 	 */
> 	p->cpus_allowed = current->cpus_allowed;
> 	p->rt.nr_cpus_allowed = current->rt.nr_cpus_allowed;
> 	if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
> 			!cpu_online(task_cpu(p))))
> 		set_task_cpu(p, smp_processor_id());
> 
> 
> The problem is that that doesn't close any races at all since
> tasklist_lock doesn't fully serialize changes to ->cpus_allowed.

Well, some stuff can't get at you if you're there, but yes, I was
wondering how fixing it up there was going to guarantee a happy landing
when we get to... wake_up_new_task(). 

> In fact, there is nothing that protects that mask at all.
> 
> The second problem is that set_task_cpu() is accessing data from both
> the old and the new rq, which basically requires is being ran with both
> rq's locked, and the regular migration paths do so.

Yes, and task_cpu() and task_rq() are racy as heck without the lock.  It
all goes fuzzy.

sched_class can change out from under you the instant you release the
runqueue lock afaikt, nice level, affinity... etc?

> However things like ttwu() try to be cute and do not, opening the doors
> to all kinds of funny.

Yes, so all the raciness I've been imagining isn't _all_ imaginary.
Yoohoo.  Um, I mean damn.

> Clearly we don't really want to do double_rq_lock() in ttwu(), that's
> one of the hotter paths around (and looking at it we ought to seriously
> look at trimming some of it).

No, apparently not.  About an hour ago, paranoid little me merely did
lock handoff in ttwu and... wunt (wunt?), and was rewarded with a
deadlocked box a bit after X came up.

WRT lard, yes, it is getting fat.  The cache misses of the prefer
sibling thing are hurting very fast threads too.  Much reward if you
find a sibling, ~4% pain for TCP_RR with the cache misses and whatnot
you waste looking around for a spot for a pinned ultralight task.

Wish I could find an answer for the sibling thing.  Nearly doubles
throughput for some things.

	-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/