[PATCH 3/5] sched_exec() should use select_fallback_rq() logic sched_exec()->select_task_rq() reads/updates ->cpus_allowed lockless. This can race with other CPUs updating our ->cpus_allowed, and this looks meaningless to me. The task is current and running, it must have online cpus in ->cpus_allowed, the fallback mode is bogus. And, if ->sched_class returns the "wrong" cpu, this likely means we raced with set_cpus_allowed() which was called for reason, why should sched_exec() retry and call ->select_task_rq() again? Change the code to call sched_class->select_task_rq() directly and do nothing if the returned cpu is wrong after re-checking under rq->lock. >From now select_fallback_rq() is always called with either rq-lock or TASK_WAKING held. TODO: update the comments. Signed-off-by: Oleg Nesterov --- --- 34-rc1/kernel/sched.c~3_SCHED_EXEC_DONT_FALLBACK 2010-03-13 19:50:57.000000000 +0100 +++ 34-rc1/kernel/sched.c 2010-03-13 19:51:11.000000000 +0100 @@ -3123,9 +3123,8 @@ void sched_exec(void) unsigned long flags; struct rq *rq; -again: this_cpu = get_cpu(); - dest_cpu = select_task_rq(p, SD_BALANCE_EXEC, 0); + dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0); if (dest_cpu == this_cpu) { put_cpu(); return; @@ -3133,18 +3132,12 @@ again: rq = task_rq_lock(p, &flags); put_cpu(); - /* * select_task_rq() can race against ->cpus_allowed */ - if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed) - || unlikely(!cpu_active(dest_cpu))) { - task_rq_unlock(rq, &flags); - goto again; - } - - /* force the process onto the specified CPU */ - if (migrate_task(p, dest_cpu, &req)) { + if (cpumask_test_cpu(dest_cpu, &p->cpus_allowed) && + likely(cpu_active(dest_cpu)) && + migrate_task(p, dest_cpu, &req)) { /* Need to wait for migration thread (might exit: take ref). */ struct task_struct *mt = rq->migration_thread;