linux-kernel - Re: [RFC] Thread Migration Preemption


Message-ID: <20070714195653.GB108@tv-sign.ru>
Date:	Sat, 14 Jul 2007 23:56:53 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC] Thread Migration Preemption - v2

On 07/14, Mathieu Desnoyers wrote:
>
> * Oleg Nesterov (oleg@...sign.ru) wrote:
> > >  	/* Affinity changed (again). */
> > >  	if (!cpu_isset(dest_cpu, p->cpus_allowed))
> > >  		goto out;
> > >  
> > >  	on_rq = p->se.on_rq;
> > > +#ifdef CONFIG_PREEMPT
> > > +	if (!on_rq && task_thread_info(p)->migrate_count)
> > > +		goto out;
> > > +#endif
> > 
> > This means that move_task_off_dead_cpu() will spin until the task is scheduled
> > on the dead CPU. Given that we hold tasklist_lock and irqs are disabled, this may
> > never happen.
> > 
> 
> Yes. My idea to fix this issue is the following:
> 
> If a thread has non zero migrate_count, we should still move it to a
> different CPU upon hotplug cpu removal, even if this thread resists
> migration. Care should be taken to send _all_ such threads to the _same_
> CPU so they don't race for the per-cpu resources. Does that make sense?
> 
> We would have to keep the CPU affinity of the threads running on the
> wrong CPU until they end their migrate disabled section, so that we can
> put them back on their original CPU if it goes back online, otherwise we
> could end up with concurrent per-cpu variable accesses.

Well, this means that migrate_disable() doesn't guarantee a stable
smp_processor_id(), not good.
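
To make the dilemma concrete, here is a toy userspace model in C. It is not
kernel code, and every toy_* name is made up for illustration. It assumes a
sleeping task with a non-zero migrate_count on a CPU that is going away, and
compares the two policies discussed above: refusing the migration, as the
posted hunk does, and forcing it, as the proposed fix would. The retry loop is
bounded only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the names mirror the thread but this is not kernel code. */
struct toy_task {
	int cpu;		/* CPU the task last ran on */
	int migrate_count;	/* > 0 while inside a migrate-disabled section */
	bool on_rq;		/* true if the task is runnable */
};

/*
 * Policy A (the hunk as posted): refuse to move a sleeping task that has
 * migration disabled. Returns true if the task was moved.
 */
static bool toy_migrate_refuse(struct toy_task *p, int dest_cpu)
{
	if (!p->on_rq && p->migrate_count > 0)
		return false;
	p->cpu = dest_cpu;
	return true;
}

/* Policy B (the proposed fix): always move the task off the dead CPU. */
static bool toy_migrate_force(struct toy_task *p, int dest_cpu)
{
	p->cpu = dest_cpu;
	return true;
}

/*
 * Retry loop in the spirit of move_task_off_dead_cpu(). Nothing inside the
 * loop can change the task's state, which stands in for "tasklist_lock held,
 * irqs disabled". Returns the number of attempts used, or -1 if the
 * (artificial) bound was hit.
 */
static int toy_move_off_dead_cpu(struct toy_task *p, int dest_cpu,
				 bool (*migrate)(struct toy_task *, int),
				 int max_attempts)
{
	assert(max_attempts > 0);
	for (int i = 1; i <= max_attempts; i++)
		if (migrate(p, dest_cpu))
			return i;
	return -1;
}
```

With toy_migrate_refuse the loop hits its bound and the task stays on the dead
CPU. With toy_migrate_force the task's CPU changes while migrate_count is
still non-zero, which is the unstable smp_processor_id() case.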

> > > @@ -4891,10 +4957,22 @@
> > >  		list_del_init(head->next);
> > >  
> > >  		spin_unlock(&rq->lock);
> > > -		__migrate_task(req->task, cpu, req->dest_cpu);
> > > +		migrated = __migrate_task(req->task, cpu, req->dest_cpu);
> > >  		local_irq_enable();
> > > -
> > > -		complete(&req->done);
> > > +		if (!migrated) {
> > > +			/*
> > > +			 * If the process has not been migrated, let it run
> > > +			 * until it reaches a migration_check() so it can
> > > +			 * wake us up.
> > > +			 */
> > > +			spin_lock_irq(&rq->lock);
> > > +			head = &rq->migration_queue;
> > > +			list_add(&req->list, head);
> > > +			set_tsk_thread_flag(req->task, TIF_NEED_MIGRATE);
> > > +			spin_unlock_irq(&rq->lock);
> > > +			wake_up_process(req->task);
> > > +		} else
> > > +			complete(&req->done);
> > 
> > I guess this is migration_thread(). The wake_up_process(req->task) looks strange,
> > why? It can't help if the task waits for the event/mutex.
> > 
> 
> Hrm, the idea was to wake up the thread that is in the migrate disabled
> section, which is what I seem to do: req->task points to the process we
> are trying to migrate. We poke it like this until it ends its critical
> section.

But this can only waste CPU, nothing more, no? Suppose that req->task
sleeps waiting for the mutex. You can wake it up, and it will just call
schedule() again.

This can help if req->task does something like schedule_timeout(), but
I don't think this is a common case.
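
A toy userspace model of that argument, in C. It is not kernel code, and the
toy_* names are invented for illustration. It assumes a woken task that is
blocked on a still-held mutex simply goes back to sleep, while a task in a
timeout-style sleep returns early and runs to the end of its migrate-disabled
section.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_sleep { TOY_SLEEP_MUTEX, TOY_SLEEP_TIMEOUT };

/* Toy model only: a task sleeping inside a migrate-disabled section. */
struct toy_waiter {
	enum toy_sleep why;	/* what the task is blocked on */
	bool mutex_free;	/* only meaningful for TOY_SLEEP_MUTEX */
	int migrate_count;	/* > 0 while inside the section */
	int wakeups;		/* how many times the task was woken */
};

/* One poke. Returns true if the task left its migrate-disabled section. */
static bool toy_poke(struct toy_waiter *t)
{
	t->wakeups++;
	if (t->why == TOY_SLEEP_MUTEX && !t->mutex_free)
		return false;	/* mutex still held: back to sleep */
	/* Timeout-style sleep returned early, or the mutex was free. */
	t->migrate_count = 0;
	return true;
}

/* Poke until the task makes progress or max_pokes wake-ups were wasted. */
static int toy_wasted_pokes(struct toy_waiter *t, int max_pokes)
{
	assert(max_pokes >= 0);
	int wasted = 0;

	while (wasted < max_pokes && !toy_poke(t))
		wasted++;
	return wasted;
}
```

For a TOY_SLEEP_MUTEX waiter whose mutex stays held, every poke is wasted and
migrate_count never drops. For a TOY_SLEEP_TIMEOUT waiter the first poke is
enough.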

Oleg.
