linux-kernel - Re: [patch 3/21] x86, bts: wait until traced task has been scheduled out

Message-ID: <20090401114140.GB23678@elte.hu>
Date:	Wed, 1 Apr 2009 13:41:40 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Markus Metzger <markus.t.metzger@...el.com>,
	linux-kernel@...r.kernel.org, tglx@...utronix.de, hpa@...or.com,
	markus.t.metzger@...il.com, roland@...hat.com,
	eranian@...glemail.com, juan.villacis@...el.com,
	ak@...ux.jf.intel.com
Subject: Re: [patch 3/21] x86, bts: wait until traced task has been
	scheduled out


* Oleg Nesterov <oleg@...hat.com> wrote:

> On 03/31, Markus Metzger wrote:
> >
> > +static void wait_to_unschedule(struct task_struct *task)
> > +{
> > +	unsigned long nvcsw;
> > +	unsigned long nivcsw;
> > +
> > +	if (!task)
> > +		return;
> > +
> > +	if (task == current)
> > +		return;
> > +
> > +	nvcsw  = task->nvcsw;
> > +	nivcsw = task->nivcsw;
> > +	for (;;) {
> > +		if (!task_is_running(task))
> > +			break;
> > +		/*
> > +		 * The switch count is incremented before the actual
> > +		 * context switch. We thus wait for two switches to be
> > +		 * sure at least one completed.
> > +		 */
> > +		if ((task->nvcsw - nvcsw) > 1)
> > +			break;
> > +		if ((task->nivcsw - nivcsw) > 1)
> > +			break;
> > +
> > +		schedule();
> 
> schedule() is a nop here. We can wait unpredictably long...
> 
> Ingo, do you have any ideas to improve this helper?
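The exit test in the quoted loop depends on the switch counters being bumped before the context switch itself completes. Below is a minimal userspace model of just that check. It assumes `unsigned long` counters sampled once before the loop. The struct and function names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the per-task counters (task->nvcsw, task->nivcsw). */
struct switch_counts {
	unsigned long nvcsw;	/* voluntary context switches */
	unsigned long nivcsw;	/* involuntary context switches */
};

/*
 * Mirrors the quoted check. A counter is incremented before the switch
 * completes, so a single increment only proves that a switch has started.
 * A second increment of the same counter means the task ran again after
 * the first one, so that first switch must have completed. Unsigned
 * subtraction keeps the comparison correct if a counter wraps around.
 */
static bool switched_out_at_least_once(struct switch_counts snap,
				       struct switch_counts now)
{
	return (now.nvcsw - snap.nvcsw) > 1 ||
	       (now.nivcsw - snap.nivcsw) > 1;
}
```

This only models the exit condition. As Oleg notes, the schedule() call in the loop does nothing to get the traced task switched out, so the wait can still take an unpredictably long time.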

Hm, there's a similar-looking existing facility:
wait_task_inactive(). Have I missed some subtle detail that makes it
inappropriate for use here?

> Not that I really like it, but how about
> 
> 	int force_unschedule(struct task_struct *p)
> 	{
> 		struct rq *rq;
> 		unsigned long flags;
> 		int running;
> 
> 		rq = task_rq_lock(p, &flags);
> 		running = task_running(rq, p);
> 		task_rq_unlock(rq, &flags);
> 
> 		if (running)
> 			wake_up_process(rq->migration_thread);
> 
> 		return running;
> 	}
> 
> which should be used instead of task_is_running()?
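The idea behind force_unschedule() is to sample `running` under the runqueue lock and only then, with the lock dropped, wake the migration thread so that it preempts the traced task. Below is a small userspace model of that pattern. It assumes a pthread mutex in place of the rq lock, and a counter plus a cleared `curr` pointer in place of a real wakeup and preemption. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct task_struct and struct rq. */
struct sim_task {
	int id;
};

struct sim_rq {
	pthread_mutex_t lock;		/* plays the role of the rq lock */
	struct sim_task *curr;		/* task currently running on this CPU */
	int migration_kicks;		/* counts simulated wakeups */
};

/*
 * Stand-in for wake_up_process(rq->migration_thread). In this model the
 * "migration thread" immediately preempts whatever was running.
 */
static void kick_migration_thread(struct sim_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->migration_kicks++;
	rq->curr = NULL;		/* running task has been switched out */
	pthread_mutex_unlock(&rq->lock);
}

/*
 * Same shape as the quoted force_unschedule(): sample "running" under
 * the lock, drop the lock, then issue the wakeup if needed.
 */
static bool sim_force_unschedule(struct sim_rq *rq, struct sim_task *p)
{
	bool running;

	assert(rq != NULL && p != NULL);

	pthread_mutex_lock(&rq->lock);
	running = (rq->curr == p);
	pthread_mutex_unlock(&rq->lock);

	if (running)
		kick_migration_thread(rq);

	return running;
}
```

A caller would loop on `sim_force_unschedule()` instead of on task_is_running(). In the kernel the wakeup is asynchronous, so the caller still has to re-check, and the return value is what tells it whether to.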

Yes, wait_task_inactive() should be switched to a scheme like that;
it would fix bugs like:

  53da1d9: fix ptrace slowness

in a cleaner way.

> We can even do something like
> 
> 	void wait_to_unschedule(struct task_struct *task)
> 	{
> 		struct migration_req req;
> 		struct rq *rq;
> 		unsigned long flags;
> 		int running;
> 
> 		rq = task_rq_lock(task, &flags);
> 		running = task_running(rq, task);
> 		if (running) {
> 			/* make sure __migrate_task() will do nothing */
> 			req.task = task;
> 			req.dest_cpu = NR_CPUS + 1;
> 			init_completion(&req.done);
> 			list_add(&req.list, &rq->migration_queue);
> 		}
> 		task_rq_unlock(rq, &flags);
> 
> 		if (running) {
> 			wake_up_process(rq->migration_thread);
> 			wait_for_completion(&req.done);
> 		}
> 	}
> 
> This way we don't poll, and we need only one helper.
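The second variant replaces polling with a completion. The caller queues a request that the migration thread treats as a no-op, wakes that thread, and sleeps until the request is signalled as done. Because the migration thread can only run on that CPU once the traced task has been switched out of it, the completion implies a context switch. Below is a userspace sketch of the waiting side. It assumes a mutex/condition-variable pair as a stand-in for the kernel's struct completion and a plain pthread as the "migration thread". All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of struct completion. */
struct sim_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void sim_init_completion(struct sim_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void sim_complete(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void sim_wait_for_completion(struct sim_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)		/* loop guards against spurious wakeups */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Request loosely modelled on struct migration_req. */
struct sim_req {
	int dest_cpu;			/* invalid on purpose: "do not migrate" */
	struct sim_completion done;
};

/*
 * Stand-in for the migration thread. Because dest_cpu is invalid it
 * migrates nothing; it just completes the request once it gets to run.
 */
static void *sim_migration_thread(void *arg)
{
	struct sim_req *req = arg;

	sim_complete(&req->done);
	return NULL;
}

/* Queue a no-op request, start the "migration thread", and sleep. */
static int sim_wait_for_migration_thread(void)
{
	struct sim_req req = { .dest_cpu = -1 };
	pthread_t thr;

	sim_init_completion(&req.done);
	if (pthread_create(&thr, NULL, sim_migration_thread, &req) != 0)
		return -1;
	sim_wait_for_completion(&req.done);	/* blocks, no polling */
	pthread_join(thr, NULL);
	assert(req.done.done);
	return 0;
}
```

The model leaves out the runqueue lock and the migration queue. It only shows that the waiter sleeps until signalled instead of spinning.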

Looks even better. The migration thread would run complete(), right?

A detail: I suspect this needs to be in a while() loop, for the case
that the victim task raced with us and went to another CPU before we
kicked it off via the migration thread.
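A kick only guarantees a switch on the CPU that was targeted. If the task has meanwhile moved and is running elsewhere, the caller has to look it up again and retry. Below is a single-threaded simulation of that retry loop. It assumes a fixed four-CPU array and a scripted race in which the task hops to the next CPU just before a kick lands. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

/* Hypothetical model: task id running on each CPU (0 means idle). */
struct sim_system {
	int curr[SIM_NR_CPUS];
	int races_left;		/* times the task hops CPUs ahead of a kick */
	int kicks;		/* simulated migration-thread wakeups */
};

/* CPU the task is running on, or -1 if it is not running anywhere. */
static int sim_task_cpu(const struct sim_system *s, int task)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (s->curr[cpu] == task)
			return cpu;
	return -1;
}

/*
 * Kick the migration thread on @cpu and wait for it. If a race is
 * scripted, the task moves to the next CPU first, so the kick does not
 * leave it switched out.
 */
static void sim_kick_and_wait(struct sim_system *s, int cpu, int task)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	s->kicks++;
	if (s->races_left > 0) {
		s->races_left--;
		s->curr[cpu] = 0;
		s->curr[(cpu + 1) % SIM_NR_CPUS] = task;	/* raced away */
		return;
	}
	if (s->curr[cpu] == task)
		s->curr[cpu] = 0;	/* migration thread ran: task is out */
}

/* The while() loop: re-check after every kick until the task is off-CPU. */
static void sim_wait_to_unschedule(struct sim_system *s, int task)
{
	int cpu;

	while ((cpu = sim_task_cpu(s, task)) >= 0)
		sim_kick_and_wait(s, cpu, task);
}
```

With one scripted race, the first kick misses and the loop finds the task on its new CPU and kicks again, for two kicks in total.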

This looks very useful to me. It is also easy to test: revert
53da1d9 and you should see the performance of:

   time strace dd if=/dev/zero of=/dev/null bs=1024 count=1000000

plummet on an SMP box. Then, with your fix applied, it should go back
up to near full speed.

	Ingo