Date:	Thu, 11 Apr 2013 22:47:09 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
cc:	Borislav Petkov <bp@...en8.de>, Dave Hansen <dave@...1.net>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, dhillf@...il.com,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [PATCH] kthread: Prevent unpark race which puts threads on the
 wrong cpu

Srivatsa,

On Fri, 12 Apr 2013, Srivatsa S. Bhat wrote:
> On 04/09/2013 08:08 PM, Thomas Gleixner wrote:
> > Add a new task state (TASK_PARKED) which prevents other wakeups and
> > use this state explicitly for the unpark wakeup.
> >
> 
> Again, I think this is unnecessary. We are good as long as no one other
> than the unpark code can kick the kthreads out of the loop in the park
> code. Now that I understand the race you explained above, why not just
> fix that race itself by reversing the ordering of clear(SHOULD_PARK)
> and bind_to(CPU2)? That way, even if someone else wakes up the per-cpu
> kthread, it will just remain confined to the park code, as intended.

In theory.
 
> A patch like below should do it IMHO. I guess I'm being a little too
> persistent, sorry!

No, it's not about being persistent; you're JUST too deep into voodoo
programming instead of going for the straightforward and robust
solution.

Darn, I hate introducing a new task state as much as everyone else
does, but that state lets us make guarantees and gives us semantic
clarity. A parked thread is parked and can only be woken up by the
unpark code. That's clear semantics, not a magic construct which will
work in most cases and, for the remaining ones (see below), give us
problems which are way harder to decode than the ones we tried to fix
with that magic.

> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 691dc2e..9512fc5 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -308,6 +308,15 @@ struct task_struct *kthread_create_on_cpu(int (*threadfn)(void *data),
>  	to_kthread(p)->cpu = cpu;
>  	/* Park the thread to get it out of TASK_UNINTERRUPTIBLE state */
>  	kthread_park(p);
> +
> +	/*
> +	 * Wait for p->on_rq to be reset to 0, to ensure that the per-cpu
> +	 * migration thread (which belongs to the stop_task sched class)
> +	 * doesn't run until the cpu is actually onlined and the thread is
> +	 * unparked.
> +	 */
> +	if (!wait_task_inactive(p, TASK_INTERRUPTIBLE))
> +		WARN_ON(1);

Yay, we rely on TASK_INTERRUPTIBLE state for a task which already has
references outside the creation code. And then we _HOPE_ that nothing
wakes it up _BEFORE_ we do something else.

Aside from that, you are still insisting on enforcing this for every
per-cpu thread, even though the only ones which need it at this point
are those which have a create() callback (i.e. the migration thread).
And next week you'll figure out that this is a performance impact when
bringing up large machines....

>  /**
>   * kthread_unpark - unpark a thread created by kthread_create().
>   * @k:		thread created by kthread_create().
> @@ -337,18 +357,29 @@ void kthread_unpark(struct task_struct *k)
>  	struct kthread *kthread = task_get_live_kthread(k);
>  
>  	if (kthread) {
> +		/*
> +		 * Per-cpu kthreads such as ksoftirqd can get woken up by
> +		 * other events. So after binding the thread, ensure that
> +		 * it goes off the CPU at least once, by parking it again.
> +		 * This way, we can ensure that it will run on the correct
> +		 * CPU on subsequent wakeup.
> +		 */
> +		if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags)) {
> +			__kthread_bind(k, kthread->cpu);
> +			clear_bit(KTHREAD_IS_PARKED, &kthread->flags);

And how is that f*cking different from the previous code?

CPU0	   		CPU1		       CPU2
				       	       wakeup(T) -> run on CPU1 (last cpu)

			switch_to(T)

__kthread_bind(T, CPU2)

clear(KTHREAD_IS_PARKED)

			leave loop due to !KTHREAD_IS_PARKED

			BUG(wrong cpu)  <--- VOODOO FAILURE

kthread_park(T) <-- VOODOO TOO LATE

You can turn the order of clearing/setting the flags around as much as
you want; I'm going to punch a hole in it.

TASK_PARKED is the very obvious and robust solution which fixes _ALL_
of the corner cases, at least as far as I can imagine them. And
robustness rules, at least in my world.

Thanks,

	tglx