linux-kernel - Re: Problem with freezable workqueues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20070227232855.GA457@tv-sign.ru>
Date:	Wed, 28 Feb 2007 02:28:55 +0300
From:	Oleg Nesterov <oleg@...sign.ru>
To:	"Rafael J. Wysocki" <rjw@...k.pl>
Cc:	Pavel Machek <pavel@....cz>, Gautham R Shenoy <ego@...ibm.com>,
	Johannes Berg <johannes@...solutions.net>,
	LKML <linux-kernel@...r.kernel.org>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>
Subject: Re: Problem with freezable workqueues

On 02/27, Rafael J. Wysocki wrote:
>
> We have a problem with freezable workqueues in 2.6.21-rc1 and in -mm
> (there are only two of them, in XFS, but still).  Namely, their worker threads
> deadlock with workqueue_cpu_callback() that gets called during the CPU hotplug,
> becuase workqueue_cpu_callback() tries to stop these threads while they are
> frozen (disable_nonboot_cpus() happens after we've frozen tasks).

Ugh. I know nothing, nothing, nothing about suspend. I'll try to guess.

   Commit: ed746e3b18f4df18afa3763155972c5835f284c5

   [PATCH] swsusp: Change code ordering in disk.c

   Change the ordering of code in kernel/power/disk.c so that device_suspend() is
   called before disable_nonboot_cpus() and platform_finish() is called after
   enable_nonboot_cpus() and before device_resume(), as indicated by the recent
   discussion on Linux-PM (cf.
   http://lists.osdl.org/pipermail/linux-pm/2006-November/004164.html).

   The changes here only affect the built-in swsusp.

Yes? with the patch above, _cpu_down() called _after_ freeze_processes() ???
Honestly, I can't understand this (yes, I know nothing, nothing, nothing...).

> For 2.6.21-rc1 I've invented the appended workaround (works for me, waiting for
> Johannes to confirm it works for him too), but I think we need something better
> for -mm and future kernels.

How about other kthread_stop()s ? For example, kernel/softirq.c:cpu_callback() ?

I think we need a general "cpu_down() after freeze" implementation, this is what
Gautham and Srivatsa are working on, right?

> --- linux-2.6.21-rc1.orig/kernel/workqueue.c	2007-02-24 10:17:57.000000000 +0100
> +++ linux-2.6.21-rc1/kernel/workqueue.c	2007-02-24 20:00:22.000000000 +0100
> @@ -376,8 +376,19 @@ static int worker_thread(void *__cwq)
>  
>  	set_current_state(TASK_INTERRUPTIBLE);
>  	while (!kthread_should_stop()) {
> -		if (cwq->freezeable)
> -			try_to_freeze();
> +		if (try_to_freeze()) {
> +			/* We've just left the refrigerator.  If our CPU is
> +			 * a nonboot one, we might have been replaced.
> +			 * The lock is taken to prevent the race with
> +			 * cleanup_workqueue_thread() from happening
> +			 */
> +			spin_lock_irq(&cwq->lock);

I'm afraid this is racy. We can't touch *cwq, it may be freed. Suppose
that another thread does destroy_workqueue(), and we thaw that thread
before cwq->thread.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/