lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <504965AA.6090107@cn.fujitsu.com>
Date:	Fri, 07 Sep 2012 11:10:34 +0800
From:	Lai Jiangshan <laijs@...fujitsu.com>
To:	Tejun Heo <tj@...nel.org>
CC:	linux-kernel@...r.kernel.org
Subject: Re: [PATCH wq/for-3.6-fixes 3/3] workqueue: fix possible idle worker
 depletion during CPU_ONLINE

On 09/07/2012 04:08 AM, Tejun Heo wrote:
>>>From 985aafbf530834a9ab16348300adc7cbf35aab76 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@...nel.org>
> Date: Thu, 6 Sep 2012 12:50:41 -0700
> 
> To simplify both normal and CPU hotplug paths, while CPU hotplug is in
> progress, manager_mutex is held to prevent one of the workers from
> becoming a manager and creating or destroying workers; unfortunately,
> it currently may lead to idle worker depletion which in turn can lead
> to deadlock under extreme circumstances.
> 
> Idle workers aren't allowed to become busy if there's no other idle
> worker left to create more idle workers, but during CPU_ONLINE
> gcwq_associate() is holding all managerships and all the idle workers
> can proceed to become busy before gcwq_associate() is finished.
> 
> This patch fixes the bug by releasing manager_mutexes before letting
> the rebound idle workers go.  This ensures that by the time idle
> workers check whether management is necessary, CPU_ONLINE already has
> released the positions.
> 

Could you review manage_workers_slowpath() in V4 patchset.
It has enough changelog and comments.

After the discussion,

We don't move the hotplug code outside hotplug code. it matches this requirement.

Since we introduce manage_mutex(), any palace should be allowed to grab it
when its context allows. So it is not hotplug code's responsibility of this bug.

manage_workers() just use mutex_trylock() to grab the lock, it does not make
hard to do it jobs when need, and it does not try to find out the reason of fail.
so I think it is manage_workers()'s responsibility to handle this bug.
a manage_workers_slowpath() is enough to fix the bug.

=====
manage_workers_slowpath() just adds a little overhead over manage_workers(),
so we can use manage_workers_slowpath() to replace manage_workers(), thus we
can reduce the code of manage_workers() and we can do more cleanup for manage.

Thanks,
Lai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ