Date:	Wed, 19 Feb 2014 19:29:58 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	linux-kernel@...r.kernel.org, dm-devel@...hat.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lisa Du <chunlingdu1@...il.com>,
	Mandeep Singh Baines <msb@...omium.org>
Subject: Re: work item migration bug when a CPU is disabled

Hello, Mikulas.

On Tue, Feb 18, 2014 at 08:57:11PM -0500, Mikulas Patocka wrote:
> Hi Tejun
> 
> Two years ago, I reported a bug in workqueues - a work item that is 
> supposed to be bound to a specific CPU can be migrated to a different CPU 
> when the original CPU is disabled by writing zero to 
> /sys/devices/system/cpu/cpu*/online
> 
> This causes crashes in dm-crypt, because it assumes that a work item stays 
> on the same CPU.

For better or worse, per-cpu workqueues have never guaranteed that
cpus won't go down while a work item is executing.  If a workqueue
user needs such a guarantee, it's required to use one of the CPU down
hooks to cancel and flush such work items.  This is partly because
workqueue itself doesn't distinguish work items which need to be bound
for correctness from those which just use affinity as an optimization.
The distinction is made by the user.
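
To illustrate the point above, here is a minimal userspace sketch in C.
It is not kernel code and does not use the workqueue or CPU hotplug
APIs; every name in it (struct work, queue_work_on_cpu, cpu_down,
user_cpu_down_hook, ...) is hypothetical.  It only models the idea
that work left pending on a CPU may end up running elsewhere once that
CPU goes down, unless the user flushes it from a CPU down hook first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS     4   /* size of the toy machine (assumption) */
#define MAX_PENDING 8   /* per-CPU queue capacity (assumption) */

/* Hypothetical work item that records where it was queued and where it ran. */
struct work {
    int queued_cpu;   /* CPU the user asked for */
    int ran_on_cpu;   /* CPU that executed it, -1 if it has not run yet */
};

struct cpu {
    bool online;
    struct work *pending[MAX_PENDING];
    size_t nr_pending;
};

static struct cpu cpus[NR_CPUS];

/* Bring every CPU online with an empty queue. */
void model_init(void)
{
    for (int i = 0; i < NR_CPUS; i++) {
        cpus[i].online = true;
        cpus[i].nr_pending = 0;
    }
}

/* Queue w on a specific CPU; fails if the CPU is invalid, offline, or full. */
bool queue_work_on_cpu(int cpu, struct work *w)
{
    if (cpu < 0 || cpu >= NR_CPUS)
        return false;
    if (!cpus[cpu].online || cpus[cpu].nr_pending == MAX_PENDING)
        return false;
    w->queued_cpu = cpu;
    w->ran_on_cpu = -1;
    cpus[cpu].pending[cpus[cpu].nr_pending++] = w;
    return true;
}

/* Run everything pending on src, recording exec_cpu as the executing CPU. */
void run_pending(int src, int exec_cpu)
{
    assert(src >= 0 && src < NR_CPUS);
    for (size_t i = 0; i < cpus[src].nr_pending; i++)
        cpus[src].pending[i]->ran_on_cpu = exec_cpu;
    cpus[src].nr_pending = 0;
}

/* User-side CPU down hook: flush the user's work while the CPU is still up. */
void user_cpu_down_hook(int cpu)
{
    run_pending(cpu, cpu);
}

/*
 * Take a CPU offline.  With use_hook, the user's hook flushes pending work
 * on the original CPU first.  Without it, whatever is left is executed on
 * the first CPU that is still online, which models the migration.
 */
void cpu_down(int cpu, bool use_hook)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (use_hook)
        user_cpu_down_hook(cpu);
    cpus[cpu].online = false;
    for (int other = 0; other < NR_CPUS; other++) {
        if (cpus[other].online) {
            run_pending(cpu, other);
            break;
        }
    }
}
```

In this model, an item queued on CPU 1 and followed by cpu_down(1, true)
records ran_on_cpu == 1, while the same sequence with cpu_down(1, false)
records a different CPU.  The cost of the flush is paid inside the
user's own hook, which is where the latency becomes visible.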

It has certain benefits as it makes clear in the code local to the
specific user that it's incurring latency in CPU down operations which
happen to be fairly hot in certain configurations.  Besides, it's not
really clear what behavior workqueue can enforce - should it try to
drain as in wq shutdown sequence, or should it trigger WARN if work
items are requeueing, or should it just leave them hanging until CPU
comes back again?  If we do the last, what about the ones which are
using percpu workqueues as an optimization?

So, if dm-crypt is depending on affinity and not taking care of it via
cpu hotplug hooks, it's something which should be fixed on the dm-crypt
side.

Thanks.

-- 
tejun