linux-kernel - Re: work item migration bug when a CPU is disabled


Date:	Wed, 19 Feb 2014 19:29:58 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	linux-kernel@...r.kernel.org, dm-devel@...hat.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lisa Du <chunlingdu1@...il.com>,
	Mandeep Singh Baines <msb@...omium.org>
Subject: Re: work item migration bug when a CPU is disabled

Hello, Mikulas.

On Tue, Feb 18, 2014 at 08:57:11PM -0500, Mikulas Patocka wrote:
> Hi Tejun
> 
> Two years ago, I reported a bug in workqueues - a work item that is 
> supposed to be bound to a specific CPU can be migrated to a different CPU 
> when the original CPU is disabled by writing zero to 
> /sys/devices/system/cpu/cpu*/online
> 
> This causes crashes in dm-crypt, because it assumes that a work item stays 
> on the same CPU.

For better or worse, per-cpu workqueues have never guaranteed that
cpus won't go down while a work item is executing.  If a workqueue
user needs such a guarantee, it's required to use one of the CPU down
hooks to cancel and flush such work items.  This is partly because
workqueue itself doesn't distinguish work items which need to be bound
for correctness from those which just use affinity as an optimization.
The distinction is made by the user.

It has certain benefits as it makes clear in the code local to the
specific user that it's incurring latency in CPU down operations which
happen to be fairly hot in certain configurations.  Besides, it's not
really clear what behavior workqueue can enforce - should it try to
drain as in wq shutdown sequence, or should it trigger WARN if work
items are requeueing, or should it just leave them hanging until CPU
comes back again?  If we do the last, what about the ones which are
using per-cpu workqueues as an optimization?

So, if dm-crypt is depending on affinity and not taking care of it via
cpu hotplug hooks, it's something which should be fixed on the
dm-crypt side.
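
To make the division of responsibility concrete, here is a minimal
userspace model in C.  It is only a sketch and not kernel code: every
name in it (model_init, model_queue_work_on, user_cpu_down_prepare,
model_cpu_down, NR_MODEL_CPUS, MAX_PENDING) is hypothetical, and it
assumes that work left pending on a CPU that goes down simply gets
executed on another online CPU.  A user that needs affinity for
correctness flushes its own items from a CPU-down hook; a user that
only wants affinity as an optimization does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_MODEL_CPUS 4   /* hypothetical size of the model machine */
#define MAX_PENDING   8   /* hypothetical per-CPU queue depth */

struct work_item {
    int bound_cpu;   /* CPU the user queued the item on */
    int ran_on_cpu;  /* CPU that executed it, or -1 if it has not run */
};

struct cpu_queue {
    bool online;
    struct work_item *pending[MAX_PENDING];
    size_t nr_pending;
};

static struct cpu_queue queues[NR_MODEL_CPUS];

/* Bring every model CPU online with an empty queue. */
void model_init(void)
{
    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
        queues[cpu].online = true;
        queues[cpu].nr_pending = 0;
    }
}

/* Queue @w on @cpu; fails if the CPU is invalid, offline, or full. */
bool model_queue_work_on(int cpu, struct work_item *w)
{
    if (cpu < 0 || cpu >= NR_MODEL_CPUS)
        return false;
    struct cpu_queue *q = &queues[cpu];
    if (!q->online || q->nr_pending == MAX_PENDING)
        return false;
    w->bound_cpu = cpu;
    w->ran_on_cpu = -1;
    q->pending[q->nr_pending++] = w;
    return true;
}

/* Execute everything pending on @src, recording @runner as the CPU used. */
static void run_pending(int src, int runner)
{
    assert(src >= 0 && src < NR_MODEL_CPUS);
    struct cpu_queue *q = &queues[src];
    for (size_t i = 0; i < q->nr_pending; i++)
        q->pending[i]->ran_on_cpu = runner;
    q->nr_pending = 0;
}

/* The user's CPU-down hook: flush its items while @cpu is still online. */
void user_cpu_down_prepare(int cpu)
{
    run_pending(cpu, cpu);
}

/*
 * Take @cpu offline.  The queue itself makes no affinity guarantee:
 * anything the user did not flush runs on the first CPU still online.
 * If no CPU is online, the items stay pending.
 */
void model_cpu_down(int cpu, bool user_has_hook)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    if (user_has_hook)
        user_cpu_down_prepare(cpu);
    queues[cpu].online = false;

    for (int other = 0; other < NR_MODEL_CPUS; other++) {
        if (queues[other].online) {
            run_pending(cpu, other);
            return;
        }
    }
}
```

In this model, an item queued on CPU 1 followed by
model_cpu_down(1, true) ends with ran_on_cpu equal to bound_cpu, while
the same sequence with false records the item as having run on some
other online CPU, which is the situation dm-crypt trips over.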

Thanks.

-- 
tejun