linux-kernel - Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug

Message-ID: <3syvchv2pryjtqrwjeyfddhfzcmgnkv7znq7fv6tt75cysg6fn@ee2m3svbqr6x>
Date: Fri, 27 Sep 2024 16:03:14 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Hillf Danton <hdanton@...a.com>
Cc: Chen Ridong <chenridong@...wei.com>, tj@...nel.org, 
	cgroups@...r.kernel.org, Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>, 
	Boqun Feng <boqun.feng@...il.com>, Linus Torvalds <torvalds@...ux-foundation.org>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and
 cpu_hotplug_lock

On Fri, Sep 27, 2024 at 07:25:16PM GMT, Hillf Danton <hdanton@...a.com> wrote:
> > Or if the negation is correct, why do you mean that processed work item
> > is _not_ preventing thread T from running (in the case I left quoted
> > above)?
> >
> If N (N > 1) cgroup work items are queued before one cpu hotplug work, then
> 1) workqueue worker1 dequeues cgroup work1 and executes it,
> 2) worker1 goes off cpu and falls in nap because of failure of acquiring
> cgroup_mutex,
> 3) worker2 starts processing cgroup work2 and repeats 1) and 2),
> 4) after N sleepers, workerN+1 dequeues the hotplug work and executes it
> and completes finally.

My picture of putting everything under one system_wq worker was a bit
clumsy. I see how other workers can help out with processing the queue;
that's where N >= WQ_DFL_ACTIVE comes into play, and then this gets
stuck(?). [1]

IOW, if N < WQ_DFL_ACTIVE, the mutex waiters in the queue are harmless.
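
To make the threshold concrete, here is a minimal toy model, not kernel
code. It assumes that every cgroup work item sleeps on cgroup_mutex while
still occupying one active slot of the workqueue, and that the mutex
holder cannot make progress until the hotplug work has run. The function
name simulate() and the small limit of 4 are hypothetical, chosen only
for illustration; they stand in for the workqueue and WQ_DFL_ACTIVE.

```python
from collections import deque


def simulate(n_cgroup_items: int, max_active: int) -> str:
    """Toy model: N cgroup work items queued ahead of one hotplug work.

    Assumption: each cgroup item blocks on the mutex and keeps its
    active slot, so at most `max_active` items can be in flight.
    """
    queue = deque(["cgroup"] * n_cgroup_items + ["hotplug"])
    blocked = 0  # active items currently sleeping on the mutex

    # A new item can only start while a free active slot remains.
    while queue and blocked < max_active:
        item = queue.popleft()
        if item == "hotplug":
            # The hotplug work got a slot, so it runs to completion.
            return "completes"
        blocked += 1  # cgroup item sleeps on the mutex, slot stays taken

    # All slots are held by sleepers; the hotplug work never starts.
    return "stuck"


if __name__ == "__main__":
    limit = 4  # illustrative stand-in for WQ_DFL_ACTIVE
    for n in (0, 3, 4, 10):
        print(f"N={n}, limit={limit}: {simulate(n, limit)}")
```

In this model the hotplug work is reached only while N stays below the
limit; once N reaches it, every slot is held by a mutex waiter and
nothing behind them in the queue can start.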

> Clear lad?

I hope, thanks!

Michal

[1] I don't see a trivial way to modify lockdep to catch this
    (besides taking wq saturation into account, it would also need to
    propagate some info across complete->wait_for_completion).

