lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240927112516.1136-1-hdanton@sina.com>
Date: Fri, 27 Sep 2024 19:25:16 +0800
From: Hillf Danton <hdanton@...a.com>
To: Michal Koutny <mkoutny@...e.com>
Cc: Chen Ridong <chenridong@...wei.com>,
	tj@...nel.org,
	cgroups@...r.kernel.org,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Boqun Feng <boqun.feng@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock

On Thu, 26 Sep 2024 14:53:46 +0200 Michal Koutny <mkoutny@...e.com>
> On Wed, Sep 11, 2024 at 07:15:42PM GMT, Hillf Danton <hdanton@...a.com> wrote:
> > > However, there is no ordering between (I) and (II) so they can also happen
> > > in opposite
> > > 
> > > 	thread T					system_wq worker
> > > 
> > > 	down(cpu_hotplug_lock.read)
> > > 	smp_call_on_cpu
> > > 	  queue_work_on(cpu, system_wq, scss) (I)
> > > 	  						lock(cgroup_mutex)  (II)
> > > 							...
> > > 							unlock(cgroup_mutex)
> > > 							scss.func
> > > 	  wait_for_completion(scss)
> > > 	up(cpu_hotplug_lock.read)
> > > 
> > > And here the thread T + system_wq worker effectively call
> > > cpu_hotplug_lock and cgroup_mutex in the wrong order. (And since they're
> > > two threads, it won't be caught by lockdep.)
> > > 
> > Given no workqueue work executed without being dequeued, any queued work,
> > regardless if they are more than 2048, that acquires cgroup_mutex could not
> > prevent the work queued by thread-T from being executed, so thread-T can
> > make safe forward progress, therefore with no chance left for the ABBA 
> > deadlock you spotted where lockdep fails to work.
> 
> Is there a forgotten negation and did you intend to write: "any queued
> work ... that acquired cgroup_mutex could prevent"?
> 
No I did not.

> Or if the negation is correct, why do you mean that processed work item
> is _not_ preventing thread T from running (in the case I left quoted
> above)?
>
If N (N > 1) cgroup work items are queued before one cpu hotplug work, then
1) workqueue worker1 dequeues cgroup work1 and executes it,
2) worker1 goes off cpu and falls in nap because of failure of acquiring
cgroup_mutex,
3) worker2 starts processing cgroup work2 and repeats 1) and 2),
4) after N sleepers, workerN+1 dequeus the hotplug work and executes it
and completes finally.

Clear lad?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ