Message-ID: <mxyismki3ln2pvrbhd36japfffpfcwgyvgmy5him3n746w6wd6@24zlflalef6x>
Date: Wed, 7 Aug 2024 15:32:59 +0200
From: Michal Koutný <mkoutny@...e.com>
To: chenridong <chenridong@...wei.com>
Cc: Hillf Danton <hdanton@...a.com>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, tj@...nel.org, bpf@...r.kernel.org, cgroups@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH -v2] cgroup: fix deadlock caused by cgroup_mutex and
 cpu_hotplug_lock

Hello.

On Sat, Jul 27, 2024 at 06:21:55PM GMT, chenridong <chenridong@...wei.com> wrote:
> Yes, I have offered the scripts in Link(V1).

Thanks (and thanks for patience).
There is no lockdep complaint about a deadlock (i.e. some circular
locking dependency). (I admit the multiple holders of cgroup_mutex
reported there confuse me; I guess that's an artifact of this lockdep
report and they could also be waiters.)

> > Who'd be the holder of cgroup_mutex preventing cgroup_bpf_release from
> > progress? (That's not clear to me from your diagram.)
> > 
> This is a cumulative process. The stress test deletes a large number of
> cgroups, and cgroup_bpf_release is asynchronous, competing with the other
> cgroup release works.

Those are different situations:
- waiting for one holder that's stuck for some reason (that's what we're
  after),
- waiting because the mutex is contended (that's slow but progresses
  eventually).

> You know, cgroup_mutex is used in many places. Eventually, the number of
> `cgroup_bpf_release` instances in system_wq accumulates up to 256, and
> this leads to the issue.

Reaching max_active doesn't mean that queue_work() would block or that
items would be lost. They are only queued onto the inactive_works list.
(Remark: cgroup_destroy_wq has only max_active=1, but that apparently
doesn't stop progress should there be more items queued (as long as
cgroup_mutex is not guarding against losing references).)
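
To make that concrete, here's a minimal toy module (made up for
illustration, not code from this thread) showing that queue_work()
returns immediately on a max_active=1 workqueue and that the surplus
items are merely deferred, not dropped:

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct workqueue_struct *demo_wq;
static struct work_struct demo_works[4];

static void demo_fn(struct work_struct *work)
{
	pr_info("demo: work %td running\n", work - demo_works);
	msleep(100);	/* occupy the single active slot for a while */
}

static int __init demo_init(void)
{
	int i;

	demo_wq = alloc_workqueue("demo_wq", 0, 1);	/* max_active = 1 */
	if (!demo_wq)
		return -ENOMEM;

	for (i = 0; i < 4; i++) {
		INIT_WORK(&demo_works[i], demo_fn);
		/* Returns immediately for all four items; those beyond
		 * max_active sit on the inactive list, none are lost. */
		queue_work(demo_wq, &demo_works[i]);
	}
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);	/* flushes the remaining items */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");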

---

The change on its own (deferring the removal of cgroup bpf progs via
cgroup_destroy_wq instead of system_wq) is sensible, as it collects the
removal of related objects together (and at the same time, sharing the
one cgroup_destroy_wq shouldn't cause problems); see the sketch below.
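
(For readers following along: the core of the change, assuming v2 keeps
the existing names in kernel/bpf/cgroup.c, should be roughly

static void cgroup_bpf_release_fn(struct percpu_ref *ref)
{
	struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

	INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
	/* was: queue_work(system_wq, &cgrp->bpf.release_work); */
	queue_work(cgroup_destroy_wq, &cgrp->bpf.release_work);
}

i.e. only the target workqueue changes; the release work itself stays
as is.)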

But the reasoning in the commit message doesn't add up for me. There is
no obvious deadlock; I'd say the system is overloaded with repeated
calls of __lockup_detector_reconfigure() and is not in a deadlock
state -- i.e. when you stop the test, it should eventually recover.
Given that, I also wouldn't put Fixes: 4bfc0bb2c60e there.

(One could symmetrically argue to move smp_call_on_cpu() away from
system_wq instead of cgroup_bpf_release_fn(); see the sketch below.)
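
That symmetric alternative would hypothetically look like this (assuming
smp_call_on_cpu() in kernel/smp.c still queues its on-stack work onto
system_wq; smp_call_wq is a made-up name, not an existing workqueue):

/* Give the synchronous cross-CPU call its own workqueue so it cannot
 * queue behind unrelated system_wq items such as the bpf release works.
 */
static struct workqueue_struct *smp_call_wq;	/* hypothetical */

	/* in smp_call_on_cpu(): */
	queue_work_on(cpu, smp_call_wq, &sscs.work);	/* was: system_wq */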

Honestly, I'm not sure it's worth the effort if there's no deadlock.

It's possible that I'm misunderstanding, or that I've missed a
substantial detail of why this could lead to a deadlock. It'd be best
shown in a sequence diagram with tasks/CPUs left to right and time
top-down (in the original scheme, time seems to go right to left, and
the situation of the initial cgroup_mutex holder is unclear); see the
skeleton below.
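
For illustration only, a skeleton of such a diagram (made-up tasks and
events, time flowing downward) might be:

  task A                          task B (worker)
  ------                          ---------------
  mutex_lock(cgroup_mutex)        .
  <holds cgroup_mutex>            cgroup_bpf_release()
  <blocks on ???>                 mutex_lock(cgroup_mutex)  <- waits on A
     |
     v  time

where the "???" on task A's side is exactly the missing piece.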

Thanks,
Michal
