Message-ID: <5badbb85-b9e9-4170-a1b9-9b6d13135507@huawei.com>
Date: Fri, 12 Jul 2024 09:15:53 +0800
From: chenridong <chenridong@...wei.com>
To: Tejun Heo <tj@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>
CC: <martin.lau@...ux.dev>, <ast@...nel.org>, <daniel@...earbox.net>,
<andrii@...nel.org>, <eddyz87@...il.com>, <song@...nel.org>,
<yonghong.song@...ux.dev>, <john.fastabend@...il.com>, <kpsingh@...nel.org>,
<sdf@...gle.com>, <haoluo@...gle.com>, <jolsa@...nel.org>,
<lizefan.x@...edance.com>, <hannes@...xchg.org>, <bpf@...r.kernel.org>,
<cgroups@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -next] cgroup: Fix AA deadlock caused by
cgroup_bpf_release
On 2024/7/12 1:36, Tejun Heo wrote:
> Hello,
>
> On Thu, Jul 11, 2024 at 03:52:34AM +0000, Roman Gushchin wrote:
>>> The max_active of system_wq is WQ_DFL_ACTIVE (256). If all active works are
>>> cgroup bpf release works, they will block the smp_call_on_cpu work that is
>>> enqueued after the cgroup bpf release works. smp_call_on_cpu, holding
>>> cpu_hotplug_lock, will then wait for a completion that can never arrive,
>>> because the cgroup bpf release works cannot acquire cgroup_mutex and will
>>> never finish.
>>> However, placing the cgroup bpf release works on the cgroup destroy
>>> workqueue means they can never block the smp_call_on_cpu work, so the loop
>>> is broken and the problem is solved.
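
A minimal sketch of the change described above, assuming the release
work is queued from cgroup_bpf_release_fn() and that cgroup_destroy_wq
is visible to that code (both are assumptions, not details confirmed in
this thread):

/* Sketch only: queue the cgroup bpf release work on the cgroup destroy
 * workqueue instead of system_wq, so a flood of release works can no
 * longer saturate system_wq and starve smp_call_on_cpu()'s work item.
 */
static void cgroup_bpf_release_fn(struct percpu_ref *ref)
{
	struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

	INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
	/* was: queue_work(system_wq, &cgrp->bpf.release_work); */
	queue_work(cgroup_destroy_wq, &cgrp->bpf.release_work);
}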
>>
>> Tejun,
>>
>> do you have an opinion on this?
>>
>> If there are certain limitations from the cgroup side on what can be done
>> in a generic work context, it would be nice to document (e.g. don't grab
>> cgroup mutex), but I still struggle to understand what exactly is wrong
>> with the blamed commit.
>
> I think the general rule here is more "don't saturate system wqs" rather
> than "don't grab cgroup_mutex from system_wq". system wqs are for misc
> things which shouldn't create a large number of concurrent work items. If
> something is going to generate 256+ concurrent work items, it should use its
> own workqueue. We don't know what's in system wqs and can't expect its users
> to police specific lock usages.
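
To illustrate the rule with a hypothetical subsystem (names such as
foo_release_wq and foo_release are made up for this example): anything
that can generate 256+ concurrent work items allocates its own
workqueue instead of piling onto system_wq.

#include <linux/workqueue.h>

/* Hypothetical subsystem that can queue very many release works. */
static struct workqueue_struct *foo_release_wq;

static void foo_release(struct work_struct *work)
{
	/* release logic that may sleep and take subsystem locks */
}

static int __init foo_init(void)
{
	/* max_active of 0 selects the default (WQ_DFL_ACTIVE) limit; the
	 * point is that saturating this queue cannot stall unrelated
	 * system_wq users such as smp_call_on_cpu().
	 */
	foo_release_wq = alloc_workqueue("foo_release", 0, 0);
	if (!foo_release_wq)
		return -ENOMEM;
	return 0;
}

Work items are then queued with queue_work(foo_release_wq, &obj->work)
rather than schedule_work(&obj->work), which targets system_wq.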
>
Thank you, TJ. That's exactly what I'm trying to convey. Since cgroup
has its own workqueue and may create a large number of release works,
it is better to place all of its related works on that workqueue rather
than on the system wqs.
Regards,
Ridong
> Another aspect is that the current WQ_DFL_ACTIVE is an arbitrary number I
> came up with close to 15 years ago. Machine size has increased by multiple
> times, if not an order of magnitude since then. So, "there can't be a
> reasonable situation where 256 concurrency limit isn't enough" is most
> likely not true anymore and the limits need to be pushed upward.
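
For reference, the limit referred to here is defined in
include/linux/workqueue.h along these lines (approximate, check the
tree for the exact enum):

enum {
	WQ_MAX_ACTIVE	= 512,
	WQ_DFL_ACTIVE	= WQ_MAX_ACTIVE / 2,	/* == 256 */
};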
>
> Thanks.
>