Message-ID: <83cea8c6-d2f8-42f2-990e-80412ebf296e@huaweicloud.com>
Date: Thu, 12 Sep 2024 09:33:23 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Tejun Heo <tj@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Michal Koutný <mkoutny@...e.com>,
Chen Ridong <chenridong@...wei.com>, martin.lau@...ux.dev, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...gle.com, haoluo@...gle.com, jolsa@...nel.org,
lizefan.x@...edance.com, hannes@...xchg.org, bpf@...r.kernel.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock

On 2024/9/11 5:17, Tejun Heo wrote:
> On Tue, Sep 10, 2024 at 09:02:59PM +0000, Roman Gushchin wrote:
> ...
>>>> By that reasoning, any holder of cgroup_mutex on system_wq makes the system
>>>> susceptible to a deadlock (in the presence of cpu_hotplug_lock waiting
>>>> writers + cpuset operations). And the two work items must meet in the same
>>>> worker's processing, hence the probability is low (zero?) with fewer than
>>>> WQ_DFL_ACTIVE items.
>>
>> Right, I'm on the same page. Should we document then somewhere that
>> the cgroup mutex can't be locked from a system wq context?
>>
>> I think this will also make the Fixes tag more meaningful.
>
> I think that's completely fine. What's not fine is saturating system_wq.
> Anything which creates a large number of concurrent work items should be
> using its own workqueue. If anything, workqueue needs to add a warning for
> saturation conditions and report who the offenders are.
>
> Thanks.
>
I will add a patch to document that.
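
To illustrate the dedicated-workqueue suggestion, here is a minimal
sketch (example_wq, example_workfn and example_init are hypothetical
names, not code from this patch):

#include <linux/init.h>
#include <linux/errno.h>
#include <linux/workqueue.h>

/*
 * The scenario discussed above: a work item that takes cgroup_mutex
 * can, together with a cpuset operation holding cgroup_mutex and a
 * waiting cpu_hotplug_lock writer, wedge a saturated system_wq pool.
 * Queueing such items on a dedicated workqueue keeps them from
 * consuming system_wq's shared max_active slots.
 */
static struct workqueue_struct *example_wq;

static void example_workfn(struct work_struct *work)
{
	/* May sleep on cgroup_mutex; runs on the dedicated queue. */
}

static DECLARE_WORK(example_work, example_workfn);

static int __init example_init(void)
{
	/* max_active == 0 selects the default, WQ_DFL_ACTIVE. */
	example_wq = alloc_workqueue("example_wq", 0, 0);
	if (!example_wq)
		return -ENOMEM;

	/* queue_work() on the private queue, instead of schedule_work(),
	 * which would put the item on system_wq. */
	queue_work(example_wq, &example_work);
	return 0;
}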
Also, should we raise WQ_DFL_ACTIVE (currently 256)? Maybe 1024 would
be acceptable?
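
For reference, the default is derived from the per-queue hard cap; the
following is a paraphrase of the relevant constants, not a verbatim
copy of include/linux/workqueue.h:

#define WQ_MAX_ACTIVE	512			/* per-workqueue hard cap */
#define WQ_DFL_ACTIVE	(WQ_MAX_ACTIVE / 2)	/* 256; used when alloc_workqueue()
						 * is passed max_active == 0 */

Since the default is tied to the cap, an alternative to raising it
globally would be for heavy users to request a larger max_active on
their own queues ("heavy_wq" is a hypothetical name):

	struct workqueue_struct *wq = alloc_workqueue("heavy_wq", 0, WQ_MAX_ACTIVE);
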
Best regards,
Ridong