Message-ID: <f93d3b5d-f58b-4787-abaf-8b07d37b7302@huawei.com>
Date: Wed, 17 Jul 2024 10:08:11 +0800
From: chenridong <chenridong@...wei.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
CC: Tejun Heo <tj@...nel.org>, <martin.lau@...ux.dev>, <ast@...nel.org>,
<daniel@...earbox.net>, <andrii@...nel.org>, <eddyz87@...il.com>,
<song@...nel.org>, <yonghong.song@...ux.dev>, <john.fastabend@...il.com>,
<kpsingh@...nel.org>, <sdf@...gle.com>, <haoluo@...gle.com>,
<jolsa@...nel.org>, <lizefan.x@...edance.com>, <hannes@...xchg.org>,
<bpf@...r.kernel.org>, <cgroups@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release

On 2024/7/16 22:53, Roman Gushchin wrote:
> On Tue, Jul 16, 2024 at 08:14:31PM +0800, chenridong wrote:
>>
>>
>> On 2024/7/12 9:15, chenridong wrote:
>>>
>>>
>>> On 2024/7/12 1:36, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Thu, Jul 11, 2024 at 03:52:34AM +0000, Roman Gushchin wrote:
>>>>>> The max_active of system_wq is WQ_DFL_ACTIVE (256). If all active
>>>>>> works are cgroup bpf release works, they will block the
>>>>>> smp_call_on_cpu work that is enqueued after them. So
>>>>>> smp_call_on_cpu, holding cpu_hotplug_lock, will wait for a
>>>>>> completion that can never arrive, because the cgroup bpf release
>>>>>> works cannot get cgroup_mutex and will never finish. However,
>>>>>> placing the cgroup bpf release works on the cgroup destroy
>>>>>> workqueue will never block the smp_call_on_cpu work, which means
>>>>>> the loop is broken. Thus, it solves the problem.
>>>>>
>>>>> Tejun,
>>>>>
>>>>> do you have an opinion on this?
>>>>>
>>>>> If there are certain limitations from the cgroup side on what can be
>>>>> done in a generic work context, it would be nice to document them
>>>>> (e.g. don't grab cgroup_mutex), but I still struggle to understand
>>>>> what exactly is wrong with the blamed commit.
>>>>
>>>> I think the general rule here is more "don't saturate system wqs" rather
>>>> than "don't grab cgroup_mutex from system_wq". System wqs are for misc
>>>> things which shouldn't create a large number of concurrent work items. If
>>>> something is going to generate 256+ concurrent work items, it should use
>>>> its own workqueue. We don't know what's in system wqs and can't expect
>>>> their users to police specific lock usages.
>>>>
>>> Thank you, Tj. That's exactly what I'm trying to convey. Cgroup, for
>>> example, already has its own workqueue and may create a large number of
>>> release works, so it is better to place all of its related works on that
>>> workqueue rather than on the system wqs.
>>>
>>> Regards,
>>> Ridong
>>>
>>>> Another aspect is that the current WQ_DFL_ACTIVE is an arbitrary number I
>>>> came up with close to 15 years ago. Machine size has increased by multiple
>>>> times, if not an order of magnitude, since then. So, "there can't be a
>>>> reasonable situation where a 256 concurrency limit isn't enough" is most
>>>> likely not true anymore and the limits need to be pushed upward.
>>>>
>>>> Thanks.
>>>>
>>>
>> Hello Tejun and Roman, is the patch acceptable? Do I need to take any
>> further action?
>>
>
> I'm not against merging it. I still find the explanation/commit message
> a bit vague and believe that maybe some changes need to be done on the
> watchdog side to make such lockups impossible. As I understand it, the two
> most important pieces are the watchdog, which tries to run a system work
> on every cpu while holding cpu_hotplug_lock for read, and the cpuset
> controller, which tries to grab cpu_hotplug_lock for write.
>
> It's indeed a tricky problem, so maybe there is no simple and clear explanation.
>
> Anyway thank you for finding the problem and providing a reproducer!
>
> Thanks!
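
Hi Roman, thank you for the review. To make the explanation less vague, the
loop as I understand it is roughly the following (I will spell it out in the
v2 commit message):

  cgroup bpf release works: occupy all WQ_DFL_ACTIVE slots of system_wq and
                            wait for cgroup_mutex;
  cgroup_mutex holder:      the cpuset side, which waits to grab
                            cpu_hotplug_lock for write;
  cpu_hotplug_lock:         held for read by the watchdog, which is inside
                            smp_call_on_cpu waiting for its work to run on
                            system_wq;
  system_wq:                cannot run that work because all of its active
                            slots are taken by the cgroup bpf release works
                            above.
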
Originally, we tried several methods to address this issue on the
watchdog side, but they failed to fix the problem. This is the only way
we have found that fixes it so far. The commit message could indeed be
clearer; I will improve it in v2.
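
For reference, the core of the change is only the queueing target in
cgroup_bpf_release_fn(); the rest is plumbing to make a cgroup-owned
workqueue reachable from kernel/bpf/cgroup.c. A rough sketch of the idea
(illustrative only, not the patch itself):

	static void cgroup_bpf_release_fn(struct percpu_ref *ref)
	{
		struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

		INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
		/*
		 * Was: queue_work(system_wq, &cgrp->bpf.release_work);
		 * Queueing on a cgroup-owned workqueue keeps a burst of
		 * release works from saturating system_wq and blocking
		 * the smp_call_on_cpu work.
		 */
		queue_work(cgroup_destroy_wq, &cgrp->bpf.release_work);
	}
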
Hello Tejun, should I add a commit to modify the WQ_DFL_ACTIVE value?
Perhaps 1024 is reasonable?
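
Assuming the defaults in include/linux/workqueue.h are still

	WQ_MAX_ACTIVE		= 512,
	WQ_UNBOUND_MAX_ACTIVE	= WQ_MAX_ACTIVE,
	WQ_DFL_ACTIVE		= WQ_MAX_ACTIVE / 2,

a default of 1024 would mean bumping WQ_MAX_ACTIVE to 2048, or defining
WQ_DFL_ACTIVE independently of WQ_MAX_ACTIVE. I can send that as a separate
patch if you think it is the right direction.
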
Thanks