Message-ID: <20250815100213.4599-1-hdanton@sina.com>
Date: Fri, 15 Aug 2025 18:02:07 +0800
From: Hillf Danton <hdanton@...a.com>
To: Chen Ridong <chenridong@...weicloud.com>
Cc: Michal Koutny <mkoutny@...e.com>,
tj@...nel.org,
cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
lujialin4@...wei.com,
chenridong@...wei.com,
gaoyingjie@...ontech.com
Subject: Re: [PATCH v2 -next] cgroup: remove offline draining in root destruction to avoid hung_tasks
On Fri, 15 Aug 2025 15:29:56 +0800 Chen Ridong wrote:
>On 2025/8/15 10:40, Hillf Danton wrote:
>> On Fri, Jul 25, 2025 at 09:42:05AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>> On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>>> CPU0 CPU1
>>>>> mount perf_event umount net_prio
>>>>> cgroup1_get_tree cgroup_kill_sb
>>>>> rebind_subsystems // root destruction enqueues
>>>>> // cgroup_destroy_wq
>>>>> // kill all perf_event css
>>>>> // one perf_event css A is dying
>>>>> // css A offline enqueues cgroup_destroy_wq
>>>>> // root destruction will be executed first
>>>>> css_free_rwork_fn
>>>>> cgroup_destroy_root
>>>>> cgroup_lock_and_drain_offline
>>>>> // some perf descendants are dying
>>>>> // cgroup_destroy_wq max_active = 1
>>>>> // waiting for css A to die
>>>>>
>>>>> Problem scenario:
>>>>> 1. CPU0 mounts perf_event (rebind_subsystems)
>>>>> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
>>>>> 3. A dying perf_event CSS gets queued for offline after root destruction
>>>>> 4. Root destruction waits for offline completion, but offline work is
>>>>> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
>>>>
>>>> What's concerning me is why umount of the net_prio hierarchy waits for
>>>> draining of the default hierarchy? (Where you then run into conflict with
>>>> perf_event that's implicit_on_dfl.)
>>>>
>> /*
>> * cgroup destruction makes heavy use of work items and there can be a lot
>> * of concurrent destructions. Use a separate workqueue so that cgroup
>> * destruction work items don't end up filling up max_active of system_wq
>> * which may lead to deadlock.
>> */
>>
>> If task hung could be reliably reproduced, it is right time to cut
>> max_active off for cgroup_destroy_wq according to its comment.
>
>Hi Danton,
>
>Thank you for your feedback.
>
>While modifying max_active could be a viable solution, I’m unsure whether it might introduce other
>side effects. Instead, I’ve proposed an alternative approach in v3 of the patch, which I believe
>addresses the issue more comprehensively.
>
Given your reproducer [1], it is simple to test with max_active cut off.
Frankly, I do not think v3 is a correct fix because it leaves the root cause
intact. Nor is the issue cgroup specific, even given the high concurrency in
destruction.
[1] https://lore.kernel.org/lkml/39e05402-40c7-4631-a87b-8e3747ceddc6@huaweicloud.com/
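To illustrate, the test could look something like the hunk below. This is only
a sketch, assuming cgroup_destroy_wq is still allocated in cgroup_wq_init() in
kernel/cgroup/cgroup.c as in current mainline; the exact location and flags may
differ in your tree:

```diff
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ static int __init cgroup_wq_init(void)
-	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
+	/*
+	 * max_active = 0 selects the workqueue default limit instead of
+	 * serializing all destruction work items behind one another, so
+	 * offline work is no longer queued behind root destruction.
+	 */
+	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 0);
```

With max_active no longer 1, the offline work for css A can run concurrently
with (rather than behind) the root destruction work, which should make the
reported hang unreproducible if the queue ordering is indeed the root cause.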