Message-ID: <20250815100213.4599-1-hdanton@sina.com>
Date: Fri, 15 Aug 2025 18:02:07 +0800
From: Hillf Danton <hdanton@...a.com>
To: Chen Ridong <chenridong@...weicloud.com>
Cc: Michal Koutny <mkoutny@...e.com>,
tj@...nel.org,
cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
lujialin4@...wei.com,
chenridong@...wei.com,
gaoyingjie@...ontech.com
Subject: Re: [PATCH v2 -next] cgroup: remove offline draining in root destruction to avoid hung_tasks
On Fri, 15 Aug 2025 15:29:56 +0800 Chen Ridong wrote:
>On 2025/8/15 10:40, Hillf Danton wrote:
>> On Fri, Jul 25, 2025 at 09:42:05AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>> On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>>> CPU0 CPU1
>>>>> mount perf_event umount net_prio
>>>>> cgroup1_get_tree cgroup_kill_sb
>>>>> rebind_subsystems // root destruction enqueues
>>>>> // cgroup_destroy_wq
>>>>> // kill all perf_event css
>>>>> // one perf_event css A is dying
>>>>> // css A offline enqueues cgroup_destroy_wq
>>>>> // root destruction will be executed first
>>>>> css_free_rwork_fn
>>>>> cgroup_destroy_root
>>>>> cgroup_lock_and_drain_offline
>>>>> // some perf descendants are dying
>>>>> // cgroup_destroy_wq max_active = 1
>>>>> // waiting for css A to die
>>>>>
>>>>> Problem scenario:
>>>>> 1. CPU0 mounts perf_event (rebind_subsystems)
>>>>> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
>>>>> 3. A dying perf_event CSS gets queued for offline after root destruction
>>>>> 4. Root destruction waits for offline completion, but offline work is
>>>>> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
>>>>
>>>> What's concerning me is why umount of the net_prio hierarchy waits for
>>>> draining of the default hierarchy? (Where you then run into conflict with
>>>> perf_event that's implicit_on_dfl.)
>>>>
>> /*
>> * cgroup destruction makes heavy use of work items and there can be a lot
>> * of concurrent destructions. Use a separate workqueue so that cgroup
>> * destruction work items don't end up filling up max_active of system_wq
>> * which may lead to deadlock.
>> */
>>
>> If task hung could be reliably reproduced, it is right time to cut
>> max_active off for cgroup_destroy_wq according to its comment.
>
>Hi Danton,
>
>Thank you for your feedback.
>
>While modifying max_active could be a viable solution, I’m unsure whether it might introduce other
>side effects. Instead, I’ve proposed an alternative approach in v3 of the patch, which I believe
>addresses the issue more comprehensively.
>
Given your reproducer [1], it is simple to test with max_active cut off.
Frankly, I do not think v3 is a correct fix because it leaves the root cause
intact. Nor is the issue cgroup specific, even given the high concurrency in
destruction.
[1] https://lore.kernel.org/lkml/39e05402-40c7-4631-a87b-8e3747ceddc6@huaweicloud.com/
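To illustrate, the test could look something like the hunk below. This is only
a sketch, assuming cgroup_destroy_wq is still allocated in cgroup_wq_init() in
kernel/cgroup/cgroup.c as in current mainline; the exact location and flags may
differ in your tree:

```diff
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ static int __init cgroup_wq_init(void)
-	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
+	/*
+	 * max_active = 0 selects the workqueue default limit instead of
+	 * serializing all destruction work items behind one another, so
+	 * offline work is no longer queued behind root destruction.
+	 */
+	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 0);
```

With max_active no longer 1, the offline work for css A can run concurrently
with (rather than behind) the root destruction work, which should make the
reported hang unreproducible if the queue ordering is indeed the root cause.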