Message-ID: <dd1418f9-93d0-45c9-bcc2-d67f48d050f6@huaweicloud.com>
Date: Fri, 15 Aug 2025 15:29:56 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Hillf Danton <hdanton@...a.com>, Michal Koutný
<mkoutny@...e.com>
Cc: tj@...nel.org, cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
lujialin4@...wei.com, chenridong@...wei.com, gaoyingjie@...ontech.com
Subject: Re: [PATCH v2 -next] cgroup: remove offline draining in root
destruction to avoid hung_tasks
On 2025/8/15 10:40, Hillf Danton wrote:
> On Fri, Jul 25, 2025 at 09:42:05AM +0800, Chen Ridong <chenridong@...weicloud.com> wrote:
>>> On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@...weicloud.com> wrote:
>>>> CPU0                                    CPU1
>>>> mount perf_event                        umount net_prio
>>>> cgroup1_get_tree                        cgroup_kill_sb
>>>> rebind_subsystems                       // root destruction enqueues
>>>>                                         // cgroup_destroy_wq
>>>> // kill all perf_event css
>>>> // one perf_event css A is dying
>>>> // css A offline enqueues cgroup_destroy_wq
>>>> // root destruction will be executed first
>>>>                                         css_free_rwork_fn
>>>>                                         cgroup_destroy_root
>>>>                                         cgroup_lock_and_drain_offline
>>>>                                         // some perf descendants are dying
>>>>                                         // cgroup_destroy_wq max_active = 1
>>>>                                         // waiting for css A to die
>>>>
>>>> Problem scenario:
>>>> 1. CPU0 mounts perf_event (rebind_subsystems)
>>>> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
>>>> 3. A dying perf_event CSS gets queued for offline after root destruction
>>>> 4. Root destruction waits for offline completion, but offline work is
>>>> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
>>>
>>> What's concerning me is why umount of the net_prio hierarchy waits for
>>> draining of the default hierarchy? (Where you then run into conflict with
>>> perf_event that's implicit_on_dfl.)
>>>
> /*
> * cgroup destruction makes heavy use of work items and there can be a lot
> * of concurrent destructions. Use a separate workqueue so that cgroup
> * destruction work items don't end up filling up max_active of system_wq
> * which may lead to deadlock.
> */
>
> If the task hang can be reliably reproduced, it is the right time to cut
> the max_active cap off cgroup_destroy_wq, according to its comment.
Hi Danton,
Thank you for your feedback.
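
To restate the hang for clarity: the root destruction work ends up waiting, inside
cgroup_lock_and_drain_offline(), for an offline work item that is queued behind it on the same
cgroup_destroy_wq, and with max_active = 1 that second item can never start. Below is a minimal,
untested toy module (all names are made up, and flush_work() only stands in for the offline_waitq
wait in the real code) that reproduces the same self-dependency pattern:

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *toy_wq;
static struct work_struct first_work;   /* stands in for root destruction */
static struct work_struct second_work;  /* stands in for css A offline */

static void second_fn(struct work_struct *w)
{
	pr_info("offline-like work ran\n");		/* never reached */
}

static void first_fn(struct work_struct *w)
{
	/* wait for the item queued behind us on the same workqueue */
	flush_work(&second_work);			/* waits forever */
	pr_info("root-destruction-like work done\n");
}

static int __init toy_init(void)
{
	/* same shape as cgroup_destroy_wq: max_active = 1 */
	toy_wq = alloc_workqueue("toy_destroy", 0, 1);
	if (!toy_wq)
		return -ENOMEM;

	INIT_WORK(&first_work, first_fn);
	INIT_WORK(&second_work, second_fn);

	/* both queued from the same CPU: second_work sits behind first_work */
	queue_work(toy_wq, &first_work);
	queue_work(toy_wq, &second_work);
	return 0;
}
module_init(toy_init);

static void __exit toy_exit(void)
{
	/* toy only: unloading would hang as well, since the wq is stuck */
	destroy_workqueue(toy_wq);
}
module_exit(toy_exit);

MODULE_LICENSE("GPL");

After the timeout the hung task detector should fire on the kworker running first_fn(), which is
the same shape as the report above.
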
While modifying max_active could be a viable solution, I’m unsure whether it might introduce other
side effects. Instead, I’ve proposed an alternative approach in v3 of the patch, which I believe
addresses the issue more comprehensively.
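
For reference, my understanding is that lifting the cap would be roughly the change below
(untested sketch, assuming the current allocation in cgroup_wq_init() is
alloc_workqueue("cgroup_destroy", 0, 1)):

static int __init cgroup_wq_init(void)
{
	/*
	 * Sketch: pass 0 so the workqueue falls back to the default
	 * max_active instead of being limited to one in-flight item.
	 */
	cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 0);
	BUG_ON(!cgroup_destroy_wq);
	return 0;
}
core_initcall(cgroup_wq_init);

What I cannot easily rule out is whether some destruction path quietly depends on the serialized
execution that max_active = 1 currently provides, which is why I went for removing the offline
draining in v3 instead.
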
I’d be very grateful if you could take a look and share your thoughts. Your review would be greatly
appreciated!
v3: https://lore.kernel.org/cgroups/20250815070518.1255842-1-chenridong@huaweicloud.com/T/#u
--
Best regards,
Ridong