Message-ID: <6bdac218-a18a-4cb5-b10e-c369d90b502c@huaweicloud.com>
Date: Fri, 3 Jan 2025 10:22:33 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: tj@...nel.org, hannes@...xchg.org, longman@...hat.com,
roman.gushchin@...ux.dev, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org, chenridong@...wei.com,
wangweiyang2@...wei.com
Subject: Re: [PATCH v1] cgroup/cpuset: remove kernfs active break
On 2025/1/3 0:02, Michal Koutný wrote:
> On Fri, Dec 20, 2024 at 01:31:06AM +0000, Chen Ridong <chenridong@...weicloud.com> wrote:
>> RIP: 0010:kernfs_should_drain_open_files+0x1a1/0x1b0
>
> I assume it's this
> WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
>
Right.
>> It can be explained by:
>> rmdir                             echo 1 > cpuset.cpus
>>                                   kernfs_fop_write_iter // active=0
>> cgroup_rm_file
>> kernfs_remove_by_name_ns          kernfs_get_active // active=1
>> __kernfs_remove                   // active=0x80000002
>> kernfs_drain                      cpuset_write_resmask
>> wait_event
>> //waiting (active == 0x80000001)
>>                                   kernfs_break_active_protection
>>                                   // active = 0x80000001
>> // continue
>>                                   kernfs_unbreak_active_protection
>>                                   // active = 0x80000002
>> ...
>> kernfs_should_drain_open_files
>> // warning occurs
>>                                   kernfs_put_active
>
> Thanks for this breakdown.
>
>> To avoid a deadlock, commit 76bb5ab8f6e3 ("cpuset:
>> break kernfs active protection in cpuset_write_resmask()") added
>> 'kernfs_break_active_protection' to cpuset_write_resmask. This can
>> lead to the warning above.
>
> The deadlock cycle included cpuset_hotplug_work, and since that was
> removed in the said commit, the same deadlock should no longer be possible.
>
> Ridong, have you run your patch with CONFIG_LOCKDEP to check that
> eventuality?
>
Yes, I tested.
>> After commit 2125c0034c5d ("cgroup/cpuset: Make cpuset hotplug
>> processing synchronous"), cpuset_write_resmask no longer needs to
>> wait for the hotplug to finish, which means that cpuset_write_resmask won't
>> grab the cgroup_mutex. So the deadlock doesn't exist anymore. Therefore,
>> remove the kernfs_break_active_protection operation from
>> 'cpuset_write_resmask'.
>>
>> Fixes: 76bb5ab8f6e3 ("cpuset: break kernfs active protection in cpuset_write_resmask()")
>
> This commit alone isn't sufficient to cause the warning you observed,
> right?
I think commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection
in cpuset_write_resmask()") is what causes the warning I observed.
The warning was observed when removing a cpuset cgroup and writing to
cpuset.cpus concurrently. Unlike the cgroup_kn_lock_live() path, which
breaks active protection and grabs the cgroup_mutex immediately to
avoid concurrent removal, cpuset_write_resmask() does nothing to prevent
concurrent removal of the cgroup directory while active protection is
broken. Therefore, this could cause the warning.
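The counter arithmetic in the trace quoted above can be replayed in a small
user-space model. This is only a sketch, not kernel code: the model_* and
replay_* names are made up for illustration, the "active" field is a plain
int rather than an atomic, and KN_DEACTIVATED_BIAS is defined here so that it
matches the 0x80000001 value shown in the trace.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed to match the 0x80000001 "deactivated, no users" value in the trace. */
#define KN_DEACTIVATED_BIAS (INT_MIN + 1)

/* Hypothetical stand-in for the 'active' counter of a kernfs node. */
struct model_node { int active; };

static void model_get_active(struct model_node *kn) { kn->active++; }
static void model_put_active(struct model_node *kn) { kn->active--; }

/* The remover adds the bias so that new users are refused. */
static void model_deactivate(struct model_node *kn)
{
	kn->active += KN_DEACTIVATED_BIAS;
}

/* Condition the remover sleeps on in the drain step. */
static bool model_drained(const struct model_node *kn)
{
	return kn->active == KN_DEACTIVATED_BIAS;
}

/* Replay with the break/unbreak pair; returns true if the check would warn. */
static bool replay_with_break(void)
{
	struct model_node kn = { .active = 0 };
	bool warned;

	model_get_active(&kn);          /* writer: active = 1 */
	model_deactivate(&kn);          /* rmdir:  active = 0x80000002 */
	assert(!model_drained(&kn));    /* rmdir sleeps in wait_event */
	model_put_active(&kn);          /* writer breaks:   0x80000001 */
	assert(model_drained(&kn));     /* rmdir is woken and continues */
	model_get_active(&kn);          /* writer unbreaks: 0x80000002 */
	warned = !model_drained(&kn);   /* rmdir checks active against the bias */
	model_put_active(&kn);          /* writer finishes */
	return warned;
}

/* Replay without the pair: the drain only proceeds after the final put. */
static bool replay_without_break(void)
{
	struct model_node kn = { .active = 0 };

	model_get_active(&kn);          /* writer: active = 1 */
	model_deactivate(&kn);          /* rmdir:  active = 0x80000002 */
	assert(!model_drained(&kn));    /* rmdir sleeps until the write is done */
	model_put_active(&kn);          /* writer finishes: 0x80000001 */
	return !model_drained(&kn);     /* check sees exactly the bias */
}
```

In this model replay_with_break() reports the warning condition and
replay_without_break() does not, which is the behaviour the patch is aiming
for. It says nothing about locking or memory lifetime, only about the counter.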
> As I read kernfs_break_active_protection() comment, I don't see cpuset
> code violating its conditions:
> a) it's broken/unbroken from within a kernfs file operation handler,
> b) it pins the needed struct cpuset independently of kernfs_node (it's
> ok to be removed)
>
I am not sure whether it is safe to call
kernfs_unbreak_active_protection(), which does atomic_inc(&kn->active),
after the 'kn' has been removed. I don't know much about this. However, I
have not seen any use-after-free (UAF) issues so far.
I would be grateful if you could provide more information.
Best regards
Ridong
> All in all -- I think the particular break/unbreak pair is unnecessary
> nowadays and the warning implemented with hiding/showing kernfs files
> didn't take temporary breakage into account (only based on quick
> searching and vague memories).
>
> Thanks,
> Michal