Message-ID: <e485b38a-183b-42c8-9aed-9c2d939add0b@huaweicloud.com>
Date: Mon, 18 Aug 2025 16:00:08 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Baokun Li <libaokun1@...wei.com>
Cc: cgroups@...r.kernel.org, chenridong@...wei.com,
gregkh@...uxfoundation.org, hannes@...xchg.org,
linux-kernel@...r.kernel.org, lujialin4@...wei.com, mkoutny@...e.com,
peterz@...radead.org, tj@...nel.org, zhouchengming@...edance.com,
Yang Erkun <yangerkun@...wei.com>
Subject: Re: [PATCH] kernfs: Fix UAF in PSI polling when open file is released
On 2025/8/16 17:53, Baokun Li wrote:
>> From: Chen Ridong <chenridong@...wei.com>
>>
>> A use-after-free (UAF) vulnerability was identified in the PSI (Pressure
>> Stall Information) monitoring mechanism:
>>
>> BUG: KASAN: slab-use-after-free in psi_trigger_poll+0x3c/0x140
>> Read of size 8 at addr ffff3de3d50bd308 by task systemd/1
>>
>> psi_trigger_poll+0x3c/0x140
>> cgroup_pressure_poll+0x70/0xa0
>> cgroup_file_poll+0x8c/0x100
>> kernfs_fop_poll+0x11c/0x1c0
>> ep_item_poll.isra.0+0x188/0x2c0
>>
>> Allocated by task 1:
>> cgroup_file_open+0x88/0x388
>> kernfs_fop_open+0x73c/0xaf0
>> do_dentry_open+0x5fc/0x1200
>> vfs_open+0xa0/0x3f0
>> do_open+0x7e8/0xd08
>> path_openat+0x2fc/0x6b0
>> do_filp_open+0x174/0x368
>>
>> Freed by task 8462:
>> cgroup_file_release+0x130/0x1f8
>> kernfs_drain_open_files+0x17c/0x440
>> kernfs_drain+0x2dc/0x360
>> kernfs_show+0x1b8/0x288
>> cgroup_file_show+0x150/0x268
>> cgroup_pressure_write+0x1dc/0x340
>> cgroup_file_write+0x274/0x548
>>
>> Reproduction Steps:
>> 1. Open test/cpu.pressure and establish epoll monitoring
>> 2. Disable monitoring: echo 0 > test/cgroup.pressure
>> 3. Re-enable monitoring: echo 1 > test/cgroup.pressure
>>
>> The race condition occurs because:
>> 1. When cgroup.pressure is disabled (echo 0 > cgroup.pressure), it:
>> - Releases PSI triggers via cgroup_file_release()
>> - Frees of->priv through kernfs_drain_open_files()
>> 2. Meanwhile, epoll still holds a reference to the file and continues polling
>> 3. Re-enabling (echo 1 > cgroup.pressure) accesses freed of->priv
>>
>> epolling                        disable/enable cgroup.pressure
>> fd=open(cpu.pressure)
>> while(1)
>> ...
>> epoll_wait
>> kernfs_fop_poll
>> kernfs_get_active = true        echo 0 > cgroup.pressure
>> ...                             cgroup_file_show
>>                                 kernfs_show
>>                                 // inactive kn
>>                                 kernfs_drain_open_files
>>                                 cft->release(of);
>>                                 kfree(ctx);
>>                                 ...
>> kernfs_get_active = false
>>                                 echo 1 > cgroup.pressure
>>                                 kernfs_show
>>                                 kernfs_activate_one(kn);
>> kernfs_fop_poll
>> kernfs_get_active = true
>> cgroup_file_poll
>> psi_trigger_poll
>> // UAF
>> ...
>> end: close(fd)
Thank you, Baokun.
> I think the problem is that kernfs_show() handles enable and disable
> inconsistently. When disable is called, it sets kn->active and then frees
> cgroup_file_ctx and psi_trigger. But when enable is called, it only sets
> kn->active. This mismatch means we can end up accessing the freed
> cgroup_file_ctx and psi_trigger later on.
>
I agree with that.
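To make the mismatch concrete, here is a small userspace toy model. It is
only a sketch, not kernel code: the toy_* names and the three return values
are made up for illustration, and the real structures carry far more state.
Hiding the node drops the per-open context, showing it again only flips the
active flag, so a later poll finds the node active but the context gone.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernfs_open_file and kernfs_node. */
struct toy_open_file {
	void *priv;      /* models of->priv (the per-open context) */
	bool released;   /* set once the release callback has run */
};

struct toy_node {
	bool active;     /* models whether kernfs_get_active() succeeds */
	struct toy_open_file *of;
};

/* Disable path: deactivate the node and release the open file's context. */
void toy_hide(struct toy_node *kn)
{
	kn->active = false;
	free(kn->of->priv);
	/* NULL here only so the toy can detect the stale access safely. */
	kn->of->priv = NULL;
	kn->of->released = true;
}

/* Enable path: only the active flag changes, nothing is re-created. */
void toy_show(struct toy_node *kn)
{
	kn->active = true;
}

/*
 * Poll path. Returns 1 when the context is valid, 0 when the node is
 * inactive and poll bails out early, and -1 at the point where the real
 * code would dereference the freed context.
 */
int toy_poll(const struct toy_node *kn)
{
	if (!kn->active)
		return 0;
	if (kn->of->released)
		return -1;
	return 1;
}
```

In this model the sequence toy_hide(), toy_show(), toy_poll() ends in -1,
which corresponds to the "// UAF" step in the diagram above.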
> A potential solution is to make the lifecycles of cgroup_file_ctx and
> psi_trigger match the struct kernfs_open_file they're associated with.
> Maybe we could just get rid of the kernfs_release_file call in
> kernfs_drain_open_files?
>
Hi, Tj, what do you think about this solution?
> That way, the resources would be safely released only when the file
> descriptor is actually freed. Plus, if cgroup.pressure is re-enabled,
> any open file descriptors would still work as expected.
>
>
> Cheers,
> Baokun
>
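For anyone who wants to try the reproduction steps quoted above, step 1
(open the pressure file and poll it with epoll) could look roughly like the
helper below. This is a sketch under assumptions: poll_pressure_file() is a
name made up here, the caller supplies the path and the trigger string, and
the report does not say whether registering a trigger is required to hit the
bug. The trigger format ("some 150000 1000000") is the one from the PSI
documentation.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Open a PSI pressure file, register a trigger and poll it with epoll.
 * Returns the number of epoll_wait() calls that completed (events or
 * timeouts), or -1 if setup failed. Hypothetical helper, not from the patch.
 */
int poll_pressure_file(const char *path, const char *trigger,
		       int rounds, int timeout_ms)
{
	struct epoll_event ev = { .events = EPOLLPRI };
	struct epoll_event out;
	int fd, epfd, done = 0;

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd < 0)
		return -1;

	/* The PSI documentation writes the trigger including its NUL byte. */
	if (write(fd, trigger, strlen(trigger) + 1) < 0) {
		close(fd);
		return -1;
	}

	epfd = epoll_create1(0);
	if (epfd < 0) {
		close(fd);
		return -1;
	}

	ev.data.fd = fd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(epfd);
		close(fd);
		return -1;
	}

	/* Each round goes through the file's poll handler in the kernel. */
	while (done < rounds) {
		if (epoll_wait(epfd, &out, 1, timeout_ms) < 0)
			break;
		done++;
	}

	close(epfd);
	close(fd);
	return done;
}
```

Calling it on test/cpu.pressure with a large round count, while another
shell runs "echo 0 > test/cgroup.pressure" followed by
"echo 1 > test/cgroup.pressure", covers steps 2 and 3.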
--
Best regards,
Ridong