[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0d290ca1-20cf-4165-aa4e-62da582a5461@arm.com>
Date: Fri, 7 Mar 2025 17:54:01 +0000
From: James Morse <james.morse@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, H Peter Anvin <hpa@...or.com>,
Babu Moger <Babu.Moger@....com>, shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...hat.com>, Rex Nie <rex.nie@...uarmicro.com>,
Dave Martin <dave.martin@....com>, Koba Ko <kobak@...dia.com>,
Shanker Donthineni <sdonthineni@...dia.com>, fenghuay@...dia.com,
Shaopeng Tan <tan.shaopeng@...fujitsu.com>, Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v7 33/49] x86/resctrl: resctrl_exit() teardown resctrl but
leave the mount point
Hi Reinette,
On 07/03/2025 04:45, Reinette Chatre wrote:
> On 2/28/25 11:58 AM, James Morse wrote:
>> resctrl_exit() was intended for use when the 'resctrl' module was unloaded.
>> resctrl can't be built as a module, and the kernfs helpers are not exported
>> so this is unlikely to change. MPAM has an error interrupt which indicates
>> the MPAM driver has gone haywire. Should this occur tasks could run with
>> the wrong control values, leading to bad performance for important tasks.
>> The MPAM driver needs a way to tell resctrl that no further configuration
>> should be attempted.
>>
>> Using resctrl_exit() for this leaves the system in a funny state as
>> resctrl is still mounted, but cannot be un-mounted because the sysfs
>> directory that is typically used has been removed. Dave Martin suggests
>> this may cause systemd trouble in the future as not all filesystems
>> can be unmounted.
>>
>> Add calls to remove all the files and directories in resctrl, and
>> remove the sysfs_remove_mount_point() call that leaves the system
>> in a funny state. When triggered, this causes all the resctrl files
>> to disappear. resctrl can be unmounted, but not mounted again.
> (copying v6 discussion here)
> On 3/6/25 11:28 AM, James Morse wrote:
>> On 01/03/2025 02:35, Reinette Chatre wrote:
>>> On 2/28/25 11:54 AM, James Morse wrote:
>>>> On 20/02/2025 04:42, Reinette Chatre wrote:
>
>>>>> It is difficult for me to follow the kernfs reference counting required
>>>>> to make this work. Specifically, the root kn is "destroyed" here but it
>>>>> is required to stick around until unmount when the rest of the files
>>>>> are removed.
>>>>
>>>> This drops resctrl's reference to all of the files, which would make the files disappear.
>>>> unmount is what calls kernfs_kill_sb(), which gets rid of the root of the filesystem.
>>>
>>> My concern is mostly with the kernfs_remove() calls in the rdt_kill_sb()->rmdir_all_sub()
>>> flow. For example:
>>> kernfs_remove(kn_info);
>>> kernfs_remove(kn_mongrp);
>>> kernfs_remove(kn_mondata);
>>>
>>> As I understand the above require the destroyed root to still be around.
>> Right - because rdt_get_tree() has these global pointers into the hierarchy, but doesn't
>> take a reference. rmdir_all_sub() relies on always being called before
>> rdtgroup_destroy_root().
> Is this a known issue then?
Its just a subtle thing that resctrl was relying on, and I didn't spot. Thanks for
pointing it out!
> Since I am not able to use your test I created something new
> after thinking there would be no response to my comment and indeed on unmount:
> [ 293.707228] BUG: KASAN: slab-use-after-free in kernfs_remove+0x87/0xa0
> [ 293.714718] Read of size 8 at addr ff11000309d88f30 by task umount/3793
>> The point hack would be for rdtgroup_destroy_root() to NULL out those global pointers, (I
>> note they are left dangling) - that would make a subsequent call to rmdir_all_sub() harmless.
>>
>> A better fix would be to pull out all the filesystem relevant parts from rdt_kill_sb(),
>> make that safe for multiple calls and get resctrl_exit() to call that.
>> A call to rdt_kill_sb() after resctrl_exit() would just cleanup the super-block.
>> This will leave things in a more predictable state.
> Why just the filesystem relevant parts?
The stated aim is to prevent any further configuration of the hardware.
We can change that to 'unmount as much as possible'. I didn't think of it like that as I
couldn't find an in-kernel way of triggering a umount(). (and the mount may be copied into
numerous namespaces)
> Although, you also state "resctrl_exit() would just
> cleanup the super-block" that sounds like you are thinking about pulling out all reset work.
> This sounds reasonable to me. It really feels more appropriate to do proper cleanup and
> not just wipe the root while leaving everything else underneath it.
After this discussion, my new proposal is to do this:
-------%<-------
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0d74a6d98dba..cee4604a59c0 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3063,6 +3063,21 @@ static void rmdir_all_sub(void)
kernfs_remove(kn_mondata);
}
+static void resctrl_fs_teardown(void)
+{
+ lockdep_assert_held(&rdtgroup_mutex);
+
+ /* Cleared by rdtgroup_destroy_root() */
+ if (!rdtgroup_default.kn)
+ return;
+
+ rmdir_all_sub();
+ rdt_pseudo_lock_release();
+ rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+ schemata_list_destroy();
+ closid_exit();
+ rdtgroup_destroy_root();
+}
+
static void rdt_kill_sb(struct super_block *sb)
{
struct rdt_resource *r;
@@ -3076,11 +3091,8 @@ static void rdt_kill_sb(struct super_block *sb)
for_each_alloc_capable_rdt_resource(r)
resctrl_arch_reset_all_ctrls(r);
- rmdir_all_sub();
- rdt_pseudo_lock_release();
- rdtgroup_default.mode = RDT_MODE_SHAREABLE;
- schemata_list_destroy();
- rdtgroup_destroy_root();
+ resctrl_fs_teardown();
+
if (resctrl_arch_alloc_capable())
resctrl_arch_disable_alloc();
if (resctrl_arch_mon_capable())
resctrl_arch_disable_mon();
- closid_exit();
resctrl_mounted = false;
kernfs_kill_sb(sb);
mutex_unlock(&rdtgroup_mutex);
-------%<-------
and call resctrl_fs_teardown() from resctrl_exit().
This leaves a bunch of resctrl_arch_ calls behind, the reason is its the arch code that
calls resctrl_exit(), so it probably doesn't need to be told to disable things. For MPAM,
the work of resctrl_arch_reset_all_ctrls() will already have been done - and all these
calls will fail because the MPAM driver is blocking further access to the hardware.
Thanks,
James
Powered by blists - more mailing lists