linux-kernel - Re: [EXTERNAL] Re: [PATCH v1 28/31] x86/resctrl: Drop __init/_

Message-ID: <9b1f537d-4965-472b-bc85-cfa22c87c58e@arm.com>
Date: Fri, 14 Jun 2024 14:59:22 +0100
From: James Morse <james.morse@....com>
To: Amit Singh Tomar <amitsinght@...vell.com>,
 Dave Martin <Dave.Martin@....com>,
 Reinette Chatre <reinette.chatre@...el.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
 Fenghua Yu <fenghua.yu@...el.com>, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 H Peter Anvin <hpa@...or.com>, Babu Moger <Babu.Moger@....com>,
 shameerali.kolothum.thodi@...wei.com,
 D Scott Phillips OS <scott@...amperecomputing.com>,
 carl@...amperecomputing.com, lcherian@...vell.com,
 bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
 baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
 Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
 dfustini@...libre.com, David Hildenbrand <david@...hat.com>,
 Rex Nie <rex.nie@...uarmicro.com>
Subject: Re: [EXTERNAL] Re: [PATCH v1 28/31] x86/resctrl: Drop __init/__exit
 on assorted symbols

Hi Amit, Reinette,

On 11/04/2024 16:51, Amit Singh Tomar wrote:
>> On Mon, Apr 08, 2024 at 08:32:36PM -0700, Reinette Chatre wrote:
>>> On 3/21/2024 9:51 AM, James Morse wrote:
>>>> Because ARM's MPAM controls are probed using MMIO, resctrl can't be
>>>> initialised until enough CPUs are online to have determined the
>>>> system-wide supported num_closid. Arm64 also supports 'late onlined
>>>> secondaries', where only a subset of CPUs are online during boot.
>>>>
>>>> These two combine to mean the MPAM driver may not be able to initialise
>>>> resctrl until user-space has brought 'enough' CPUs online.
>>>>
>>>> To allow MPAM to initialise resctrl after __init text has been free'd,
>>>> remove all the __init markings from resctrl.
>>>>
>>>> The existing __exit markings cause these functions to be removed by the
>>>> linker as it has never been possible to build resctrl as a module. MPAM
>>>> has an error interrupt which causes the driver to reset and disable
>>>> itself. Remove the __exit markings to allow the MPAM driver to tear down
>>>> resctrl when an error occurs.
>>>
>>> Obviously for the reasons you state this code has never been exercised.
>>> Were you able to test this error interrupt flow yet?

>> I think this will have to wait for James to respond.
>>
>> There is code to tear down resctrl in response to an MPAM error interrupt,
>> but I don't know how it has been exercised so far (if at all).

Previously I saw one or two kernfs structures left behind (to discover this you had to
leave a shell with its CWD in the filesystem), but it looks like those issues have been
solved.

Dave points out that resctrl_exit() removing the sysfs mount point means the filesystem
can't be umount()ed. Systemd doesn't seem to care today, but might choke on this in the
future.

I think the right thing to do here is get resctrl_exit() to call rdtgroup_destroy_root(),
and drop sysfs_remove_mount_point(). This creates a bit of asymmetry, but if resctrl were
a module the mount-point stuff would be done in module init/exit - only we don't have a
module to unload, so the asymmetry is to be expected. I don't think it's worth adding new
__exit text that we know will never be used for the sake of symmetry.
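
As a rough illustration of the behaviour described here, the following is a user-space
toy model, not kernel code. Every name in it (struct model_resctrl, model_resctrl_exit()
and so on) is hypothetical, and it assumes that re-mounting fails because the filesystem
type is no longer registered, which is a guess rather than something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are the kernel's types. */
struct model_resctrl {
	bool root_alive;        /* the root and the files under it exist */
	bool mount_point_alive; /* the mount point directory exists */
	bool fs_registered;     /* assumption: needed for a new mount */
	bool mounted;
};

/* Stand-in for the effect of destroying the root: the files vanish. */
static void model_destroy_root(struct model_resctrl *r)
{
	r->root_alive = false;
}

/*
 * Teardown as proposed: destroy the root, stop new mounts, and
 * deliberately leave the mount point in place.
 */
void model_resctrl_exit(struct model_resctrl *r)
{
	assert(r != NULL);
	model_destroy_root(r);
	r->fs_registered = false;
	/* No mount point removal here: that is the asymmetry. */
}

/* Unmounting needs the mount point to still exist. */
int model_umount(struct model_resctrl *r)
{
	if (!r->mounted || !r->mount_point_alive)
		return -1;
	r->mounted = false;
	return 0;
}

/* Assumption: a fresh mount needs a registered filesystem type. */
int model_mount(struct model_resctrl *r)
{
	if (!r->fs_registered || !r->mount_point_alive || r->mounted)
		return -1;
	r->mounted = true;
	return 0;
}
```

In this model, calling model_resctrl_exit() on a mounted instance leaves model_umount()
able to succeed while a later model_mount() fails.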

With this change, triggering the interrupt makes all the files under resctrl disappear, I
can then umount() the filesystem, but not re-mount it.

The aim here is for the arch code to be able to say "this is broken, I can't support
resctrl" with minimum changes to the existing code.

There are a couple of vanishingly unlikely corner cases that need tightening up: e.g. the
rmid_ptrs[] array disappears while a syscall is blocked on the rdtgroup_mutex during the
teardown, and once it gains the lock it discovers a surprise NULL pointer.
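
The usual shape of a fix for this kind of race is to re-check the shared state after
taking the lock. The sketch below shows that pattern in user space with pthreads. It is
not the code from the mpam tree: model_mutex, model_rmid_ptrs and the model_*() functions
are hypothetical stand-ins, and the array holds plain ints purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static int *model_rmid_ptrs;   /* NULL once teardown has run */
static size_t model_nr_rmids;

int model_init(size_t n)
{
	int *p = calloc(n, sizeof(*p));

	if (!p)
		return -ENOMEM;
	pthread_mutex_lock(&model_mutex);
	model_rmid_ptrs = p;
	model_nr_rmids = n;
	pthread_mutex_unlock(&model_mutex);
	return 0;
}

/* Teardown frees the array and clears the pointer under the lock. */
void model_teardown(void)
{
	pthread_mutex_lock(&model_mutex);
	free(model_rmid_ptrs);
	model_rmid_ptrs = NULL;
	model_nr_rmids = 0;
	pthread_mutex_unlock(&model_mutex);
}

/*
 * Models a syscall path that may have slept on the mutex while
 * teardown ran. The state is re-checked only after the lock is held.
 */
int model_read_rmid(size_t idx, int *out)
{
	int ret = 0;

	assert(out != NULL);
	pthread_mutex_lock(&model_mutex);
	if (!model_rmid_ptrs)
		ret = -ENODEV;   /* teardown got there first */
	else if (idx >= model_nr_rmids)
		ret = -EINVAL;
	else
		*out = model_rmid_ptrs[idx];
	pthread_mutex_unlock(&model_mutex);
	return ret;
}
```

A caller that loses the race gets an error back instead of dereferencing a NULL pointer.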

Fixing these can wait until after the code is moved as these things can't happen on x86.
(Patches are in the 'extras' branch of the mpam tree:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/log/?h=mpam/snapshot%2bextras/v6.10-rc1
)

> We are managed to test the MPAM error interrupt (on the platform that supports MPAM
> interrupts on software errors). For instance programming
> more resource control groups (part IDs) than available, and It appears to correctly remove
> the "resctrl" mount point (though mount command still shows resctrl on /sys/fs/resctrl
> type resctrl (rw,relatime)
> ), but
> 
> # mount -t resctrl resctrl /sys/fs/resctrl
> mount: /sys/fs/resctrl: mount point does not exist.
> 
> Additionally, a question regarding this, Is a complete system restart necessary to regain
> the mount?

It is - but you are likely to hit the same software bug again. The story here is about
keeping the machine running without penalising the wrong task. I think it's acceptable for
programs driving resctrl to go wrong, provided the workload doesn't run with the wrong
configuration.

Thanks,

James