lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a9472e2f-d4a2-484a-b9a9-63c317a2de82@intel.com>
Date: Tue, 14 Oct 2025 13:57:02 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <bmoger@....com>, <babu.moger@....com>, <tony.luck@...el.com>,
	<Dave.Martin@....com>, <james.morse@....com>, <dave.hansen@...ux.intel.com>,
	<bp@...en8.de>
CC: <kas@...nel.org>, <rick.p.edgecombe@...el.com>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
	<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
 in mbm_event mode

Hi Babu,

On 10/14/25 10:43 AM, Babu Moger wrote:
> On 10/14/25 12:38, Babu Moger wrote:
>> On 10/14/25 11:24, Reinette Chatre wrote:
>>> On 10/7/25 7:38 PM, Reinette Chatre wrote:
>>>> On 10/7/25 10:36 AM, Babu Moger wrote:
>>>>> On 10/6/25 20:23, Reinette Chatre wrote:
>>>>>> On 10/6/25 1:38 PM, Moger, Babu wrote:
>>>>>>> On 10/6/25 12:56, Reinette Chatre wrote:
>>>>>>>> On 9/30/25 1:26 PM, Babu Moger wrote:

...

>>>>>>>> But wait ... I think there may be a bigger problem when considering systems
>>>>>>>> that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
>>>>>>>> Shouldn't resctrl prevent such a system from switching to "default"
>>>>>>>> mbm_assign_mode? Otherwise resctrl will happily let such a system switch
>>>>>>>> to default mode and when user attempts to read an event file resctrl will
>>>>>>>> attempt to read it via MSRs that are not supported.
>>>>>>>> Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
>>>>>>>> to handle this case in show() while preventing user space from switching to
>>>>>>>> "default" mode on write()?
>>>>>>> This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
>>>>>>> X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files of these
>>>>>>> events are not created.
>>>>>> By "right now" I assume you mean the current implementation? I think your statement
>>>>>> assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
>>>>>> Current implementation will enable MBM events if ABMC is supported. When the
>>>>>> first CPU of a domain comes online after that then resctrl will create the mon_data
>>>>>> files. These files will remain if a user then switches to default mode and if
>>>>>> the user then attempts to read one of these counters then I expect problems.
>>>>> Yes. It will be a problem in the that case.
>>>> Thinking about this more the issue is not about the mon_data files being created since
>>>> they are only created if resctrl is mounted and resctrl_mon_resource_init() is run
>>>> before creating the mountpoint. From what I can tell current MBM events supported by
>>>> ABMC will be enabled at the time resctrl can be mounted so if X86_FEATURE_CQM_MBM_TOTAL
>>>> and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I believe the
>>>> mon_data files will be created.
>>>>
>>>> There is a problem with the actual domain creation during resctrl initialization
>>>> where the MBM state data structures are created and depend on the events being
>>>> enabled then.
>>>> resctrl assumes that if an event is enabled then that event's associated
>>>> rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states exist and if
>>>> those data structures are created (or not created) during CPU online and MBM
>>>> event comes online later then there will be invalid memory accesses.
>>>>
>>>> The conclusion is the same though ... the events need to be initialized during
>>>> resctrl initialization as you note above.
>>>>
>>>>> I am not clear on using config option you mentioned above.
>>>> This is more about what is accomplished by the config option than whether it is
>>>> a config option that controls the flow. More below but I believe there may be
>>>> scenarios where only mbm_event is supported and in that case I expect, even on AMD,
>>>> it may be possible that there is no supported "default" mode and thus:
>>>>   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>>    [mbm_event]
>>>>
>>>>> What about using the check resctrl_is_mon_event_enabled() in
>>>>>
>>>>> resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?
>>>>>
>>>> Trying to think through how to support a system that can switch between default
>>>> and mbm_event mode I see a couple of things to consider. This is as I am thinking
>>>> through the flows without able to experiment. I think it may help if you could sanity
>>>> check this with perhaps a few experiments to considering the flows yourself to see where
>>>> I am missing things.
>>>>
>>>> When we are clear on the flows to support and how to interact with user space it will
>>>> be easier to start exchanging code.
>>>>
>>>> a) MBM state data structures
>>>>     As mentioned above, rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
>>>>     are created during CPU online based on MBM event enabled state. During runtime
>>>>     an enabled MBM event is assumed to have state.
>>>>     To me this implies that any possible MBM event should be enabled during early
>>>>     initialization.
>>>>     A consequence is that any possible MBM event will have its associated event file
>>>>     created even if the active mode of the time cannot support it. (I do not think
>>>>     we want to have event files come and go).
>>>> b) Switching between modes.
>>>>     From what I can tell switching mode is always allowed as long as system supports
>>>>     assignable counters and that may not be correct. Consider a system that supports
>>>>     ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or X86_FEATURE_CQM_MBM_LOCAL ...
>>>>     should it be allowed to switch to "default" mode? At this time I believe this is allowed
>>>>     yet this is an unusable state (as far as MBM goes) and I expect any attempt at reading
>>>>     an event file will result in invalid MSR access?
>>>>     Complexity increases if there is a mismatch in supported events, for example if mbm_event
>>>>     mode supports total and local but default mode only supports one. Should it be allowed
>>>>     to switch modes? If so, user can then still read from both files, the check whether assignable
>>>>     counters is enabled will fail and resctrl will attempt to read both via the counter MSRs,
>>>>     even an unsupported event (continued below).
>>>> c) Read of event file
>>>>     A user can read from event file any time even if active mode (default or mbm_event) does
>>>>     not support it. If mbm_event mode is enabled then resctrl will attempt to use counters,
>>>>     if default mode is enabled then resctrl will attempt to use MSRs.
>>>>     This currently entirely depends on whether mbm_event mode is enabled or not.
>>>>     Perhaps we should add checks here to prevent user from reading an event if the
>>>>     active mode does not support it? Alternatively prevent user from switching to a mode
>>>>     that cannot be supported.
>>>>
>>>> Look forward to how you view things and thoughts on how user may expect to interact with these
>>>> features.
>>
>>
>> Yea.  Taken note of all your points. Sorry for the Iate response. I was investigating on how to fix in a proper way.
>>
>>
>>> I am concerned about this issue. The original changelog only mentions that events are enabled when
>>> they should not be but it looks to me that there is a more serious issue if the user then attempts
>>> to read from such an event. Have you tried the scenario when a user boots with the parameters
>>> mentioned in changelog (rdt=!mbmtotal,!mbmlocal) and then attempts to read one of these events?
>>> Reading from the event will attempt to access its architectural state but from what I can tell
>>> that will not be allocated since the events are not enabled at the time of the allocation.
>>
>>
>> Yes. I saw the issues. It fails to mount in my case with panic trace.

(Just to ensure that there is not anything else going on) Could you please confirm if the panic is from
mon_add_all_files()->mon_event_read()->mon_event_count()->__mon_event_count()->resctrl_arch_reset_rmid()
that creates the MBM event files during mount and then does the initial read of RMID to determine the
starting count? 


>>
>>
>>>
>>> This needs to be fixed during this cycle. A week has passed since my previous message so I do not
>>
>>
>> Yes. I understand your concern.
>>
>>
>>> think that it will be possible to create a full featured solution that keeps X86_FEATURE_ABMC
>>> and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL independent.
>>
>>
>> Agree.
>>
>>
>>>
>>> What do you think of something like below that builds on your original change and additionally
>>> enforces dependency between these features to support the resctrl assumptions? From what I understand
>>> this is ok for current AMD hardware? A not-as-urgent follow-up can make these features independent
>>> again?
>>
>>
>> Yes. I tested it. Works fine.  It defaults to "default" mode if both the events(local and total) are disabled in kernel parameter. That is expected.

Thank you very much for considering it and trying it out. Could you please also check if it
behaves sanely when just one of the MBM events is enabled? For example by just booting with
"rdt=!mbmtotal" or "rdt=!mbmlocal". Only one event's file should be created while it should
still be possible to switch between default and mbm_event mode, event reads from the event
file working as expected in both modes.


>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> index c8945610d455..fd42fe7b2fdc 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>> @@ -452,7 +452,16 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>>           r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>>>       }
>>>   -    if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>>> +    /*
>>> +     * resctrl assumes a system that supports assignable counters can
>>> +     * switch to "default" mode. Ensure that there is a "default" mode
>>> +     * to switch to. This enforces a dependency between the independent
>>> +     * X86_FEATURE_ABMC and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL
>>> +     * hardware features.
>>> +     */
>>> +    if (rdt_cpu_has(X86_FEATURE_ABMC) &&
>>> +        (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
>>> +         rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
>>>           r->mon.mbm_cntr_assignable = true;
>>>           cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>>>           r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>> index 4076336fbba6..572a9925bd6c 100644
>>> --- a/fs/resctrl/monitor.c
>>> +++ b/fs/resctrl/monitor.c
>>> @@ -1782,15 +1782,13 @@ int resctrl_mon_resource_init(void)
>>>           mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>>         if (r->mon.mbm_cntr_assignable) {
>>> -        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>> - resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>>> -        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>> - resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
>>> -        mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
>>> -        mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
>>> -                                   (READS_TO_LOCAL_MEM |
>>> -                                    READS_TO_LOCAL_S_MEM |
>>> - NON_TEMP_WRITE_TO_LOCAL_MEM);
>>> +        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>> +            mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
>>> +        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>> +            mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
>>> +                                       (READS_TO_LOCAL_MEM |
>>> +                                        READS_TO_LOCAL_S_MEM |
>>> + NON_TEMP_WRITE_TO_LOCAL_MEM);
>>>           r->mon.mbm_assign_on_mkdir = true;
>>>           resctrl_file_fflags_init("num_mbm_cntrs",
>>>                        RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>>
>>>
>>>
>>>
> 
> I can send the official patch if you are ok to go ahead with the patch.

I am ok to go ahead with this patch. Please do rewrite the subject and changelog to highlight the
severity. I'd recommend that the changelog be something like:


	The following BUG/PANIC/splat(?) is encountered on mount of resctrl fs after booting
	a system that has X86_FEATURE_ABMC with the "rdt=!mbmtotal,!mbmlocal" kernel parameters:

	<trimmed backtrace>

	<problem description>

	<description of fix that also mentions it adds dependency where there is none and why this
	 is ok (for now?)>

> 
> Let me know if I can add Signoff from you or you can respond after it is reviewed.

You could add below tags or we can just do the usual review. Either works for me. Let me know if
you would like more collaboration on the changelog.

Co-developed-by: Reinette Chatre <reinette.chatre@...el.com>
Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ