lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0e52d4fe-0ff7-415a-babd-acf3c39f9d30@amd.com>
Date: Tue, 14 Oct 2025 12:38:05 -0500
From: Babu Moger <bmoger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, babu.moger@....com,
 tony.luck@...el.com, Dave.Martin@....com, james.morse@....com,
 dave.hansen@...ux.intel.com, bp@...en8.de
Cc: kas@...nel.org, rick.p.edgecombe@...el.com, linux-kernel@...r.kernel.org,
 x86@...nel.org, linux-coco@...ts.linux.dev, kvm@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
 in mbm_event mode

Hi Reinette,

On 10/14/25 11:24, Reinette Chatre wrote:
> Hi Babu,
>
> On 10/7/25 7:38 PM, Reinette Chatre wrote:
>> On 10/7/25 10:36 AM, Babu Moger wrote:
>>> On 10/6/25 20:23, Reinette Chatre wrote:
>>>> On 10/6/25 1:38 PM, Moger, Babu wrote:
>>>>> On 10/6/25 12:56, Reinette Chatre wrote:
>>>>>> On 9/30/25 1:26 PM, Babu Moger wrote:
>>>>>>> resctrl features can be enabled or disabled using boot-time kernel
>>>>>>> parameters. To turn off the memory bandwidth events (mbmtotal and
>>>>>>> mbmlocal), users need to pass the following parameter to the kernel:
>>>>>>> "rdt=!mbmtotal,!mbmlocal".
>>>>>> ah, indeed ... although, the intention behind the mbmtotal and mbmlocal kernel
>>>>>> parameters was to connect them to the actual hardware features identified
>>>>>> by X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL respectively.
>>>>>>
>>>>>>
>>>>>>> Found that memory bandwidth events (mbmtotal and mbmlocal) cannot be
>>>>>>> disabled when mbm_event mode is enabled. resctrl_mon_resource_init()
>>>>>>> unconditionally enables these events without checking if the underlying
>>>>>>> hardware supports them.
>>>>>> Technically this is correct since if hardware supports ABMC then the
>>>>>> hardware is no longer required to support X86_FEATURE_CQM_MBM_TOTAL and
>>>>>> X86_FEATURE_CQM_MBM_LOCAL in order to provide mbm_total_bytes
>>>>>> and mbm_local_bytes.
>>>>>>
>>>>>> I can see how this may be confusing to user space though ...
>>>>>>
>>>>>>> Remove the unconditional enablement of MBM features in
>>>>>>> resctrl_mon_resource_init() to fix the problem. The hardware support
>>>>>>> verification is already done in get_rdt_mon_resources().
>>>>>> I believe by "hardware support" you mean hardware support for
>>>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL. Wouldn't a fix like
>>>>>> this then require any system that supports ABMC to also support
>>>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL to be able to
>>>>>> support mbm_total_bytes and mbm_local_bytes?
>>>>> Yes. That is correct. Right now, ABMC and X86_FEATURE_CQM_MBM_TOTAL/
>>>>> X86_FEATURE_CQM_MBM_LOCAL are kind of tightly coupled. We have not clearly
>>>>> separated the that.
>>>> Are you speaking from resctrl side since from what I understand these are
>>>> independent features from the hardware side?
>>> It is independent from hardware side. I meant we still use legacy events from "default" mode.
>> Thank you for confirming. I was wondering if we need to fix it via cpuid_deps[]
>> and resctrl_cpu_detect() to address a hardware dependency. If hardware self
>> does not have the dependency then we need to fix it another way.
>>
>>>>>> This problem seems to be similar to the one solved by [1] since
>>>>>> by supporting ABMC there is no "hardware does not support mbmtotal/mbmlocal"
>>>>>> but instead there only needs to be a check if the feature has been disabled
>>>>>> by command line. That is, add a rdt_is_feature_enabled() check to the
>>>>>> existing "!resctrl_is_mon_event_enabled()" check?
>>>>> Enable or disable needs to be done at get_rdt_mon_resources(). It needs to
>>>>> be done early in  the initialization before calling domain_add_cpu() where
>>>>> event data structures (mbm_states aarch_mbm_states) are allocated.
>>>> Good point. My mistake to suggest the event should be enabled by
>>>> resctrl fs.
>>>
>>> How about adding another check in get_rdt_mon_resources()?
>>>
>>> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)
>>>      || rdt_is_feature_enabled(mbmtotal)) {
>>>                  resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>>>                  ret = true;
>>>          }
>> Something like this yes. I think it should be in rdt_get_mon_l3_config() though, within
>> the ABMC feature settings. If not then there may be an issue if the user boots with
>> rdt=!abmc? I cannot see why the rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) check is needed,
>> which flow are you addressing?
>>
>> Before we exchange code I would like to step back a bit just to be clear that we agree
>> on the current issues and what user space may expect. After this it should be easier to
>> exchange code. (more below)
>>
>>> I need to take Tony's patch for this.
>>>
>>>>>> But wait ... I think there may be a bigger problem when considering systems
>>>>>> that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
>>>>>> Shouldn't resctrl prevent such a system from switching to "default"
>>>>>> mbm_assign_mode? Otherwise resctrl will happily let such a system switch
>>>>>> to default mode and when user attempts to read an event file resctrl will
>>>>>> attempt to read it via MSRs that are not supported.
>>>>>> Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
>>>>>> to handle this case in show() while preventing user space from switching to
>>>>>> "default" mode on write()?
>>>>> This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
>>>>> X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files of these
>>>>> events are not created.
>>>> By "right now" I assume you mean the current implementation? I think your statement
>>>> assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
>>>> Current implementation will enable MBM events if ABMC is supported. When the
>>>> first CPU of a domain comes online after that then resctrl will create the mon_data
>>>> files. These files will remain if a user then switches to default mode and if
>>>> the user then attempts to read one of these counters then I expect problems.
>>> Yes. It will be a problem in the that case.
>> Thinking about this more the issue is not about the mon_data files being created since
>> they are only created if resctrl is mounted and resctrl_mon_resource_init() is run
>> before creating the mountpoint. From what I can tell current MBM events supported by
>> ABMC will be enabled at the time resctrl can be mounted so if X86_FEATURE_CQM_MBM_TOTAL
>> and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I believe the
>> mon_data files will be created.
>>
>> There is a problem with the actual domain creation during resctrl initialization
>> where the MBM state data structures are created and depend on the events being
>> enabled then.
>> resctrl assumes that if an event is enabled then that event's associated
>> rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states exist and if
>> those data structures are created (or not created) during CPU online and MBM
>> event comes online later then there will be invalid memory accesses.
>>
>> The conclusion is the same though ... the events need to be initialized during
>> resctrl initialization as you note above.
>>
>>> I am not clear on using config option you mentioned above.
>> This is more about what is accomplished by the config option than whether it is
>> a config option that controls the flow. More below but I believe there may be
>> scenarios where only mbm_event is supported and in that case I expect, even on AMD,
>> it may be possible that there is no supported "default" mode and thus:
>>   # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>    [mbm_event]
>>
>>> What about using the check resctrl_is_mon_event_enabled() in
>>>
>>> resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?
>>>
>> Trying to think through how to support a system that can switch between default
>> and mbm_event mode I see a couple of things to consider. This is as I am thinking
>> through the flows without able to experiment. I think it may help if you could sanity
>> check this with perhaps a few experiments to considering the flows yourself to see where
>> I am missing things.
>>
>> When we are clear on the flows to support and how to interact with user space it will
>> be easier to start exchanging code.
>>
>> a) MBM state data structures
>>     As mentioned above, rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
>>     are created during CPU online based on MBM event enabled state. During runtime
>>     an enabled MBM event is assumed to have state.
>>     To me this implies that any possible MBM event should be enabled during early
>>     initialization.
>>     A consequence is that any possible MBM event will have its associated event file
>>     created even if the active mode of the time cannot support it. (I do not think
>>     we want to have event files come and go).
>> b) Switching between modes.
>>     From what I can tell switching mode is always allowed as long as system supports
>>     assignable counters and that may not be correct. Consider a system that supports
>>     ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or X86_FEATURE_CQM_MBM_LOCAL ...
>>     should it be allowed to switch to "default" mode? At this time I believe this is allowed
>>     yet this is an unusable state (as far as MBM goes) and I expect any attempt at reading
>>     an event file will result in invalid MSR access?
>>     Complexity increases if there is a mismatch in supported events, for example if mbm_event
>>     mode supports total and local but default mode only supports one. Should it be allowed
>>     to switch modes? If so, user can then still read from both files, the check whether assignable
>>     counters is enabled will fail and resctrl will attempt to read both via the counter MSRs,
>>     even an unsupported event (continued below).
>> c) Read of event file
>>     A user can read from event file any time even if active mode (default or mbm_event) does
>>     not support it. If mbm_event mode is enabled then resctrl will attempt to use counters,
>>     if default mode is enabled then resctrl will attempt to use MSRs.
>>     This currently entirely depends on whether mbm_event mode is enabled or not.
>>     Perhaps we should add checks here to prevent user from reading an event if the
>>     active mode does not support it? Alternatively prevent user from switching to a mode
>>     that cannot be supported.
>>
>> Look forward to how you view things and thoughts on how user may expect to interact with these
>> features.


Yea.  Taken note of all your points. Sorry for the Iate response.  I was 
investigating on how to fix in a proper way.


> I am concerned about this issue. The original changelog only mentions that events are enabled when
> they should not be but it looks to me that there is a more serious issue if the user then attempts
> to read from such an event. Have you tried the scenario when a user boots with the parameters
> mentioned in changelog (rdt=!mbmtotal,!mbmlocal) and then attempts to read one of these events?
> Reading from the event will attempt to access its architectural state but from what I can tell
> that will not be allocated since the events are not enabled at the time of the allocation.


Yes. I saw the issues. It fails to mount in my case with panic trace.


>
> This needs to be fixed during this cycle. A week has passed since my previous message so I do not


Yes. I understand your concern.


> think that it will be possible to create a full featured solution that keeps X86_FEATURE_ABMC
> and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL independent.


Agree.


>
> What do you think of something like below that builds on your original change and additionally
> enforces dependency between these features to support the resctrl assumptions? From what I understand
> this is ok for current AMD hardware? A not-as-urgent follow-up can make these features independent
> again?


Yes. I tested it. Works fine.  It defaults to "default" mode if both the 
events(local and total) are disabled in kernel parameter. That is expected.


>
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index c8945610d455..fd42fe7b2fdc 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -452,7 +452,16 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>   		r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>   	}
>   
> -	if (rdt_cpu_has(X86_FEATURE_ABMC)) {
> +	/*
> +	 * resctrl assumes a system that supports assignable counters can
> +	 * switch to "default" mode. Ensure that there is a "default" mode
> +	 * to switch to. This enforces a dependency between the independent
> +	 * X86_FEATURE_ABMC and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL
> +	 * hardware features.
> +	 */
> +	if (rdt_cpu_has(X86_FEATURE_ABMC) &&
> +	    (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
> +	     rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
>   		r->mon.mbm_cntr_assignable = true;
>   		cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>   		r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
> index 4076336fbba6..572a9925bd6c 100644
> --- a/fs/resctrl/monitor.c
> +++ b/fs/resctrl/monitor.c
> @@ -1782,15 +1782,13 @@ int resctrl_mon_resource_init(void)
>   		mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>   
>   	if (r->mon.mbm_cntr_assignable) {
> -		if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> -			resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
> -		if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> -			resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
> -		mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
> -		mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
> -								   (READS_TO_LOCAL_MEM |
> -								    READS_TO_LOCAL_S_MEM |
> -								    NON_TEMP_WRITE_TO_LOCAL_MEM);
> +		if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
> +			mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
> +		if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
> +			mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
> +									   (READS_TO_LOCAL_MEM |
> +									    READS_TO_LOCAL_S_MEM |
> +									    NON_TEMP_WRITE_TO_LOCAL_MEM);
>   		r->mon.mbm_assign_on_mkdir = true;
>   		resctrl_file_fflags_init("num_mbm_cntrs",
>   					 RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>
>
>
>
>
>
Thanks for the quick patch.

- Babu

>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ