lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1315076d-24f9-4e27-b945-51564cadfaed@intel.com>
Date: Tue, 7 Oct 2025 19:38:01 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <bmoger@....com>, <babu.moger@....com>, <tony.luck@...el.com>,
	<Dave.Martin@....com>, <james.morse@....com>, <dave.hansen@...ux.intel.com>,
	<bp@...en8.de>
CC: <kas@...nel.org>, <rick.p.edgecombe@...el.com>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
	<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
 in mbm_event mode

Hi Babu,

On 10/7/25 10:36 AM, Babu Moger wrote:
> Hi Reinette,
> 
> On 10/6/25 20:23, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/6/25 1:38 PM, Moger, Babu wrote:
>>> Hi Reinette,
>>>
>>> On 10/6/25 12:56, Reinette Chatre wrote:
>>>> Hi Babu,
>>>>
>>>> On 9/30/25 1:26 PM, Babu Moger wrote:
>>>>> resctrl features can be enabled or disabled using boot-time kernel
>>>>> parameters. To turn off the memory bandwidth events (mbmtotal and
>>>>> mbmlocal), users need to pass the following parameter to the kernel:
>>>>> "rdt=!mbmtotal,!mbmlocal".
>>>>
>>>> ah, indeed ... although, the intention behind the mbmtotal and mbmlocal kernel
>>>> parameters was to connect them to the actual hardware features identified
>>>> by X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL respectively.
>>>>
>>>>
>>>>> Found that memory bandwidth events (mbmtotal and mbmlocal) cannot be
>>>>> disabled when mbm_event mode is enabled. resctrl_mon_resource_init()
>>>>> unconditionally enables these events without checking if the underlying
>>>>> hardware supports them.
>>>>
>>>> Technically this is correct since if hardware supports ABMC then the
>>>> hardware is no longer required to support X86_FEATURE_CQM_MBM_TOTAL and
>>>> X86_FEATURE_CQM_MBM_LOCAL in order to provide mbm_total_bytes
>>>> and mbm_local_bytes.
>>>>
>>>> I can see how this may be confusing to user space though ...
>>>>
>>>>>
>>>>> Remove the unconditional enablement of MBM features in
>>>>> resctrl_mon_resource_init() to fix the problem. The hardware support
>>>>> verification is already done in get_rdt_mon_resources().
>>>>
>>>> I believe by "hardware support" you mean hardware support for
>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL. Wouldn't a fix like
>>>> this then require any system that supports ABMC to also support
>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL to be able to
>>>> support mbm_total_bytes and mbm_local_bytes?
>>>
>>> Yes. That is correct. Right now, ABMC and X86_FEATURE_CQM_MBM_TOTAL/
>>> X86_FEATURE_CQM_MBM_LOCAL are kind of tightly coupled. We have not clearly
>>> separated the that.
>>
>> Are you speaking from resctrl side since from what I understand these are
>> independent features from the hardware side?
> 
> It is independent from hardware side. I meant we still use legacy events from "default" mode.

Thank you for confirming. I was wondering if we need to fix it via cpuid_deps[]
and resctrl_cpu_detect() to address a hardware dependency. If hardware self
does not have the dependency then we need to fix it another way.

> 
>>
>>>> This problem seems to be similar to the one solved by [1] since
>>>> by supporting ABMC there is no "hardware does not support mbmtotal/mbmlocal"
>>>> but instead there only needs to be a check if the feature has been disabled
>>>> by command line. That is, add a rdt_is_feature_enabled() check to the
>>>> existing "!resctrl_is_mon_event_enabled()" check?
>>>
>>> Enable or disable needs to be done at get_rdt_mon_resources(). It needs to
>>> be done early in  the initialization before calling domain_add_cpu() where
>>> event data structures (mbm_states aarch_mbm_states) are allocated.
>>
>> Good point. My mistake to suggest the event should be enabled by
>> resctrl fs.
> 
> 
> How about adding another check in get_rdt_mon_resources()?
> 
> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)
>     || rdt_is_feature_enabled(mbmtotal)) {
>                 resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>                 ret = true;
>         }

Something like this yes. I think it should be in rdt_get_mon_l3_config() though, within
the ABMC feature settings. If not then there may be an issue if the user boots with
rdt=!abmc? I cannot see why the rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) check is needed,
which flow are you addressing?

Before we exchange code I would like to step back a bit just to be clear that we agree
on the current issues and what user space may expect. After this it should be easier to
exchange code. (more below)

> 
> I need to take Tony's patch for this.
> 
>>
>>>
>>>>
>>>> But wait ... I think there may be a bigger problem when considering systems
>>>> that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
>>>> Shouldn't resctrl prevent such a system from switching to "default"
>>>> mbm_assign_mode? Otherwise resctrl will happily let such a system switch
>>>> to default mode and when user attempts to read an event file resctrl will
>>>> attempt to read it via MSRs that are not supported.
>>>> Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
>>>> to handle this case in show() while preventing user space from switching to
>>>> "default" mode on write()?
>>>
>>> This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
>>> X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files of these
>>> events are not created.
>>
>> By "right now" I assume you mean the current implementation? I think your statement
>> assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
>> Current implementation will enable MBM events if ABMC is supported. When the
>> first CPU of a domain comes online after that then resctrl will create the mon_data
>> files. These files will remain if a user then switches to default mode and if
>> the user then attempts to read one of these counters then I expect problems.
> 
> Yes. It will be a problem in the that case.

Thinking about this more the issue is not about the mon_data files being created since
they are only created if resctrl is mounted and resctrl_mon_resource_init() is run
before creating the mountpoint. From what I can tell current MBM events supported by
ABMC will be enabled at the time resctrl can be mounted so if X86_FEATURE_CQM_MBM_TOTAL
and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I believe the
mon_data files will be created.

There is a problem with the actual domain creation during resctrl initialization
where the MBM state data structures are created and depend on the events being
enabled then.
resctrl assumes that if an event is enabled then that event's associated
rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states exist and if
those data structures are created (or not created) during CPU online and MBM
event comes online later then there will be invalid memory accesses.

The conclusion is the same though ... the events need to be initialized during
resctrl initialization as you note above.

> 
> I am not clear on using config option you mentioned above.

This is more about what is accomplished by the config option than whether it is
a config option that controls the flow. More below but I believe there may be
scenarios where only mbm_event is supported and in that case I expect, even on AMD,
it may be possible that there is no supported "default" mode and thus:
 # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode                             
  [mbm_event]

> 
> What about using the check resctrl_is_mon_event_enabled() in
> 
> resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?
> 

Trying to think through how to support a system that can switch between default
and mbm_event mode I see a couple of things to consider. This is as I am thinking
through the flows without able to experiment. I think it may help if you could sanity
check this with perhaps a few experiments to considering the flows yourself to see where
I am missing things.

When we are clear on the flows to support and how to interact with user space it will
be easier to start exchanging code.

a) MBM state data structures
   As mentioned above, rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
   are created during CPU online based on MBM event enabled state. During runtime
   an enabled MBM event is assumed to have state.
   To me this implies that any possible MBM event should be enabled during early
   initialization.
   A consequence is that any possible MBM event will have its associated event file
   created even if the active mode of the time cannot support it. (I do not think
   we want to have event files come and go).
b) Switching between modes.
   From what I can tell switching mode is always allowed as long as system supports
   assignable counters and that may not be correct. Consider a system that supports
   ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or X86_FEATURE_CQM_MBM_LOCAL ...
   should it be allowed to switch to "default" mode? At this time I believe this is allowed
   yet this is an unusable state (as far as MBM goes) and I expect any attempt at reading
   an event file will result in invalid MSR access?
   Complexity increases if there is a mismatch in supported events, for example if mbm_event
   mode supports total and local but default mode only supports one. Should it be allowed
   to switch modes? If so, user can then still read from both files, the check whether assignable
   counters is enabled will fail and resctrl will attempt to read both via the counter MSRs,
   even an unsupported event (continued below).
c) Read of event file
   A user can read from event file any time even if active mode (default or mbm_event) does
   not support it. If mbm_event mode is enabled then resctrl will attempt to use counters,
   if default mode is enabled then resctrl will attempt to use MSRs.
   This currently entirely depends on whether mbm_event mode is enabled or not.
   Perhaps we should add checks here to prevent user from reading an event if the
   active mode does not support it? Alternatively prevent user from switching to a mode
   that cannot be supported.

Look forward to how you view things and thoughts on how user may expect to interact with these
features.

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ