[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7292333a-a4f1-4217-8c72-436812f29be8@amd.com>
Date: Tue, 14 Oct 2025 12:43:02 -0500
From: Babu Moger <bmoger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, babu.moger@....com,
tony.luck@...el.com, Dave.Martin@....com, james.morse@....com,
dave.hansen@...ux.intel.com, bp@...en8.de
Cc: kas@...nel.org, rick.p.edgecombe@...el.com, linux-kernel@...r.kernel.org,
x86@...nel.org, linux-coco@...ts.linux.dev, kvm@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
in mbm_event mode
Hi Reinette,
On 10/14/25 12:38, Babu Moger wrote:
> Hi Reinette,
>
> On 10/14/25 11:24, Reinette Chatre wrote:
>> Hi Babu,
>>
>> On 10/7/25 7:38 PM, Reinette Chatre wrote:
>>> On 10/7/25 10:36 AM, Babu Moger wrote:
>>>> On 10/6/25 20:23, Reinette Chatre wrote:
>>>>> On 10/6/25 1:38 PM, Moger, Babu wrote:
>>>>>> On 10/6/25 12:56, Reinette Chatre wrote:
>>>>>>> On 9/30/25 1:26 PM, Babu Moger wrote:
>>>>>>>> resctrl features can be enabled or disabled using boot-time kernel
>>>>>>>> parameters. To turn off the memory bandwidth events (mbmtotal and
>>>>>>>> mbmlocal), users need to pass the following parameter to the
>>>>>>>> kernel:
>>>>>>>> "rdt=!mbmtotal,!mbmlocal".
>>>>>>> ah, indeed ... although, the intention behind the mbmtotal and
>>>>>>> mbmlocal kernel
>>>>>>> parameters was to connect them to the actual hardware features
>>>>>>> identified
>>>>>>> by X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL
>>>>>>> respectively.
>>>>>>>
>>>>>>>
>>>>>>>> Found that memory bandwidth events (mbmtotal and mbmlocal)
>>>>>>>> cannot be
>>>>>>>> disabled when mbm_event mode is enabled.
>>>>>>>> resctrl_mon_resource_init()
>>>>>>>> unconditionally enables these events without checking if the
>>>>>>>> underlying
>>>>>>>> hardware supports them.
>>>>>>> Technically this is correct since if hardware supports ABMC then
>>>>>>> the
>>>>>>> hardware is no longer required to support
>>>>>>> X86_FEATURE_CQM_MBM_TOTAL and
>>>>>>> X86_FEATURE_CQM_MBM_LOCAL in order to provide mbm_total_bytes
>>>>>>> and mbm_local_bytes.
>>>>>>>
>>>>>>> I can see how this may be confusing to user space though ...
>>>>>>>
>>>>>>>> Remove the unconditional enablement of MBM features in
>>>>>>>> resctrl_mon_resource_init() to fix the problem. The hardware
>>>>>>>> support
>>>>>>>> verification is already done in get_rdt_mon_resources().
>>>>>>> I believe by "hardware support" you mean hardware support for
>>>>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
>>>>>>> Wouldn't a fix like
>>>>>>> this then require any system that supports ABMC to also support
>>>>>>> X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL to be
>>>>>>> able to
>>>>>>> support mbm_total_bytes and mbm_local_bytes?
>>>>>> Yes. That is correct. Right now, ABMC and X86_FEATURE_CQM_MBM_TOTAL/
>>>>>> X86_FEATURE_CQM_MBM_LOCAL are kind of tightly coupled. We have
>>>>>> not clearly
>>>>>> separated the that.
>>>>> Are you speaking from resctrl side since from what I understand
>>>>> these are
>>>>> independent features from the hardware side?
>>>> It is independent from hardware side. I meant we still use legacy
>>>> events from "default" mode.
>>> Thank you for confirming. I was wondering if we need to fix it via
>>> cpuid_deps[]
>>> and resctrl_cpu_detect() to address a hardware dependency. If
>>> hardware self
>>> does not have the dependency then we need to fix it another way.
>>>
>>>>>>> This problem seems to be similar to the one solved by [1] since
>>>>>>> by supporting ABMC there is no "hardware does not support
>>>>>>> mbmtotal/mbmlocal"
>>>>>>> but instead there only needs to be a check if the feature has
>>>>>>> been disabled
>>>>>>> by command line. That is, add a rdt_is_feature_enabled() check
>>>>>>> to the
>>>>>>> existing "!resctrl_is_mon_event_enabled()" check?
>>>>>> Enable or disable needs to be done at get_rdt_mon_resources(). It
>>>>>> needs to
>>>>>> be done early in the initialization before calling
>>>>>> domain_add_cpu() where
>>>>>> event data structures (mbm_states aarch_mbm_states) are allocated.
>>>>> Good point. My mistake to suggest the event should be enabled by
>>>>> resctrl fs.
>>>>
>>>> How about adding another check in get_rdt_mon_resources()?
>>>>
>>>> if (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL)
>>>> || rdt_is_feature_enabled(mbmtotal)) {
>>>> resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>>>> ret = true;
>>>> }
>>> Something like this yes. I think it should be in
>>> rdt_get_mon_l3_config() though, within
>>> the ABMC feature settings. If not then there may be an issue if the
>>> user boots with
>>> rdt=!abmc? I cannot see why the
>>> rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) check is needed,
>>> which flow are you addressing?
>>>
>>> Before we exchange code I would like to step back a bit just to be
>>> clear that we agree
>>> on the current issues and what user space may expect. After this it
>>> should be easier to
>>> exchange code. (more below)
>>>
>>>> I need to take Tony's patch for this.
>>>>
>>>>>>> But wait ... I think there may be a bigger problem when
>>>>>>> considering systems
>>>>>>> that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and
>>>>>>> X86_FEATURE_CQM_MBM_LOCAL.
>>>>>>> Shouldn't resctrl prevent such a system from switching to "default"
>>>>>>> mbm_assign_mode? Otherwise resctrl will happily let such a
>>>>>>> system switch
>>>>>>> to default mode and when user attempts to read an event file
>>>>>>> resctrl will
>>>>>>> attempt to read it via MSRs that are not supported.
>>>>>>> Looks like ABMC may need something similar to
>>>>>>> CONFIG_RESCTRL_ASSIGN_FIXED
>>>>>>> to handle this case in show() while preventing user space from
>>>>>>> switching to
>>>>>>> "default" mode on write()?
>>>>>> This may not be an issue right now. When
>>>>>> X86_FEATURE_CQM_MBM_TOTAL and
>>>>>> X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files
>>>>>> of these
>>>>>> events are not created.
>>>>> By "right now" I assume you mean the current implementation? I
>>>>> think your statement
>>>>> assumes that no CPUs come or go after resctrl_mon_resource_init()
>>>>> enables the MBM events?
>>>>> Current implementation will enable MBM events if ABMC is
>>>>> supported. When the
>>>>> first CPU of a domain comes online after that then resctrl will
>>>>> create the mon_data
>>>>> files. These files will remain if a user then switches to default
>>>>> mode and if
>>>>> the user then attempts to read one of these counters then I expect
>>>>> problems.
>>>> Yes. It will be a problem in the that case.
>>> Thinking about this more the issue is not about the mon_data files
>>> being created since
>>> they are only created if resctrl is mounted and
>>> resctrl_mon_resource_init() is run
>>> before creating the mountpoint. From what I can tell current MBM
>>> events supported by
>>> ABMC will be enabled at the time resctrl can be mounted so if
>>> X86_FEATURE_CQM_MBM_TOTAL
>>> and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I
>>> believe the
>>> mon_data files will be created.
>>>
>>> There is a problem with the actual domain creation during resctrl
>>> initialization
>>> where the MBM state data structures are created and depend on the
>>> events being
>>> enabled then.
>>> resctrl assumes that if an event is enabled then that event's
>>> associated
>>> rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
>>> exist and if
>>> those data structures are created (or not created) during CPU online
>>> and MBM
>>> event comes online later then there will be invalid memory accesses.
>>>
>>> The conclusion is the same though ... the events need to be
>>> initialized during
>>> resctrl initialization as you note above.
>>>
>>>> I am not clear on using config option you mentioned above.
>>> This is more about what is accomplished by the config option than
>>> whether it is
>>> a config option that controls the flow. More below but I believe
>>> there may be
>>> scenarios where only mbm_event is supported and in that case I
>>> expect, even on AMD,
>>> it may be possible that there is no supported "default" mode and thus:
>>> # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>> [mbm_event]
>>>
>>>> What about using the check resctrl_is_mon_event_enabled() in
>>>>
>>>> resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?
>>>>
>>> Trying to think through how to support a system that can switch
>>> between default
>>> and mbm_event mode I see a couple of things to consider. This is as
>>> I am thinking
>>> through the flows without able to experiment. I think it may help if
>>> you could sanity
>>> check this with perhaps a few experiments to considering the flows
>>> yourself to see where
>>> I am missing things.
>>>
>>> When we are clear on the flows to support and how to interact with
>>> user space it will
>>> be easier to start exchanging code.
>>>
>>> a) MBM state data structures
>>> As mentioned above, rdt_mon_domain::mbm_states and
>>> rdt_hw_mon_domain::arch_mbm_states
>>> are created during CPU online based on MBM event enabled state.
>>> During runtime
>>> an enabled MBM event is assumed to have state.
>>> To me this implies that any possible MBM event should be enabled
>>> during early
>>> initialization.
>>> A consequence is that any possible MBM event will have its
>>> associated event file
>>> created even if the active mode of the time cannot support it.
>>> (I do not think
>>> we want to have event files come and go).
>>> b) Switching between modes.
>>> From what I can tell switching mode is always allowed as long as
>>> system supports
>>> assignable counters and that may not be correct. Consider a
>>> system that supports
>>> ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or
>>> X86_FEATURE_CQM_MBM_LOCAL ...
>>> should it be allowed to switch to "default" mode? At this time I
>>> believe this is allowed
>>> yet this is an unusable state (as far as MBM goes) and I expect
>>> any attempt at reading
>>> an event file will result in invalid MSR access?
>>> Complexity increases if there is a mismatch in supported events,
>>> for example if mbm_event
>>> mode supports total and local but default mode only supports
>>> one. Should it be allowed
>>> to switch modes? If so, user can then still read from both
>>> files, the check whether assignable
>>> counters is enabled will fail and resctrl will attempt to read
>>> both via the counter MSRs,
>>> even an unsupported event (continued below).
>>> c) Read of event file
>>> A user can read from event file any time even if active mode
>>> (default or mbm_event) does
>>> not support it. If mbm_event mode is enabled then resctrl will
>>> attempt to use counters,
>>> if default mode is enabled then resctrl will attempt to use MSRs.
>>> This currently entirely depends on whether mbm_event mode is
>>> enabled or not.
>>> Perhaps we should add checks here to prevent user from reading
>>> an event if the
>>> active mode does not support it? Alternatively prevent user from
>>> switching to a mode
>>> that cannot be supported.
>>>
>>> Look forward to how you view things and thoughts on how user may
>>> expect to interact with these
>>> features.
>
>
> Yea. Taken note of all your points. Sorry for the Iate response. I
> was investigating on how to fix in a proper way.
>
>
>> I am concerned about this issue. The original changelog only mentions
>> that events are enabled when
>> they should not be but it looks to me that there is a more serious
>> issue if the user then attempts
>> to read from such an event. Have you tried the scenario when a user
>> boots with the parameters
>> mentioned in changelog (rdt=!mbmtotal,!mbmlocal) and then attempts to
>> read one of these events?
>> Reading from the event will attempt to access its architectural state
>> but from what I can tell
>> that will not be allocated since the events are not enabled at the
>> time of the allocation.
>
>
> Yes. I saw the issues. It fails to mount in my case with panic trace.
>
>
>>
>> This needs to be fixed during this cycle. A week has passed since my
>> previous message so I do not
>
>
> Yes. I understand your concern.
>
>
>> think that it will be possible to create a full featured solution
>> that keeps X86_FEATURE_ABMC
>> and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL independent.
>
>
> Agree.
>
>
>>
>> What do you think of something like below that builds on your
>> original change and additionally
>> enforces dependency between these features to support the resctrl
>> assumptions? From what I understand
>> this is ok for current AMD hardware? A not-as-urgent follow-up can
>> make these features independent
>> again?
>
>
> Yes. I tested it. Works fine. It defaults to "default" mode if both
> the events(local and total) are disabled in kernel parameter. That is
> expected.
>
>
>>
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c
>> b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index c8945610d455..fd42fe7b2fdc 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -452,7 +452,16 @@ int __init rdt_get_mon_l3_config(struct
>> rdt_resource *r)
>> r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>> }
>> - if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>> + /*
>> + * resctrl assumes a system that supports assignable counters can
>> + * switch to "default" mode. Ensure that there is a "default" mode
>> + * to switch to. This enforces a dependency between the independent
>> + * X86_FEATURE_ABMC and
>> X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL
>> + * hardware features.
>> + */
>> + if (rdt_cpu_has(X86_FEATURE_ABMC) &&
>> + (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
>> + rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
>> r->mon.mbm_cntr_assignable = true;
>> cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>> r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>> index 4076336fbba6..572a9925bd6c 100644
>> --- a/fs/resctrl/monitor.c
>> +++ b/fs/resctrl/monitor.c
>> @@ -1782,15 +1782,13 @@ int resctrl_mon_resource_init(void)
>> mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>> if (r->mon.mbm_cntr_assignable) {
>> - if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> - resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>> - if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> - resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
>> - mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg =
>> r->mon.mbm_cfg_mask;
>> - mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg =
>> r->mon.mbm_cfg_mask &
>> - (READS_TO_LOCAL_MEM |
>> - READS_TO_LOCAL_S_MEM |
>> - NON_TEMP_WRITE_TO_LOCAL_MEM);
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>> + mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg =
>> r->mon.mbm_cfg_mask;
>> + if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>> + mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg =
>> r->mon.mbm_cfg_mask &
>> + (READS_TO_LOCAL_MEM |
>> + READS_TO_LOCAL_S_MEM |
>> + NON_TEMP_WRITE_TO_LOCAL_MEM);
>> r->mon.mbm_assign_on_mkdir = true;
>> resctrl_file_fflags_init("num_mbm_cntrs",
>> RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>
>>
>>
>>
I can send the official patch if you are ok to go ahead with the patch.
Let me know if I can add Signoff from you or you can respond after it is
reviewed.
>>
>>
> Thanks for the quick patch.
>
> - Babu
>
>>
>
Powered by blists - more mailing lists