lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a75b2fa6-409c-4b33-9142-7be02bf6d217@amd.com>
Date: Tue, 14 Oct 2025 17:45:27 -0500
From: "Moger, Babu" <bmoger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, babu.moger@....com,
 tony.luck@...el.com, Dave.Martin@....com, james.morse@....com,
 dave.hansen@...ux.intel.com, bp@...en8.de
Cc: kas@...nel.org, rick.p.edgecombe@...el.com, linux-kernel@...r.kernel.org,
 x86@...nel.org, linux-coco@...ts.linux.dev, kvm@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
 in mbm_event mode

Hi Reinette,

On 10/14/2025 3:57 PM, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/14/25 10:43 AM, Babu Moger wrote:
>> On 10/14/25 12:38, Babu Moger wrote:
>>> On 10/14/25 11:24, Reinette Chatre wrote:
>>>> On 10/7/25 7:38 PM, Reinette Chatre wrote:
>>>>> On 10/7/25 10:36 AM, Babu Moger wrote:
>>>>>> On 10/6/25 20:23, Reinette Chatre wrote:
>>>>>>> On 10/6/25 1:38 PM, Moger, Babu wrote:
>>>>>>>> On 10/6/25 12:56, Reinette Chatre wrote:
>>>>>>>>> On 9/30/25 1:26 PM, Babu Moger wrote:
> 
> ...
> 
>>>>>>>>> But wait ... I think there may be a bigger problem when considering systems
>>>>>>>>> that support ABMC but not X86_FEATURE_CQM_MBM_TOTAL and X86_FEATURE_CQM_MBM_LOCAL.
>>>>>>>>> Shouldn't resctrl prevent such a system from switching to "default"
>>>>>>>>> mbm_assign_mode? Otherwise resctrl will happily let such a system switch
>>>>>>>>> to default mode and when user attempts to read an event file resctrl will
>>>>>>>>> attempt to read it via MSRs that are not supported.
>>>>>>>>> Looks like ABMC may need something similar to CONFIG_RESCTRL_ASSIGN_FIXED
>>>>>>>>> to handle this case in show() while preventing user space from switching to
>>>>>>>>> "default" mode on write()?
>>>>>>>> This may not be an issue right now. When X86_FEATURE_CQM_MBM_TOTAL and
>>>>>>>> X86_FEATURE_CQM_MBM_LOCAL are not supported then mon_data files of these
>>>>>>>> events are not created.
>>>>>>> By "right now" I assume you mean the current implementation? I think your statement
>>>>>>> assumes that no CPUs come or go after resctrl_mon_resource_init() enables the MBM events?
>>>>>>> Current implementation will enable MBM events if ABMC is supported. When the
>>>>>>> first CPU of a domain comes online after that then resctrl will create the mon_data
>>>>>>> files. These files will remain if a user then switches to default mode and if
>>>>>>> the user then attempts to read one of these counters then I expect problems.
>>>>>> Yes. It will be a problem in the that case.
>>>>> Thinking about this more the issue is not about the mon_data files being created since
>>>>> they are only created if resctrl is mounted and resctrl_mon_resource_init() is run
>>>>> before creating the mountpoint. From what I can tell current MBM events supported by
>>>>> ABMC will be enabled at the time resctrl can be mounted so if X86_FEATURE_CQM_MBM_TOTAL
>>>>> and X86_FEATURE_CQM_MBM_LOCAL are not supported but ABMC is then I believe the
>>>>> mon_data files will be created.
>>>>>
>>>>> There is a problem with the actual domain creation during resctrl initialization
>>>>> where the MBM state data structures are created and depend on the events being
>>>>> enabled then.
>>>>> resctrl assumes that if an event is enabled then that event's associated
>>>>> rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states exist and if
>>>>> those data structures are created (or not created) during CPU online and MBM
>>>>> event comes online later then there will be invalid memory accesses.
>>>>>
>>>>> The conclusion is the same though ... the events need to be initialized during
>>>>> resctrl initialization as you note above.
>>>>>
>>>>>> I am not clear on using config option you mentioned above.
>>>>> This is more about what is accomplished by the config option than whether it is
>>>>> a config option that controls the flow. More below but I believe there may be
>>>>> scenarios where only mbm_event is supported and in that case I expect, even on AMD,
>>>>> it may be possible that there is no supported "default" mode and thus:
>>>>>    # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>>>     [mbm_event]
>>>>>
>>>>>> What about using the check resctrl_is_mon_event_enabled() in
>>>>>>
>>>>>> resctrl_mbm_assign_mode_show() and resctrl_mbm_assign_mode_write() ?
>>>>>>
>>>>> Trying to think through how to support a system that can switch between default
>>>>> and mbm_event mode I see a couple of things to consider. This is as I am thinking
>>>>> through the flows without able to experiment. I think it may help if you could sanity
>>>>> check this with perhaps a few experiments to considering the flows yourself to see where
>>>>> I am missing things.
>>>>>
>>>>> When we are clear on the flows to support and how to interact with user space it will
>>>>> be easier to start exchanging code.
>>>>>
>>>>> a) MBM state data structures
>>>>>      As mentioned above, rdt_mon_domain::mbm_states and rdt_hw_mon_domain::arch_mbm_states
>>>>>      are created during CPU online based on MBM event enabled state. During runtime
>>>>>      an enabled MBM event is assumed to have state.
>>>>>      To me this implies that any possible MBM event should be enabled during early
>>>>>      initialization.
>>>>>      A consequence is that any possible MBM event will have its associated event file
>>>>>      created even if the active mode of the time cannot support it. (I do not think
>>>>>      we want to have event files come and go).
>>>>> b) Switching between modes.
>>>>>      From what I can tell switching mode is always allowed as long as system supports
>>>>>      assignable counters and that may not be correct. Consider a system that supports
>>>>>      ABMC but does not support X86_FEATURE_CQM_MBM_TOTAL and/or X86_FEATURE_CQM_MBM_LOCAL ...
>>>>>      should it be allowed to switch to "default" mode? At this time I believe this is allowed
>>>>>      yet this is an unusable state (as far as MBM goes) and I expect any attempt at reading
>>>>>      an event file will result in invalid MSR access?
>>>>>      Complexity increases if there is a mismatch in supported events, for example if mbm_event
>>>>>      mode supports total and local but default mode only supports one. Should it be allowed
>>>>>      to switch modes? If so, user can then still read from both files, the check whether assignable
>>>>>      counters is enabled will fail and resctrl will attempt to read both via the counter MSRs,
>>>>>      even an unsupported event (continued below).
>>>>> c) Read of event file
>>>>>      A user can read from event file any time even if active mode (default or mbm_event) does
>>>>>      not support it. If mbm_event mode is enabled then resctrl will attempt to use counters,
>>>>>      if default mode is enabled then resctrl will attempt to use MSRs.
>>>>>      This currently entirely depends on whether mbm_event mode is enabled or not.
>>>>>      Perhaps we should add checks here to prevent user from reading an event if the
>>>>>      active mode does not support it? Alternatively prevent user from switching to a mode
>>>>>      that cannot be supported.
>>>>>
>>>>> Look forward to how you view things and thoughts on how user may expect to interact with these
>>>>> features.
>>>
>>>
>>> Yea.  Taken note of all your points. Sorry for the Iate response. I was investigating on how to fix in a proper way.
>>>
>>>
>>>> I am concerned about this issue. The original changelog only mentions that events are enabled when
>>>> they should not be but it looks to me that there is a more serious issue if the user then attempts
>>>> to read from such an event. Have you tried the scenario when a user boots with the parameters
>>>> mentioned in changelog (rdt=!mbmtotal,!mbmlocal) and then attempts to read one of these events?
>>>> Reading from the event will attempt to access its architectural state but from what I can tell
>>>> that will not be allocated since the events are not enabled at the time of the allocation.
>>>
>>>
>>> Yes. I saw the issues. It fails to mount in my case with panic trace.
> 
> (Just to ensure that there is not anything else going on) Could you please confirm if the panic is from
> mon_add_all_files()->mon_event_read()->mon_event_count()->__mon_event_count()->resctrl_arch_reset_rmid()
> that creates the MBM event files during mount and then does the initial read of RMID to determine the
> starting count?

It happens just before that (at mbm_cntr_get). We have not allocated 
d->cntr_cfg for the counters.
===================Panic trace =================================

349.330416] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  349.338187] #PF: supervisor read access in kernel mode
[  349.343914] #PF: error_code(0x0000) - not-present page
[  349.349644] PGD 10419f067 P4D 0
[  349.353241] Oops: Oops: 0000 [#1] SMP NOPTI
[  349.357905] CPU: 45 UID: 0 PID: 3449 Comm: mount Not tainted 
6.18.0-rc1+ #120 PREEMPT(voluntary)
[  349.367803] Hardware name: AMD Corporation PURICO/PURICO, BIOS 
RPUT1003E 12/11/2024
[  349.376334] RIP: 0010:mbm_cntr_get+0x56/0x90
[  349.381096] Code: 45 8d 41 fe 83 f8 01 77 3d 8b 7b 50 85 ff 7e 36 49 
8b 84 24 f0 04 00 00 45 31 c0 eb 0d 41 83 c0 01 48 83 c0 10 44 39 c7 74 
1c <48> 3b 50 08 75 ed 3b 08 75 e9 48 83 c4 10 44 89 c0 5b 41 5c 41 5d
[  349.402037] RSP: 0018:ff56bba58655f958 EFLAGS: 00010246
[  349.407861] RAX: 0000000000000000 RBX: ffffffff9525b900 RCX: 
0000000000000002
[  349.415818] RDX: ffffffff95d526a0 RSI: ff1f5d52517c1800 RDI: 
0000000000000020
[  349.423774] RBP: ff56bba58655f980 R08: 0000000000000000 R09: 
0000000000000001
[  349.431730] R10: ff1f5d52c616a6f0 R11: fffc6a2f046c3980 R12: 
ff1f5d52517c1800
[  349.439687] R13: 0000000000000001 R14: ffffffff95d526a0 R15: 
ffffffff9525b968
[  349.447635] FS:  00007f17926b7800(0000) GS:ff1f5d59d45ff000(0000) 
knlGS:0000000000000000
[  349.456659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  349.463064] CR2: 0000000000000008 CR3: 0000000147afe002 CR4: 
0000000000771ef0
[  349.471022] PKRU: 55555554
[  349.474033] Call Trace:
[  349.476755]  <TASK>
[  349.479091]  ? kernfs_add_one+0x114/0x170
[  349.483560]  rdtgroup_assign_cntr_event+0x9b/0xd0
[  349.488795]  rdtgroup_assign_cntrs+0xab/0xb0
[  349.493553]  rdt_get_tree+0x4be/0x770
[  349.497623]  vfs_get_tree+0x2e/0xf0
[  349.501508]  fc_mount+0x18/0x90
[  349.505007]  path_mount+0x360/0xc50
[  349.508884]  ? putname+0x68/0x80
[  349.512479]  __x64_sys_mount+0x124/0x150
[  349.516848]  x64_sys_call+0x2133/0x2190
[  349.521123]  do_syscall_64+0x74/0x970

==================================================================

> 
> 
>>>
>>>
>>>>
>>>> This needs to be fixed during this cycle. A week has passed since my previous message so I do not
>>>
>>>
>>> Yes. I understand your concern.
>>>
>>>
>>>> think that it will be possible to create a full featured solution that keeps X86_FEATURE_ABMC
>>>> and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL independent.
>>>
>>>
>>> Agree.
>>>
>>>
>>>>
>>>> What do you think of something like below that builds on your original change and additionally
>>>> enforces dependency between these features to support the resctrl assumptions? From what I understand
>>>> this is ok for current AMD hardware? A not-as-urgent follow-up can make these features independent
>>>> again?
>>>
>>>
>>> Yes. I tested it. Works fine.  It defaults to "default" mode if both the events(local and total) are disabled in kernel parameter. That is expected.
> 
> Thank you very much for considering it and trying it out. Could you please also check if it
> behaves sanely when just one of the MBM events is enabled? For example by just booting with
> "rdt=!mbmtotal" or "rdt=!mbmlocal". Only one event's file should be created while it should
> still be possible to switch between default and mbm_event mode, event reads from the event
> file working as expected in both modes.

Yes. Checked already. Going to check again running few more tests.

> 
> 
>>>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> index c8945610d455..fd42fe7b2fdc 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>>>> @@ -452,7 +452,16 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
>>>>            r->mon.mbm_cfg_mask = ecx & MAX_EVT_CONFIG_BITS;
>>>>        }
>>>>    -    if (rdt_cpu_has(X86_FEATURE_ABMC)) {
>>>> +    /*
>>>> +     * resctrl assumes a system that supports assignable counters can
>>>> +     * switch to "default" mode. Ensure that there is a "default" mode
>>>> +     * to switch to. This enforces a dependency between the independent
>>>> +     * X86_FEATURE_ABMC and X86_FEATURE_CQM_MBM_TOTAL/X86_FEATURE_CQM_MBM_LOCAL
>>>> +     * hardware features.
>>>> +     */
>>>> +    if (rdt_cpu_has(X86_FEATURE_ABMC) &&
>>>> +        (rdt_cpu_has(X86_FEATURE_CQM_MBM_TOTAL) ||
>>>> +         rdt_cpu_has(X86_FEATURE_CQM_MBM_LOCAL))) {
>>>>            r->mon.mbm_cntr_assignable = true;
>>>>            cpuid_count(0x80000020, 5, &eax, &ebx, &ecx, &edx);
>>>>            r->mon.num_mbm_cntrs = (ebx & GENMASK(15, 0)) + 1;
>>>> diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
>>>> index 4076336fbba6..572a9925bd6c 100644
>>>> --- a/fs/resctrl/monitor.c
>>>> +++ b/fs/resctrl/monitor.c
>>>> @@ -1782,15 +1782,13 @@ int resctrl_mon_resource_init(void)
>>>>            mba_mbps_default_event = QOS_L3_MBM_TOTAL_EVENT_ID;
>>>>          if (r->mon.mbm_cntr_assignable) {
>>>> -        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>>> - resctrl_enable_mon_event(QOS_L3_MBM_TOTAL_EVENT_ID);
>>>> -        if (!resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>> - resctrl_enable_mon_event(QOS_L3_MBM_LOCAL_EVENT_ID);
>>>> -        mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
>>>> -        mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
>>>> -                                   (READS_TO_LOCAL_MEM |
>>>> -                                    READS_TO_LOCAL_S_MEM |
>>>> - NON_TEMP_WRITE_TO_LOCAL_MEM);
>>>> +        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_TOTAL_EVENT_ID))
>>>> +            mon_event_all[QOS_L3_MBM_TOTAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask;
>>>> +        if (resctrl_is_mon_event_enabled(QOS_L3_MBM_LOCAL_EVENT_ID))
>>>> +            mon_event_all[QOS_L3_MBM_LOCAL_EVENT_ID].evt_cfg = r->mon.mbm_cfg_mask &
>>>> +                                       (READS_TO_LOCAL_MEM |
>>>> +                                        READS_TO_LOCAL_S_MEM |
>>>> + NON_TEMP_WRITE_TO_LOCAL_MEM);
>>>>            r->mon.mbm_assign_on_mkdir = true;
>>>>            resctrl_file_fflags_init("num_mbm_cntrs",
>>>>                         RFTYPE_MON_INFO | RFTYPE_RES_CACHE);
>>>>
>>>>
>>>>
>>>>
>>
>> I can send the official patch if you are ok to go ahead with the patch.
> 
> I am ok to go ahead with this patch. Please do rewrite the subject and changelog to highlight the
> severity. I'd recommend that the changelog be something like:
> 
> 
> 	The following BUG/PANIC/splat(?) is encountered on mount of resctrl fs after booting
> 	a system that has X86_FEATURE_ABMC with the "rdt=!mbmtotal,!mbmlocal" kernel parameters:
> 
> 	<trimmed backtrace>
> 
> 	<problem description>
> 
> 	<description of fix that also mentions it adds dependency where there is none and why this
> 	 is ok (for now?)>
> 

Yes. Sure.


>>
>> Let me know if I can add Signoff from you or you can respond after it is reviewed.
> 
> You could add below tags or we can just do the usual review. Either works for me. Let me know if
> you would like more collaboration on the changelog.

Sure. Will send you the full change log first.

> 
> Co-developed-by: Reinette Chatre <reinette.chatre@...el.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
> 

thanks
Babu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ