linux-kernel - Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled in mbm

Message-ID: <5163ce35-f843-41a3-abfc-5af91b7c68bc@intel.com>
Date: Tue, 14 Oct 2025 16:09:43 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Moger, Babu" <bmoger@....com>, <babu.moger@....com>,
	<tony.luck@...el.com>, <Dave.Martin@....com>, <james.morse@....com>,
	<dave.hansen@...ux.intel.com>, <bp@...en8.de>
CC: <kas@...nel.org>, <rick.p.edgecombe@...el.com>,
	<linux-kernel@...r.kernel.org>, <x86@...nel.org>,
	<linux-coco@...ts.linux.dev>, <kvm@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl: Fix MBM events being unconditionally enabled
 in mbm_event mode

Hi Babu,

On 10/14/25 3:45 PM, Moger, Babu wrote:
> On 10/14/2025 3:57 PM, Reinette Chatre wrote:
>> On 10/14/25 10:43 AM, Babu Moger wrote:


>>>> Yes. I saw the issues. It fails to mount in my case with panic trace.
>>
>> (Just to ensure that there is not anything else going on) Could you please confirm if the panic is from
>> mon_add_all_files()->mon_event_read()->mon_event_count()->__mon_event_count()->resctrl_arch_reset_rmid()
>> that creates the MBM event files during mount and then does the initial read of RMID to determine the
>> starting count?
> 
> It happens just before that (at mbm_cntr_get). We have not allocated d->cntr_cfg for the counters.
> ===================Panic trace =================================
> 
> [  349.330416] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [  349.338187] #PF: supervisor read access in kernel mode
> [  349.343914] #PF: error_code(0x0000) - not-present page
> [  349.349644] PGD 10419f067 P4D 0
> [  349.353241] Oops: Oops: 0000 [#1] SMP NOPTI
> [  349.357905] CPU: 45 UID: 0 PID: 3449 Comm: mount Not tainted 6.18.0-rc1+ #120 PREEMPT(voluntary)
> [  349.367803] Hardware name: AMD Corporation PURICO/PURICO, BIOS RPUT1003E 12/11/2024
> [  349.376334] RIP: 0010:mbm_cntr_get+0x56/0x90
> [  349.381096] Code: 45 8d 41 fe 83 f8 01 77 3d 8b 7b 50 85 ff 7e 36 49 8b 84 24 f0 04 00 00 45 31 c0 eb 0d 41 83 c0 01 48 83 c0 10 44 39 c7 74 1c <48> 3b 50 08 75 ed 3b 08 75 e9 48 83 c4 10 44 89 c0 5b 41 5c 41 5d
> [  349.402037] RSP: 0018:ff56bba58655f958 EFLAGS: 00010246
> [  349.407861] RAX: 0000000000000000 RBX: ffffffff9525b900 RCX: 0000000000000002
> [  349.415818] RDX: ffffffff95d526a0 RSI: ff1f5d52517c1800 RDI: 0000000000000020
> [  349.423774] RBP: ff56bba58655f980 R08: 0000000000000000 R09: 0000000000000001
> [  349.431730] R10: ff1f5d52c616a6f0 R11: fffc6a2f046c3980 R12: ff1f5d52517c1800
> [  349.439687] R13: 0000000000000001 R14: ffffffff95d526a0 R15: ffffffff9525b968
> [  349.447635] FS:  00007f17926b7800(0000) GS:ff1f5d59d45ff000(0000) knlGS:0000000000000000
> [  349.456659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  349.463064] CR2: 0000000000000008 CR3: 0000000147afe002 CR4: 0000000000771ef0
> [  349.471022] PKRU: 55555554
> [  349.474033] Call Trace:
> [  349.476755]  <TASK>
> [  349.479091]  ? kernfs_add_one+0x114/0x170
> [  349.483560]  rdtgroup_assign_cntr_event+0x9b/0xd0
> [  349.488795]  rdtgroup_assign_cntrs+0xab/0xb0
> [  349.493553]  rdt_get_tree+0x4be/0x770
> [  349.497623]  vfs_get_tree+0x2e/0xf0
> [  349.501508]  fc_mount+0x18/0x90
> [  349.505007]  path_mount+0x360/0xc50
> [  349.508884]  ? putname+0x68/0x80
> [  349.512479]  __x64_sys_mount+0x124/0x150
> [  349.516848]  x64_sys_call+0x2133/0x2190
> [  349.521123]  do_syscall_64+0x74/0x970
> 
> ==================================================================

Thank you for capturing this. This is a different trace, but it confirms the same
root cause. Specifically, the event is enabled only after domain online has already
run, so the state it depends on was never allocated at that point.
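
To make the ordering problem concrete, here is a minimal user-space sketch of it.
It is not the resctrl code: the toy_* names, the struct layout, and the NULL guard
are hypothetical and exist only for illustration. The sketch assumes that per-counter
state is allocated at domain online only when the event is already enabled, and that
a later lookup walks that state. In the quoted trace RAX is 0 and CR2 is 0x8, which
is consistent with such a walk starting from a NULL array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's resctrl structures. */
struct toy_cntr_cfg {
	int evtid;         /* event the counter is assigned to */
	const void *owner; /* group owning the counter, NULL if free */
};

struct toy_domain {
	int num_cntrs;
	struct toy_cntr_cfg *cntr_cfg; /* expected to be set up at domain online */
};

/*
 * Domain online: the per-counter array is allocated only if the event is
 * enabled at this point. If the event is enabled later, cntr_cfg stays NULL.
 */
static int toy_domain_online(struct toy_domain *d, bool event_enabled,
			     int num_cntrs)
{
	d->num_cntrs = num_cntrs;
	d->cntr_cfg = NULL;

	if (!event_enabled)
		return 0;

	d->cntr_cfg = calloc((size_t)num_cntrs, sizeof(*d->cntr_cfg));
	return d->cntr_cfg ? 0 : -1;
}

/*
 * Look up the counter assigned to (owner, evtid). Returns its index, or -1
 * if none is found. Without the NULL check, the loop below would read
 * through a NULL cntr_cfg on the first iteration.
 */
static int toy_cntr_get(const struct toy_domain *d, const void *owner,
			int evtid)
{
	int i;

	if (!d->cntr_cfg)
		return -1;

	for (i = 0; i < d->num_cntrs; i++) {
		if (d->cntr_cfg[i].owner == owner &&
		    d->cntr_cfg[i].evtid == evtid)
			return i;
	}

	return -1;
}

/* Domain offline: release the per-counter array. */
static void toy_domain_offline(struct toy_domain *d)
{
	free(d->cntr_cfg);
	d->cntr_cfg = NULL;
}
```

Calling toy_domain_online() with event_enabled false and then toy_cntr_get()
reproduces the ordering in miniature. The guard only keeps the sketch from
crashing; it is not a suggestion for how the actual fix should look.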

Reinette