linux-kernel - [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on AMD with multiple RMIDs in the same domain

Message-ID: <CAHCEFEyd0Y+wTrLWNMUNvwgJrCxAi66D17w3Zg-ikH5005k1-w@mail.gmail.com>
Date: Tue, 29 Jul 2025 15:53:27 +0800
From: Hc Zheng <zhenghc00@...il.com>
To: Fenghua Yu <fenghua.yu@...el.com>, Reinette Chatre <reinette.chatre@...el.com>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>
Cc: x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: [Bug] x86/resctrl: unexpect mbm_local_bytes/mbm_total_bytes delta on
 AMD with multiple RMIDs in the same domain

Hi All,

We have enabled resctrl on our container platform and noticed some
unexpected behavior when multiple containers run in the same L3 domain.
For such mon_groups, mbm_local_bytes/mbm_total_bytes either return
"Unavailable", or the delta between two consecutive reads is far
outside the normal range (e.g. 1000+ GB/s).

After reading the AMD PQoS manual(), I found this:
"""
Potential causes of the “U” bit being set include
(but are not limited to):

• RMID is not currently tracked by the hardware.
• RMID was not tracked by the hardware at some time since it was last read.
• RMID has not been read since it started being tracked by the hardware.
"""

However, the manual gives no explanation for the unexpectedly large
delta between two reads of the counter. After examining the kernel
code, I suspect this is more likely a hardware bug.

Here are the steps to reproduce it:

1. Create the mon_groups:

$ for i in $(seq 0 99); do mkdir -p /sys/fs/resctrl/amdtest/mon_groups/test$i; done

2. Run a stress command and assign its PID to each mon_group (I ran
this test on AMD Genoa; CPUs 16-23,208-215 are on CCD 8):

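$ cat stress.sh
# Start one memory-stress worker pinned to the CPUs of CCD 8.
nohup numactl -C 16-23,208-215 stress -m 1 --vm-hang 1 > /dev/null &
lastPid=$!
# Move it into the amdtest control group, then into mon group test<N> ($1).
echo "$lastPid" > /sys/fs/resctrl/amdtest/tasks
echo "$lastPid" > /sys/fs/resctrl/amdtest/mon_groups/test$1/tasks
$ for i in $(seq 0 99); do bash stress.sh $i; done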
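Before reading the counters, it may be worth confirming that every mon group actually holds the task(s) written in step 2, since a group with no tasks would not be exercising its RMID. A minimal sketch, assuming the group layout from step 1; the helper name count_group_tasks and the MON_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: print "<group> <number of task IDs>" for each test* mon group.
# MON_ROOT defaults to the control group used in this report; override it to point elsewhere.
MON_ROOT=${MON_ROOT:-/sys/fs/resctrl/amdtest/mon_groups}

count_group_tasks() {
  local dir n
  for dir in "${MON_ROOT}"/test*/; do
    [[ -f "${dir}tasks" ]] || continue      # skip if the glob did not match a real group
    n=$(( $(wc -l < "${dir}tasks") ))       # one task ID per line in the tasks file
    printf '%s %d\n' "$(basename "$dir")" "$n"
  done
}

# Example: count_group_tasks | awk '$2 == 0'   # list groups that ended up empty
```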

3. Read the resctrl counter every 10 seconds:

$ while true; do
    cat /sys/fs/resctrl/amdtest/mon_groups/test9/mon_data/mon_L3_08/mbm_local_bytes
    sleep 10
  done

...
Unavailable
Unavailable
Unavailable
61924495182825856
64176294690029568
Unavailable
Unavailable
Unavailable
...

At some point, the delta between two consecutive reads is far outside
the normal range: (64176294690029568 - 61924495182825856) / 1024 /
1024 / 1024 / 10 = 209715 GiB/s
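The calculation above can be folded into the polling loop from step 3 so that implausible jumps show up directly. A minimal sketch, assuming the counter file prints either a decimal byte count or the string "Unavailable"; rate_gib and watch_mbm are hypothetical helper names. It restarts the baseline after every non-numeric read, and because it uses bash integer arithmetic the rate is truncated to whole GiB/s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: integer GiB/s between two mbm_*_bytes samples taken INTERVAL seconds apart.
rate_gib() {
  local prev=$1 curr=$2 interval=$3
  echo $(( (curr - prev) / 1024 / 1024 / 1024 / interval ))
}

# Hypothetical helper: poll FILE every INTERVAL seconds, COUNT times (0 = forever).
# For each numeric read after the first, print "<value> <rate> GiB/s";
# print non-numeric reads (e.g. "Unavailable") as-is and drop the baseline.
watch_mbm() {
  local file=$1 interval=${2:-10} count=${3:-0} prev="" val n=0
  while (( count == 0 || n < count )); do
    val=$(< "$file")
    if [[ "$val" =~ ^[0-9]+$ ]]; then
      [[ -n "$prev" ]] && echo "$val $(rate_gib "$prev" "$val" "$interval") GiB/s"
      prev=$val
    else
      echo "$val"
      prev=""
    fi
    n=$(( n + 1 ))
    sleep "$interval"
  done
}

# Example: rate_gib 61924495182825856 64176294690029568 10   # the two reads shown above
```

Using the same 1024-based units as the hand calculation keeps the output directly comparable with the 209715 GiB/s figure.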

If I lower the concurrency to about 59 groups or fewer, the delta stays
in the normal range and the counters never return Unavailable. I have
also tested on an AMD Rome CPU, and the problem still exists.
I have tried this on an Intel platform, and it does not have this
problem, even with 200+ RMIDs monitored concurrently.
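To narrow down where the threshold sits, one could rerun the reproduction with different group counts and, after each run, count how many groups currently read "Unavailable". A minimal sketch, assuming the directory layout and L3 domain (mon_L3_08) from the steps above; count_unavailable and the MON_ROOT/MON_DOMAIN/EVENT overrides are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many test* mon groups currently read "Unavailable"
# for one event file in one L3 monitoring domain.
MON_ROOT=${MON_ROOT:-/sys/fs/resctrl/amdtest/mon_groups}
MON_DOMAIN=${MON_DOMAIN:-mon_L3_08}
EVENT=${EVENT:-mbm_local_bytes}

count_unavailable() {
  local f total=0 unavail=0
  for f in "${MON_ROOT}"/test*/mon_data/"${MON_DOMAIN}"/"${EVENT}"; do
    [[ -f "$f" ]] || continue               # skip if the glob matched nothing
    total=$(( total + 1 ))
    if [[ "$(< "$f")" == "Unavailable" ]]; then
      unavail=$(( unavail + 1 ))
    fi
  done
  echo "${unavail}/${total} groups read Unavailable"
}

# Example: count_unavailable; EVENT=mbm_total_bytes count_unavailable
```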

I cannot find any documentation on the maximum number of RMIDs that AMD
hardware can track concurrently, or an explanation for this problem.
I believe this could become even more severe on future AMD parts with
more threads, as we will run more workloads on a single server.
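For comparison, resctrl does expose info/L3_MON/num_rmids, which the kernel documentation describes as the upper bound on how many CTRL_MON plus MON groups can be created. As far as I know it says nothing about how many RMIDs the hardware can track at the same time, which is the open question here. The sketch below compares that bound with the number of groups in use, assuming each group (root included) holds one RMID on x86; rmid_usage and the RESCTRL_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the RMIDs resctrl reports as available with the groups in use.
# Assumption: on x86 every CTRL_MON and MON group, including the root group, uses one RMID,
# and every such group directory contains a "tasks" file.
RESCTRL_ROOT=${RESCTRL_ROOT:-/sys/fs/resctrl}

rmid_usage() {
  local total used
  total=$(< "${RESCTRL_ROOT}/info/L3_MON/num_rmids")
  # Count group directories via their tasks files, skipping the info/ hierarchy.
  used=$(( $(find "${RESCTRL_ROOT}" -path "${RESCTRL_ROOT}/info" -prune -o -name tasks -print | wc -l) ))
  echo "RMIDs: ${used} in use of ${total}"
}

# Example: rmid_usage
```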

Can someone help me solve this problem? Thanks.

Best Regards
Huaicheng Zheng