Message-ID: <fd9d8b12-97b7-47eb-a26d-54a8148bc57f@amd.com>
Date: Tue, 29 Jul 2025 12:42:05 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
Hc Zheng <zhenghc00@...il.com>, Fenghua Yu <fenghua.yu@...el.com>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>
Cc: x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
linux-kernel@...r.kernel.org
Subject: Re: [Bug] x86/resctrl: unexpected mbm_local_bytes/mbm_total_bytes delta
on AMD with multiple RMIDs in the same domain
Hi Hc Zheng,
On 7/29/25 11:49, Reinette Chatre wrote:
> +Babu
>
> Hi Huaicheng Zheng,
>
> On 7/29/25 12:53 AM, Hc Zheng wrote:
>> Hi All,
>>
>> We have enabled resctrl on our container platform. We noticed some
>> unexpected behavior when multiple containers are running in the same
>> L3 domain: mbm_local_bytes/mbm_total_bytes for such mon_groups either
>> returns Unavailable, or the delta between two consecutive reads is far
>> out of the normal range (e.g. 1000+ GB/s).
>>
>> After reading the AMD PQoS manual(), it says:
>> """
>> Potential causes of the “U” bit being set include
>> (but are not limited to):
>>
>> • RMID is not currently tracked by the hardware.
>> • RMID was not tracked by the hardware at some time since it was last read.
>> • RMID has not been read since it started being tracked by the hardware.
>> """
>>
>> but there is no explanation for an unexpectedly large delta between
>> two reads of the counters. After examining the kernel code, I suspect
>> this is more likely to be a hardware bug.
>>
>> Here are the steps to reproduce it:
>>
>> 1. Create mon_groups
>>
>> $ for i in `seq 0 99`;do mkdir -p /sys/fs/resctrl/amdtest/mon_groups/test$i;done
Looks like you are creating 100 new monitor groups here (seq 0 99).
You can create more monitor groups, but the hardware cannot count more
than 32 RMIDs (or 16 on some older hardware) at a time.
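
As a quick check, resctrl reports how many RMIDs it will hand out
(the value is model-specific, and it is the allocation limit, not the
number of RMIDs the hardware tracks simultaneously, which is why all
100 group creations still succeed):

$ cat /sys/fs/resctrl/info/L3_MON/num_rmids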
>>
>> 2. Run the stress command and assign each PID to its own mon_group (I
>> ran this test on AMD Genoa; CPUs 16-23,208-215 are on CCD 8)
>>
>> $ cat stress.sh
>> nohup numactl -C 16-23,208-215 stress -m 1 --vm-hang 1 > /dev/null &
>> lastPid=$!
>> echo $lastPid > /sys/fs/resctrl/amdtest/tasks
>> echo $lastPid > /sys/fs/resctrl/amdtest/mon_groups/test$1/tasks
>> $ for i in `seq 0 99`;do bash stress.sh $i ;done
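
As a side note, tasks can exit or be moved, so it is worth confirming
each group still holds its task before reading the counters. A quick
sanity check, reusing the group names from your script:

$ for i in `seq 0 99`; do echo "test$i: $(wc -l < /sys/fs/resctrl/amdtest/mon_groups/test$i/tasks) task(s)"; done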
>>
>> 3. Watch the resctrl counter every 10 seconds
>>
>> $ while true; do cat /sys/fs/resctrl/amdtest/mon_groups/test9/mon_data/mon_L3_08/mbm_local_bytes; sleep 10; done
>>
>> ...
>> Unavailable
>> Unavailable
>> Unavailable
>> 61924495182825856
>> 64176294690029568
>> Unavailable
>> Unavailable
>> Unavailable
>> ...
>>
>> At some point the delta between two consecutive reads is far out of
>> the normal range: (64176294690029568 - 61924495182825856) / 1024 /
>> 1024 / 1024 / 10 ≈ 209715 GB/s
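
A small variant of your loop that prints the per-interval rate directly
makes these jumps easy to spot (a sketch, reusing your path and 10s
interval):

f=/sys/fs/resctrl/amdtest/mon_groups/test9/mon_data/mon_L3_08/mbm_local_bytes
prev=
while true; do
        cur=$(cat $f)
        if [ "$cur" = "Unavailable" ]; then
                echo Unavailable
                prev=           # drop the baseline; counter may reset while untracked
        elif [ -n "$prev" ]; then
                echo "$(( (cur - prev) / 1024 / 1024 / 10 )) MB/s"
        fi
        [ "$cur" = "Unavailable" ] || prev=$cur
        sleep 10
done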
>>
>> If I lower the concurrency to 59 or below, the delta stays in the
>> normal range and never returns Unavailable. I have also tested on an
>> AMD Rome CPU, where the problem still exists.
>> On an Intel platform the problem does not occur, even with over 200
>> RMIDs being monitored concurrently.
>>
>> I cannot find any documentation on the maximum number of RMIDs that
>> AMD hardware can track concurrently, or an explanation for these
>> problems. I believe this could become even more severe on future AMD
>> parts with more threads, as we will run more workloads on a single
>> server.
>>
>> Can someone help me solve this problem? Thanks.
>
> It looks to me as though you are encountering the issue that is addressed with AMD's
> Assignable Bandwidth Monitoring Counters (ABMC) feature that Babu is currently enabling
> in resctrl [1]. The feature itself is well documented in that series and includes links to
> the AMD spec where you can learn more.
> You show that the "Unavailable" is encountered when reading these counters from user
> space and I deduce from that that resctrl's internal MBM overflow handler (it runs once
> per second) likely encounters the same error with the consequence that overflows of the
> counter are not handled correctly.
Yes. The huge numbers are due to an overflow-accounting problem: when a
read returns a smaller raw value than the previous one, the kernel
assumes the counter wrapped and adds a large correction on subsequent
reads.
We are trying to address this with the new hardware feature mentioned
in [1].
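
To illustrate with made-up numbers (assuming a 44-bit counter width
purely for the example; the real width is model-specific), the kernel
effectively computes the delta modulo 2^width, so a counter that went
backwards because the hardware stopped tracking the RMID looks like a
wrap of almost 2^44 chunks:

$ echo $(( (5 - 100) & ((1 << 44) - 1) ))   # cur=5, prev=100
17592186044321

After scaling from chunks to bytes, that correction is what shows up as
the huge deltas in your output.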
>
> If you do have access to the AMD hardware with this feature, please do take a look at
> the resctrl support for it and try it out. We would all appreciate your feedback to ensure
> resctrl supports it well.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/cover.1753467772.git.babu.moger@amd.com/
>
>
--
Thanks
Babu Moger