Message-ID: <71e85bf3-a451-4adf-ad5e-d39f7935efa0@amd.com>
Date:   Wed, 6 Dec 2023 09:40:58 -0600
From:   "Moger, Babu" <babu.moger@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
        fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com,
        bp@...en8.de, dave.hansen@...ux.intel.com,
        James Morse <james.morse@....com>
Cc:     x86@...nel.org, hpa@...or.com, paulmck@...nel.org,
        rdunlap@...radead.org, tj@...nel.org, peterz@...radead.org,
        seanjc@...gle.com, kim.phillips@....com, jmattson@...gle.com,
        ilpo.jarvinen@...ux.intel.com, jithu.joseph@...el.com,
        kan.liang@...ux.intel.com, nikunj@....com,
        daniel.sneddon@...ux.intel.com, pbonzini@...hat.com,
        rick.p.edgecombe@...el.com, rppt@...nel.org,
        maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, eranian@...gle.com,
        peternewman@...gle.com, dhagiani@....com
Subject: Re: [PATCH 00/15] x86/resctrl : Support AMD QoS RMID Pinning feature

Hi Reinette,

On 12/5/23 17:17, Reinette Chatre wrote:
> (+James)
> 
> Hi Babu,
> 
> On 11/30/2023 4:57 PM, Babu Moger wrote:
>> These series adds the support for AMD QoS RMID Pinning feature. It is also
> 
> "These series" - is this series part of a bigger work?

No.
There are some plans to optimize rmid_reads. Peter is planning to
work on that. But both are independent of each other.

> 
>> called ABMC (Assignable Bandwidth Monitoring Counters) feature.
>>
>> The feature details are available in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit 
>> 346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>> feature only guarantees that RMIDs currently assigned to a processor will
>> be tracked by hardware. The counters of any other RMIDs which are no
>> longer being tracked will be reset to zero. The MBM event counters return
>> "Unavailable" for the RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups. But there can be only limited
>> number of groups that can be given guaranteed monitoring numbers. With ever
>> changing system configuration, there is no way to definitely know which of
>> these groups will be active for certain point of time. Users do not have
>> the option to monitor a group or set of groups for certain period of time
>> without worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to pin (or assign) the RMID to the
>> hardware counter and monitor the bandwidth for a longer duration. The
>> pinned RMID will be active until the user unpins (or unassigns) it.  There
>> is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> # Linux Implementation
>>
>> Hardware provides total of 32 counters available for assignment.
>> Each Linux resctrl group can be assigned a maximum of 2 counters. One for
>> mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
>> assign only one counter to the group. If the system runs out of assignable
>> counters, the kernel will display the error when the user attempts a new
>> counter assignment. Users need to unassign already used counters for new
>> assignments.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>> 	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_total_bytes_config
>> 	mbm_local_bytes
>> 	mbm_local_bytes_config
>> 	abmc_capable ←  Linux kernel detected ABMC feature.
> 
> (Please start thinking about a new name that is not the AMD feature
> name. This is added to resctrl filesystem that is the generic interface
> used to work with different architectures. This thus needs to be generalized
> to what user requires and how it can be accommodated by the hardware ...
> this is already expected to be needed by MPAM and having a AMD feature
> name could cause confusion.)

Yes. Agree.

How about "assign_capable"?

> 
>>
>> b. Mount with ABMC support
>> 	#umount /sys/fs/resctrl/
>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>> 	
> 
> hmmm ... so this requires the user to mount resctrl, determine if the
> feature is supported, unmount resctrl, remount resctrl with feature enabled.
> Could you please elaborate what prevents this feature from being enabled
> without needing to remount resctrl?

The spec says:
"Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
Figure 19-7). When the state of ABMC_En is changed, it must be changed to
the updated value on all logical processors in the QOS Domain.
Upon transitions of the ABMC_En the following actions take place:
All ABMC assignable bandwidth counters are reset to 0.
The L3 default mode bandwidth counters are reset to 0.
The L3_QOS_ABMC_CFG MSR is reset to 0."

So, all the monitoring group counters will be reset.

It is technically possible to enable it without a remount. But ABMC mode
requires a few new files (in each group), which I added when mounted with
"-o abmc". I thought that was the better option.

Otherwise we need to add these files when ABMC is supported (not when
enabled), and add another file in /sys/fs/resctrl/info/L3_MON to
enable the feature on the fly.

Both are acceptable options. Any thoughts?
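
For illustration, the check-then-remount flow with the current patches
could be scripted roughly as below. This is a minimal sketch:
has_mon_feature is a hypothetical helper name, and the "abmc_capable"
string and the "-o abmc" mount option are the ones from this series, so
both may change after review.

```shell
#!/bin/sh
# Sketch only: has_mon_feature is a hypothetical helper, not part of resctrl.

# Return success if feature name $1 appears as a whole line in the
# mon_features file $2 (defaults to the path used in the cover letter).
has_mon_feature() {
    feature=$1
    file=${2:-/sys/fs/resctrl/info/L3_MON/mon_features}
    [ -r "$file" ] && grep -qxF -- "$feature" "$file"
}

# Usage (needs root and a kernel with this series applied):
#   if has_mon_feature abmc_capable; then
#       umount /sys/fs/resctrl
#       mount -o abmc -t resctrl resctrl /sys/fs/resctrl
#   fi
```

With an on-the-fly enable file in info/L3_MON, the umount/mount pair
would be replaced by a single write, but that file does not exist yet.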


> 
>> c. Read the monitor states. There will be new file "monitor_state"
>>    for each monitor group when ABMC feature is enabled. By default,
>>    both total and local MBM events are in "unassign" state.
>> 	
>> 	#cat /sys/fs/resctrl/monitor_state 
>> 	total=unassign;local=unassign
>> 	
>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>    events are not available until the user assigns the events explicitly.
>>    Users need to assign the counters to monitor the events in this mode.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	Unavailable
> 
> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
> still be used to track cache occupancy?

The llc_occupancy event is not impacted by ABMC mode. It can still be used
to track cache occupancy.

> 
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>> 	Unavailable
> 
> I believe that "Unavailable" already has an accepted meaning within current
> interface and is associated with temporary failure. Even the AMD spec states "This
> is generally a temporary condition and subsequent reads may succeed". In the
> scenario above there is no chance that this counter would produce a value later.
> I do not think it is ideal to overload existing interface with different meanings
> associated with a new hardware specific feature ... something like "Disabled" seems
> more appropriate.

Hardware still reports it as unavailable. Also, there are some error cases
in which hardware can report unavailable. We may not be able to
differentiate between them.
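
With the interface as posted, user space could combine the counter read
with the group's monitor_state to tell "no counter assigned" apart from
other "Unavailable" reads. A minimal sketch, assuming the
"total=assign;local=unassign" format proposed in this series (which may
change); event_state and classify_read are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, and the monitor_state
# format is the one proposed in this series.

# Print the state ("assign" or "unassign") of event $1 ("total" or
# "local") found in the monitor_state string $2.
event_state() {
    printf '%s\n' "$2" | tr ';' '\n' | sed -n "s/^$1=//p"
}

# Classify a counter read.
#   $1 = event ("total" or "local")
#   $2 = contents of monitor_state
#   $3 = contents of the matching mbm_*_bytes file
classify_read() {
    if [ "$3" != "Unavailable" ]; then
        echo "value"
    elif [ "$(event_state "$1" "$2")" = "assign" ]; then
        echo "unavailable-while-assigned"
    else
        echo "not-assigned"
    fi
}
```

This only separates the two cases in user space; the hardware still
returns the same "Unavailable" status for both.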

> 
> Considering this we may even consider using these files themselves as a
> way to enable the counters if they are disabled. For example, just
> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used

I am not sure about this. This is specific to domain 0. This group can
have CPUs from multiple domains. I think we should have the interface for
all the domains (not for a specific domain).
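
To show why a per-file write is domain specific: a group has one
mon_L3_XX directory per L3 domain, so reading an event for the whole
group already means walking all of them. A minimal sketch;
read_all_domains is a hypothetical helper name and the default path is
the mount point used in the cover letter.

```shell
#!/bin/sh
# Sketch only: read_all_domains is a hypothetical helper.

# Print "<domain-dir> <value>" for event $1 (e.g. mbm_total_bytes) in
# every L3 domain of the group directory $2.
read_all_domains() {
    event=$1
    group=${2:-/sys/fs/resctrl}
    for dir in "$group"/mon_data/mon_L3_*; do
        # Skips domains without the event file, and the unexpanded
        # pattern when no domain directory exists.
        [ -r "$dir/$event" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/$event")"
    done
}

# Usage: read_all_domains mbm_total_bytes /sys/fs/resctrl
```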

> to enable this counter. No need for a new "monitor_state". Please note that this
> is not an official proposal since there are two other use cases that still need to
> be considered as we await James's feedback on how this may work for MPAM and
> also how this may be useful on AMD hardware that does not support ABMC but
> users may want to get similar benefits ([1])

Ok. Let's wait for James's comments.
> 
>> 	
>> e. Assign a h/w counter to the total event and read the monitor_state.
>> 	
>> 	#echo total=assign > /sys/fs/resctrl/monitor_state
>> 	#cat /sys/fs/resctrl/monitor_state 
>> 	total=assign;local=unassign
>> 	
>> f. Now that the total event is assigned. Read the total event.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
>> 	
>> g. Assign a h/w counter to both total and local events and read the monitor_state.
>> 	
>> 	#echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
>> 	#cat /sys/fs/resctrl/monitor_state
>> 	total=assign;local=assign
>> 	
>> h. Now that both total and local events are  assigned, read the events.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 	58694
> 
> It looks like if not all RMIDs associated with parent and child groups
> have counters then the accumulated counters would just treat the "unassigned"
> as zero?

That is correct.
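
The arithmetic, modelled in user space: readings without a counter
contribute nothing to the sum. This is only an illustration of the
behaviour, not the kernel code; sum_readings is a hypothetical helper
name.

```shell
#!/bin/sh
# Sketch only: models the accumulation described above in user space.

# Sum the counter readings given as arguments, counting any reading
# that is not a plain decimal number (such as "Unavailable") as zero.
sum_readings() {
    total=0
    for v in "$@"; do
        case $v in
            ''|*[!0-9]*) ;;                 # non-numeric: contributes 0
            *) total=$((total + v)) ;;
        esac
    done
    echo "$total"
}

# Usage: sum_readings 6136000 Unavailable 58694
```

So a parent total can under-report when some child groups have no
counter assigned.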

> 
>> 	
>> i. Check the bandwidth configuration for the group. Note that bandwidth
>>    configuration has a domain scope. Total event defaults to 0x7F (to
>>    count all the events) and local event defaults to 0x15
>>    (to count all the local numa events). The event bitmap decoding is
>>    available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x7f;1=0x7f
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
>> 	0=0x15;1=0x15
> 
> 
> These would not be available if system does not support BMEC. From
> patch #3 it does not seem as though ABMC is dependent on BMEC.
> 
> Is ABMC dependent on BMEC or are they just using the same
> config bits?

Good question. They don't have to be dependent on each other. To keep the
rmid_read interface the same, we made them dependent on each other. I will
add the dependency in patch 3.

I have added an explanation in patch 15.
https://lore.kernel.org/lkml/20231201005720.235639-16-babu.moger@amd.com/
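
For reference, the config values used in the examples (0x7f, 0x15, 0x33)
decode as below. A minimal sketch, assuming the bit layout from the
"mbm_total_bytes_config" table in the resctrl documentation; the short
names are labels made up for this example, and decode_mbm_config is a
hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: bit names are shorthand for the documented event types,
# listed from bit 0 upwards.

# Print one name per bit set in the config value $1 (e.g. 0x15).
decode_mbm_config() {
    val=$(($1))
    bit=0
    for name in \
        local_reads remote_reads \
        local_nt_writes remote_nt_writes \
        local_slow_reads remote_slow_reads \
        dirty_victims
    do
        if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
            echo "$name"
        fi
        bit=$((bit + 1))
    done
}

# Usage: decode_mbm_config 0x15
```

0x15 selects only the local events, and 0x33 selects the four read
events.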


> 
>> 	
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>>    Note that this change affects events on domain 0.
>> 	
>> 	#echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 
> typo?

Yes. Cut-and-paste mistake. Will fix it.

> 
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x33;1=0x7F
>> 	
>> k. Now read the total event again. The mbm_total_bytes should display
>>    only the read events.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
> 
> hmmm ... seems like there is a need to make the MBM events configurable even
> if BMEC is not supported.

Yes, in ABMC mode. Will add the dependency. Will use the standard mode if
BMEC and ABMC are not available.

> 
> Reinette
> 
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/

-- 
Thanks
Babu Moger
