Message-ID: <33cd0cc0-4f81-4a2d-a327-0c976219996a@amd.com>
Date: Mon, 18 Nov 2024 13:04:45 -0600
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com
Cc: fenghua.yu@...el.com, x86@...nel.org, hpa@...or.com, thuth@...hat.com,
 paulmck@...nel.org, rostedt@...dmis.org, akpm@...ux-foundation.org,
 xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com,
 daniel.sneddon@...ux.intel.com, perry.yuan@....com, sandipan.das@....com,
 kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
 jithu.joseph@...el.com, brijesh.singh@....com, xin3.li@...el.com,
 ebiggers@...gle.com, andrew.cooper3@...rix.com, mario.limonciello@....com,
 james.morse@....com, tan.shaopeng@...itsu.com, tony.luck@...el.com,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 peternewman@...gle.com, maciej.wieczor-retman@...el.com, eranian@...gle.com,
 jpoimboe@...nel.org, thomas.lendacky@....com
Subject: Re: [PATCH v9 08/26] x86/resctrl: Introduce the interface to display
 monitor mode

Hi Reinette,

On 11/15/24 18:00, Reinette Chatre wrote:
> Hi Babu,
> 
> On 10/29/24 4:21 PM, Babu Moger wrote:
>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>> supported.
>>
>> The "mbm_cntr_assign" mode provides the option to assign a counter to
>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>>
>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>
>> The "default" mode is the existing monitoring mode that works without the
>> explicit counter assignment, instead relying on dynamic counter assignment
>> by hardware that may result in hardware not dedicating a counter resulting
>> in monitoring data reads returning "Unavailable".
>>
>> Provide an interface to display the monitor mode on the system.
>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> [mbm_cntr_assign]
>> default
>>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> ---
>> v9: Updated user documentation based on comments.
>>
>> v8: Commit message update.
>>
>> v7: Updated the descriptions/commit log in resctrl.rst to generic text.
>>     Thanks to James and Reinette.
>>     Rename mbm_mode to mbm_assign_mode.
>>     Introduced mutex lock in rdtgroup_mbm_mode_show().
>>
>> v6: Added documentation for mbm_cntr_assign and legacy mode.
>>     Moved mbm_mode fflags initialization to static initialization.
>>
>> v5: Changed interface name to mbm_mode.
>>     It will be always available even if ABMC feature is not supported.
>>     Added description in resctrl.rst about ABMC mode.
>>     Fixed display abmc and legacy consistently.
>>
>> v4: Fixed the checks for legacy and abmc mode. Default is ABMC.
>>
>> v3: New patch to display ABMC capability.
>> ---
>>  Documentation/arch/x86/resctrl.rst     | 33 ++++++++++++++++++++++++++
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 31 ++++++++++++++++++++++++
>>  2 files changed, 64 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 30586728a4cd..a93d7980e25f 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -257,6 +257,39 @@ with the following files:
>>  	    # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>>  	    0=0x30;1=0x30;3=0x15;4=0x15
>>  
>> +"mbm_assign_mode":
>> +	Reports the list of monitoring modes supported. The enclosed brackets
>> +	indicate which mode is enabled.
>> +	::
>> +
>> +	  # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>> +	  [mbm_cntr_assign]
>> +	  default
>> +
>> +	"mbm_cntr_assign":
>> +
>> +	In mbm_cntr_assign mode user-space is able to specify which of the
>> +	events in CTRL_MON or MON groups should have a counter assigned using the
>> +	"mbm_assign_control" file. The number of counters available is described
>> +	in the "num_mbm_cntrs" file. Changing the mode may cause all counters on
>> +	a resource to reset.
>> +
>> +	The mode is useful on platforms which support more CTRL_MON and MON
>> +	groups than the hardware counters, meaning 'unassigned' events on CTRL_MON or
> 
> " than the hardware counters" -> " than hardware counters"?

Sure.

> 
>> +	MON groups will report 'Unavailable' or count the traffic in an unpredictable
>> +	way.
> 
> I think the above can be confusing to users. It mentioned "*will* report Unavailable"
> and then "*or* count the traffic in an unpredictable way". It is not possible for
> counter to report "Unavailable" while also reporting unpredictable data.
> 
> My concern is that there is no way for a user to know if the platform supports more
> CTRL_MON and MON groups than hardware counters and the above seems to imply that counters
> may be unreliable ... so how does a user know if counters are unreliable or not?

That is correct. There is no definite way to find out if the counters are
unreliable.

> 
> Can this be made specific to help users know if their platforms are impacted? From
> what I know all AMD platforms are impacted so perhaps a straight-forward:
> 
> 	"The mode is useful on AMD platforms which support more CTRL_MON and MON ..."

Sure.

> 
> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
> to make the event data "more predictable" and then be concerned when the mode does
> not exist.
> 
> As an alternative, is it possible to know the number of hardware counters on AMD systems
> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
> users to know if their platform may be impacted by this type of "unpredictability" (by comparing 
> num_mbm_cntrs to num_rmids).

There is a roundabout (or hacky) way to find out the number of RMIDs
that can be active.
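
For illustration only, the comparison suggested above could look roughly
like the user-space sketch below. It assumes "num_mbm_cntrs" is exposed next
to the existing "num_rmids" file under info/L3_MON, which is only a proposal
at this point. The helper names and the simple "fewer counters than RMIDs"
check are hypothetical, and the check ignores that each group can monitor
more than one MBM event.

```python
from pathlib import Path

# Assumed location of the L3 monitoring info files.
DEFAULT_INFO_DIR = "/sys/fs/resctrl/info/L3_MON"


def read_int(path):
    """Return the integer stored in a one-line file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def counters_may_be_short(info_dir=DEFAULT_INFO_DIR):
    """Compare num_mbm_cntrs with num_rmids.

    Returns True if fewer counters than RMIDs are reported, False if not,
    and None if either file is missing (for example, num_mbm_cntrs is not
    exposed on this kernel or platform).
    """
    info = Path(info_dir)
    num_rmids = read_int(info / "num_rmids")
    num_cntrs = read_int(info / "num_mbm_cntrs")
    if num_rmids is None or num_cntrs is None:
        return None
    return num_cntrs < num_rmids


if __name__ == "__main__":
    result = counters_may_be_short()
    if result is None:
        print("cannot tell: num_rmids or num_mbm_cntrs is not available")
    elif result:
        print("fewer MBM counters than RMIDs: event counts may be affected")
    else:
        print("at least as many MBM counters as RMIDs")
```

A None result means the question cannot be answered from these files alone,
which is the situation on current kernels.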

> 
>> +
>> +	AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>> +	enable this mode by default so that counters remain assigned even when the
>> +	corresponding RMID is not in use by any processor.
>> +
>> +	"default":
>> +
>> +	In default mode resctrl assumes there is a hardware counter for each
>> +	event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>> +	mbm_local_bytes may report 'Unavailable' if there is no counter associated
>> +	with that event.
> 
> If I understand correctly, on AMD platforms without ABMC the events only report
> "Unavailable" if there is no counter assigned at the time of the query. If a counter
> is unassigned and then reassigned then the event count will reset and the user
> will get some data back but it may thus be unpredictable (to match earlier language).
> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
> "unpredictable" event counts (not just "Unavailable") ... this gets complicated

Yes. All the AMD systems without ABMC are affected by this problem.

> because users should be steered to avoid "default" mode if mbm_assign_mode is
> available, while not be made concerned to use "default" mode on Intel where
> mbm_assign_mode is not available.

Can we add text to clarify this?
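
Whatever text we settle on, a tool could use the file format from the commit
message to decide which mode is active. Below is a minimal sketch, assuming
the one-mode-per-line output with brackets around the enabled mode as shown
above. The function names are hypothetical, and a missing file is treated
as "no assignable mode reported".

```python
from pathlib import Path


def parse_assign_mode(text):
    """Parse mbm_assign_mode contents.

    Returns (enabled_mode, supported_modes). The enabled mode is the one
    enclosed in square brackets; it is None if no line is bracketed.
    """
    enabled = None
    supported = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        supported.append(name)
    return enabled, supported


def read_assign_mode(info_dir="/sys/fs/resctrl/info/L3_MON"):
    """Read and parse mbm_assign_mode; return (None, []) if it is absent."""
    path = Path(info_dir) / "mbm_assign_mode"
    try:
        return parse_assign_mode(path.read_text())
    except OSError:
        return None, []


if __name__ == "__main__":
    enabled, supported = read_assign_mode()
    print("enabled:", enabled)
    print("supported:", ", ".join(supported) or "(none reported)")
```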

> 
> Reinette
> 
> 

-- 
Thanks
Babu Moger
