Message-ID: <0eff9462-e7e2-49a9-9538-c8907024322f@amd.com>
Date: Fri, 22 Nov 2024 12:25:08 -0600
From: "Moger, Babu" <bmoger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, babu.moger@....com,
corbet@....net, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: fenghua.yu@...el.com, x86@...nel.org, hpa@...or.com, thuth@...hat.com,
paulmck@...nel.org, rostedt@...dmis.org, akpm@...ux-foundation.org,
xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com,
daniel.sneddon@...ux.intel.com, perry.yuan@....com, sandipan.das@....com,
kai.huang@...el.com, xiaoyao.li@...el.com, seanjc@...gle.com,
jithu.joseph@...el.com, brijesh.singh@....com, xin3.li@...el.com,
ebiggers@...gle.com, andrew.cooper3@...rix.com, mario.limonciello@....com,
james.morse@....com, tan.shaopeng@...itsu.com, tony.luck@...el.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
peternewman@...gle.com, maciej.wieczor-retman@...el.com, eranian@...gle.com,
jpoimboe@...nel.org, thomas.lendacky@....com
Subject: Re: [PATCH v9 08/26] x86/resctrl: Introduce the interface to display
monitor mode
Hi Reinette,
On 11/18/2024 4:07 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 11/18/24 11:04 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 11/15/24 18:00, Reinette Chatre wrote:
>>> Hi Babu,
>>>
>>> On 10/29/24 4:21 PM, Babu Moger wrote:
>>>> Introduce the interface file "mbm_assign_mode" to list monitor modes
>>>> supported.
>>>>
>>>> The "mbm_cntr_assign" mode provides the option to assign a counter to
>>>> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>>>>
>>>> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
>>>> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>>>>
>>>> The "default" mode is the existing monitoring mode that works without the
>>>> explicit counter assignment, instead relying on dynamic counter assignment
>>>> by hardware that may result in hardware not dedicating a counter resulting
>>>> in monitoring data reads returning "Unavailable".
>>>>
>>>> Provide an interface to display the monitor mode on the system.
>>>> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
>>>> [mbm_cntr_assign]
>>>> default
>>>>
>>>> Signed-off-by: Babu Moger <babu.moger@....com>
>>>> ---
>
> ...
>
>>> I'm concerned that users with Intel platforms may want to use the "mbm_cntr_assign" mode
>>> to make the event data "more predictable" and then be concerned when the mode does
>>> not exist.
>>>
>>> As an alternative, is it possible to know the number of hardware counters on AMD systems
>>> without ABMC? I wonder if we could perhaps always expose num_mbm_cntrs as a way for
>>> users to know if their platform may be impacted by this type of "unpredictability" (by comparing
>>> num_mbm_cntrs to num_rmids).
>>
>> There is a roundabout (or hacky) way to find out the number of RMIDs
>> that can be active.
>
> Does this give consistent and accurate data? Is this something that can be added to resctrl?
> (Reading your other message [1] it does not sound as though it can produce an accurate
> number on boot.)
> If not then it will be up to the documentation to be accurate.
>
>
>>>> +
>>>> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
>>>> + enable this mode by default so that counters remain assigned even when the
>>>> + corresponding RMID is not in use by any processor.
>>>> +
>>>> + "default":
>>>> +
>>>> + In default mode resctrl assumes there is a hardware counter for each
>>>> + event within every CTRL_MON and MON group. Reading mbm_total_bytes or
>>>> + mbm_local_bytes may report 'Unavailable' if there is no counter associated
>>>> + with that event.
>>>
>>> If I understand correctly, on AMD platforms without ABMC the events only report
>>> "Unavailable" if there is no counter assigned at the time of the query. If a counter
>>> is unassigned and then reassigned then the event count will reset and the user
>>> will get some data back but it may thus be unpredictable (to match earlier language).
>>> Is this correct? Any AMD platform in "default" mode may thus be vulnerable to
>>> "unpredictable" event counts (not just "Unavailable") ... this gets complicated
>>
>> Yes. All the AMD systems without ABMC are affected by this problem.
>>
>>> because users should be steered to avoid "default" mode if mbm_assign_mode is
>>> available, while not being made concerned about using "default" mode on Intel
>>> where mbm_assign_mode is not available.
>>
>> Can we add text to clarify this?
>
> Please do.
I think we need to add text about AMD systems. How about this?

"default":

In default mode resctrl assumes there is a hardware counter for each
event within every CTRL_MON and MON group. On AMD systems with 16 more
monitoring groups, reading mbm_total_bytes or mbm_local_bytes may report
'Unavailable' if there is no counter associated with that event. It is
therefore recommended to use the 'mbm_cntr_assign' mode, if supported.
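
For illustration only (this is not part of the patch): a minimal sketch of
how a user-space script might act on that recommendation. It assumes the
file lists one mode per line with the active mode in [brackets], as in the
example output quoted earlier in this thread. The helper name
parse_mbm_assign_mode is hypothetical.

```python
def parse_mbm_assign_mode(text):
    """Return (active_mode, supported_modes) from mbm_assign_mode contents.

    Assumption: one mode per line, the active mode wrapped in [brackets].
    """
    active = None
    supported = []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        supported.append(name)
    return active, supported


if __name__ == "__main__":
    # Path taken from the example in the patch description; the file only
    # exists on kernels carrying this series with resctrl mounted.
    path = "/sys/fs/resctrl/info/L3_MON/mbm_assign_mode"
    try:
        with open(path) as f:
            active, supported = parse_mbm_assign_mode(f.read())
    except OSError:
        print("mbm_assign_mode not available")
    else:
        if active != "mbm_cntr_assign" and "mbm_cntr_assign" in supported:
            print("mbm_cntr_assign is supported but not active")
        else:
            print("active mode:", active)
```

On a system where the file is absent (for example Intel, or a kernel
without this series) the script only reports that the file is not
available, which matches the concern above about not alarming users there.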
>
> Reinette
>
> [1] https://lore.kernel.org/all/35fc70fd-0281-4ac8-b32b-efa2f4516901@amd.com/
>
--
- Babu Moger