[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5b6cfa6e-9f2f-4d84-9b63-14386a2fa3dc@arm.com>
Date: Fri, 21 Feb 2025 18:06:03 +0000
From: James Morse <james.morse@....com>
To: Babu Moger <babu.moger@....com>, corbet@....net,
reinette.chatre@...el.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, tony.luck@...el.com,
peternewman@...gle.com
Cc: fenghua.yu@...el.com, x86@...nel.org, hpa@...or.com, paulmck@...nel.org,
akpm@...ux-foundation.org, thuth@...hat.com, rostedt@...dmis.org,
xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com,
daniel.sneddon@...ux.intel.com, jpoimboe@...nel.org, perry.yuan@....com,
sandipan.das@....com, kai.huang@...el.com, xiaoyao.li@...el.com,
seanjc@...gle.com, xin3.li@...el.com, andrew.cooper3@...rix.com,
ebiggers@...gle.com, mario.limonciello@....com, tan.shaopeng@...itsu.com,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
maciej.wieczor-retman@...el.com, eranian@...gle.com
Subject: Re: [PATCH v11 07/23] x86/resctrl: Introduce the interface to display
monitor mode
Hi Babu,
On 22/01/2025 20:20, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.
>
> The "mbm_cntr_assign" mode provides the option to assign a counter to
> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>
> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>
> The "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
>
> Provide an interface to display the monitor mode on the system.
> $ cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index fb90f08e564e..b5defc5bce0e 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,32 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_assign_mode":
> + Reports the list of monitoring modes supported. The enclosed brackets
> + indicate which mode is enabled.
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_cntr_assign]
> + default
> +
> + "mbm_cntr_assign":
> +
> + In mbm_cntr_assign, monitoring event can only accumulate data while
> + it is backed by a hardware counter. The user-space is able to specify
> + which of the events in CTRL_MON or MON groups should have a counter
> + assigned using the "mbm_assign_control" file. The number of counters
> + available is described in the "num_mbm_cntrs" file. Changing the mode
> + may cause all counters on a resource to reset.
> + "default":
> +
> + In default mode, resctrl assumes there is a hardware counter for each
> + event within every CTRL_MON and MON group. On AMD platforms, it is
> + recommended to use mbm_cntr_assign mode if supported, because reading
> + "mbm_total_bytes" or "mbm_local_bytes" will report 'Unavailable' if
> + there is no counter associated with that event.
But if you read a value instead of "Unavailable", that doesn't mean the value is correct.
For two reads that succeed, the counter may have been reset in the middle.
I'm suggesting something like:
| it is recommended to use mbm_cntr_assign mode if supported, to avoid counters
| being re-allocated by hardware. This can cause a misleading value to be read,
| or if no counter is associated with that event "Unavailable".
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index f91fe605766f..3880480a41d2 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -854,6 +854,30 @@ static int rdtgroup_rmid_show(struct kernfs_open_file *of,
> return ret;
> }
>
> +static int resctrl_mbm_assign_mode_show(struct kernfs_open_file *of,
> + struct seq_file *s, void *v)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + if (r->mon.mbm_cntr_assignable) {
> + if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
> + seq_puts(s, "[mbm_cntr_assign]\n");
> + seq_puts(s, "default\n");
> + } else {
> + seq_puts(s, "mbm_cntr_assign\n");
> + seq_puts(s, "[default]\n");
> + }
What do you think to an architecture being able to opt-out of this flexibility?
If there aren't enough counters I can expose what the hardware has through this interface
- but if user-space turns it off ... then what?
For MPAM this would need to be some best-effort software allocation strategy that I'd
rather not write - its not a problem that can be solved, and any value that is reported is
likely to be wrong. For ABMC platforms, existing stable kernels expose a value, so being
able to preserve the existing behaviour makes sense. MPAM doesn't have this problem.
Something like this:
----------%<----------
@@ -861,16 +861,21 @@ static int resctrl_mbm_assign_mode_show(struct kernfs_open
_file *of,
struct seq_file *s, void *v)
{
struct rdt_resource *r = of->kn->parent->priv;
+ bool enabled = resctrl_arch_mbm_cntr_assign_enabled(r);
mutex_lock(&rdtgroup_mutex);
if (r->mon.mbm_cntr_assignable) {
- if (resctrl_arch_mbm_cntr_assign_enabled(r)) {
+ if (enabled)
seq_puts(s, "[mbm_cntr_assign]\n");
- seq_puts(s, "default\n");
- } else {
- seq_puts(s, "mbm_cntr_assign\n");
+ else
seq_puts(s, "[default]\n");
+
+ if (!IS_ENABLED(CONFIG_RESCTRL_ASSIGN_FIXED) {
+ if (enabled)
+ seq_puts(s, "default\n");
+ else
+ seq_puts(s, "mbm_cntr_assign\n");
}
} else {
seq_puts(s, "[default]\n");
----------%<----------
x86 wouldn't define CONFIG_RESCTRL_ASSIGN_FIXED, arm64 would, meaning for MPAM the file
would be either:
| [default]
or
| [mbm_cntr_assign]
> + } else {
> + seq_puts(s, "[default]\n");
> + }
> +
> + mutex_unlock(&rdtgroup_mutex);
> +
> + return 0;
> +}
Thanks,
James
Powered by blists - more mailing lists