Message-ID: <136d9c83-e816-4188-ae0d-322478a57a68@intel.com>
Date: Fri, 14 Nov 2025 14:17:53 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, <linux-kernel@...r.kernel.org>
CC: Tony Luck <tony.luck@...el.com>, James Morse <james.morse@....com>, "Ben
Horgan" <ben.horgan@....com>, Thomas Gleixner <tglx@...utronix.de>, "Ingo
Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, "Jonathan
Corbet" <corbet@....net>, <x86@...nel.org>, <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2] x86,fs/resctrl: Factor MBA parse-time conversion to be
per-arch
Hi Dave,
On 10/31/25 8:41 AM, Dave Martin wrote:
> The control value parser for the MB resource currently coerces the
> memory bandwidth percentage value from userspace to be an exact
> multiple of the rdt_resource::resctrl_membw::bw_gran parameter.
>
> On MPAM systems, this results in somewhat worse-than-worst-case
> rounding, since the bandwidth granularity advertised to resctrl by the
> MPAM driver is in general only an approximation to the actual hardware
> granularity on these systems, and the hardware bandwidth allocation
> control value is not natively a percentage -- necessitating a further
> conversion in the resctrl_arch_update_domains() path, regardless of the
> conversion done at parse time.
>
> Allow the arch to provide its own parse-time conversion that is
> appropriate for the hardware, and move the existing conversion to x86.
> This will avoid accumulated error from rounding the value twice on MPAM
> systems.
>
> Clarify the documentation, but avoid overly exact promises.
>
> Clamping to bw_min and bw_max still feels generic: leave it in the core
> code, for now.
I think they are only theoretically generic, since the arch sets them and resctrl
uses them to validate user input. The arch can thus theoretically set them to
whatever the u32 used to represent them allows. Of course, doing something like
this would make the interface even harder for users to use.
>
> No functional change.
>
> Signed-off-by: Dave Martin <Dave.Martin@....com>
>
> ---
...
> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index b7f35b07876a..fbbcf5421346 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -144,12 +144,11 @@ with respect to allocation:
> user can request.
>
> "bandwidth_gran":
> - The granularity in which the memory bandwidth
> - percentage is allocated. The allocated
> - b/w percentage is rounded off to the next
> - control step available on the hardware. The
> - available bandwidth control steps are:
> - min_bandwidth + N * bandwidth_gran.
> + The approximate granularity in which the memory bandwidth
> + percentage is allocated. The allocated bandwidth percentage
> + is rounded up to the next control step available on the
> + hardware. The available hardware steps are no larger than
> + this value.
>
> "delay_linear":
> Indicates if the delay scale is linear or
> @@ -737,8 +736,10 @@ The minimum bandwidth percentage value for each cpu model is predefined
> and can be looked up through "info/MB/min_bandwidth". The bandwidth
> granularity that is allocated is also dependent on the cpu model and can
> be looked up at "info/MB/bandwidth_gran". The available bandwidth
> -control steps are: min_bw + N * bw_gran. Intermediate values are rounded
> -to the next control step available on the hardware.
> +control steps are, approximately, min_bw + N * bw_gran. The steps may
> +appear irregular due to rounding to an exact percentage: bw_gran is the
> +maximum interval between the percentage values corresponding to any two
> +adjacent steps in the hardware.
What can bw_gran be expected to be on Arm systems? Could existing usage be supported with
MPAM setting bw_gran to 1?
What will these control steps actually look like when the user views the schemata file
on an Arm system?
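To make the question concrete, here is a minimal sketch of what the steps could look like. It is not taken from the MPAM driver. It assumes hypothetical hardware whose limit is a fraction with 16 levels, a round-up conversion at write time, and a round-down conversion at read time. All names (HYP_HW_STEPS, hyp_pct_to_hw(), hyp_hw_to_pct(), hyp_max_step_gap(), hyp_print_steps()) are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical: the hardware limit is one of 16 equally spaced levels. */
#define HYP_HW_STEPS	16u

/* Assumed write path: percentage -> hardware level, rounding up. */
static unsigned int hyp_pct_to_hw(unsigned int pct)
{
	return (pct * HYP_HW_STEPS + 99) / 100;
}

/* Assumed read path: hardware level -> percentage, rounding down. */
static unsigned int hyp_hw_to_pct(unsigned int hw)
{
	return hw * 100 / HYP_HW_STEPS;
}

/* Largest interval between the percentages of two adjacent levels. */
static unsigned int hyp_max_step_gap(void)
{
	unsigned int hw, prev = 0, max_gap = 0;

	for (hw = 1; hw <= HYP_HW_STEPS; hw++) {
		unsigned int pct = hyp_hw_to_pct(hw);

		if (pct - prev > max_gap)
			max_gap = pct - prev;
		prev = pct;
	}
	return max_gap;
}

/* Print the percentage that a user would read back for each level. */
static void hyp_print_steps(void)
{
	unsigned int hw;

	for (hw = 1; hw <= HYP_HW_STEPS; hw++)
		printf("%u ", hyp_hw_to_pct(hw));
	printf("\n");
}
```

Under these assumptions the values read back would be 6, 12, 18, 25, 31, 37, 43, 50, 56, 62, 68, 75, 81, 87, 93, 100. The intervals are 6 or 7, so a bw_gran of 7 would be the maximum interval rather than a fixed stride, and a request of 51 would read back as 56.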
With resctrl "coercing" the user-provided value before passing it to the architecture,
resctrl ensures that the control steps match what the documentation states above. If
resctrl instead passes the value directly to the architecture, I see nothing preventing
the architecture from ignoring resctrl's "contract" with user space documented above and
using arbitrary control steps, since the architecture also controls
resctrl_arch_get_config(), whose result is displayed directly to user space. What
guarantee is there that resctrl_arch_get_config() will display a value that is
"approximately" min_bw + N * bw_gran? This seems like opening the door even wider for
resctrl to become architecture specific. With this change the schemata file becomes a
direct channel between user space and the arch, which risks users needing to tread
carefully when switching between different architectures.
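For reference, the two paths being compared can be sketched as follows. This is only an illustration under stated assumptions, not the resctrl or MPAM code: the hardware is assumed to have 16 levels, the advertised granularity is assumed to be 7 (100/16 rounded up), both conversions are assumed to round up, and every name below is hypothetical.

```c
/* Hypothetical hardware with 16 levels and an advertised granularity of 7. */
#define HYP_HW_STEPS	16u
#define HYP_BW_GRAN	7u

/* Round v up to the next multiple of m. */
static unsigned int hyp_roundup(unsigned int v, unsigned int m)
{
	return ((v + m - 1) / m) * m;
}

/* Assumed arch conversion: percentage -> hardware level, rounding up. */
static unsigned int hyp_pct_to_hw(unsigned int pct)
{
	return (pct * HYP_HW_STEPS + 99) / 100;
}

/* Path A: coerce to a multiple of the granularity first, then convert. */
static unsigned int hyp_coerced_path(unsigned int pct)
{
	unsigned int coerced = hyp_roundup(pct, HYP_BW_GRAN);

	if (coerced > 100)
		coerced = 100;	/* keep the coerced value within 100% */
	return hyp_pct_to_hw(coerced);
}

/* Path B: hand the user's percentage to the conversion unchanged. */
static unsigned int hyp_direct_path(unsigned int pct)
{
	return hyp_pct_to_hw(pct);
}
```

With these assumptions a request of 50 becomes 56 and then level 9 on path A, but level 8 on path B. Path A keeps the values on a grid that resctrl defines at the cost of rounding twice; path B rounds once but leaves the grid entirely up to the arch.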
Reinette