Message-ID: <ffe1c6e9-e951-42d6-99ec-ec6e7496f9b0@intel.com>
Date: Fri, 26 Sep 2025 13:54:10 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, <linux-kernel@...r.kernel.org>
CC: Tony Luck <tony.luck@...el.com>, James Morse <james.morse@....com>,
"Thomas Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
"Borislav Petkov" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
<x86@...nel.org>, <linux-doc@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
per-arch
Hi Dave,

Just one correction ...
On 9/22/25 8:04 AM, Dave Martin wrote:
> On Fri, Sep 12, 2025 at 03:19:04PM -0700, Reinette Chatre wrote:
>
> [...]
>
>>> Clamping to bw_min and bw_max still feels generic: leave it in the core
>>> code, for now.
>>
>> Sounds like MPAM may be ready to start the schema parsing discussion again?
>> I understand that MPAM has a few more ways to describe memory bandwidth as
>> well as cache portion partitioning. Previously ([1] [2]) James mused about exposing
>> schema format to user space, which seems like a good idea for new schema.
>
> On this topic, specifically:
>
>
> My own ideas in this area are a little different, though I agree with
> the general idea.
>
> Bitmap controls are distinct from numeric values, but for numbers, I'm
> not sure that distinguishing percentages from other values is required,
> since this is really just a specific case of a linear scale.
>
> I imagined a generic numeric schema, described by a set of files like
> the following in a schema's info directory:
>
> min: minimum value, e.g., 1
> max: maximum value, e.g., 1023
> scale: value that corresponds to one unit
> unit: quantified base unit, e.g., "100pc", "64MBps"
> map: mapping function name
>
> If s is the value written in a schemata entry and p is the
> corresponding physical amount of resource, then
>
> min <= s <= max
>
> and
>
> p = map(s / scale) * unit
>
> One reason why I prefer this scaling scheme over the floating-point
> approach is that it can be exact (at least for currently known
> platforms), and it doesn't require a new floating-point parser/
> formatter to be written for this one thing in the kernel (which I
> suspect is likely to be error-prone and poorly defined around
> subtleties such as rounding behaviour).
>
> "map" anticipates non-linear ramps, but this is only really here as a
> forwards compatibility get-out. For now, this might just be set to
> "none", meaning the identity mapping (i.e., a no-op). This may shadow
> the existing "delay_linear" parameter, but with more general
> applicability if we need it.
>
>
> The idea is that userspace reads the info files and then does the
> appropriate conversions itself. This might or might not be seen as a
> burden, but would give exact control over the hardware configuration
> with a generic interface, with possibly greater precision than the
> existing schemata allow (when the hardware supports it), and without
> having to second-guess the rounding that the kernel may or may not do
> on the values.
>
> For RDT MBA, we might have
>
> min: 10
> max: 100
> scale: 100
> unit: 100pc
> map: none
>
> The schemata entry
>
> MB: 0=10, 1=100
>
> would allocate the minimum possible bandwidth to domain 0, and 100%
> bandwidth to domain 1.
>
>
> For AMD SMBA, we might have:
>
> min: 1
> max: 100
> scale: 8
> unit: 1GBps
>
Unfortunately it is not like this for AMD. Initial support for AMD MBA set max
to a hardcoded 2048 [1], which was later [2] changed to learn max from hardware.
Of course this broke resctrl as a generic interface, and I hope we have learned
enough since then not to repeat this mistake, nor to give up on MB and make its
interface even worse by, for example, adding more architecture-specific input
ranges.

Reinette
[1] commit 4d05bf71f157 ("x86/resctrl: Introduce AMD QOS feature")
[2] commit 0976783bb123 ("x86/resctrl: Remove hard-coded memory bandwidth limit")
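
As an illustration of the userspace conversion described in the quoted
proposal, here is a minimal Python sketch of p = map(s / scale) * unit using
the RDT MBA example values. It is a sketch under assumptions, not an existing
interface: the info files (min, max, scale, unit, map) are only proposed in
this thread, the dictionary layout and the helper names parse_unit() and
physical_amount() are hypothetical, and the unit string is assumed to be a
decimal magnitude followed by a base unit name. Exact rational arithmetic is
used to reflect the point that the scheme needs no floating point.

```python
import re
from fractions import Fraction

# Only the identity mapping ("none") is named in the proposal.
MAPS = {"none": lambda x: x}


def parse_unit(text):
    """Split a unit string such as "100pc" into (100, "pc").

    The "<magnitude><name>" layout is an assumption based on the examples
    "100pc" and "64MBps" in the quoted text.
    """
    m = re.fullmatch(r"(\d+)([A-Za-z]+)", text)
    if m is None:
        raise ValueError(f"unrecognised unit: {text!r}")
    return int(m.group(1)), m.group(2)


def physical_amount(s, info):
    """Return p = map(s / scale) * unit as (exact Fraction, base unit name)."""
    if not info["min"] <= s <= info["max"]:
        raise ValueError(f"{s} is outside [{info['min']}, {info['max']}]")
    magnitude, base = parse_unit(info["unit"])
    mapped = MAPS[info["map"]](Fraction(s, info["scale"]))
    return mapped * magnitude, base


# Hypothetical info-file contents, copied from the RDT MBA example above.
RDT_MBA = {"min": 10, "max": 100, "scale": 100, "unit": "100pc", "map": "none"}

for s in (10, 100):
    p, base = physical_amount(s, RDT_MBA)
    print(f"MB value {s} -> {p}{base}")
```

With these values, the schemata entry "MB: 0=10, 1=100" maps to 10pc for
domain 0 and 100pc for domain 1, matching the description in the quoted text.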