Message-ID: <aQOUAeVP9oc7RIn/@e133380.arm.com>
Date: Thu, 30 Oct 2025 16:36:17 +0000
From: Dave Martin <Dave.Martin@....com>
To: Reinette Chatre <reinette.chatre@...el.com>
Cc: linux-kernel@...r.kernel.org, Tony Luck <tony.luck@...el.com>,
	James Morse <james.morse@....com>,
	"Chen, Yu C" <yu.c.chen@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
	x86@...nel.org
Subject: Re: [RFC] fs/resctrl: Generic schema description
Hi Reinette,
On Tue, Oct 28, 2025 at 04:17:05PM -0700, Reinette Chatre wrote:
> Hi Dave,
> 
> On 10/24/25 4:12 AM, Dave Martin wrote:
> > Hi all,
> > 
> > Going forward, a single resctrl resource (such as memory bandwidth) is
> > likely to require multiple schemata, either because we want to add new
> > schemata that provide finer control, or because the hardware has
> > multiple controls, covering different aspects of resource allocation.
> > 
> > The fit between MPAM's memory bandwidth controls and the resctrl MB
> > schema is already awkward, and later Intel RDT features such as Region
> > Aware Memory Bandwidth Allocation are already pushing past what the MB
> > schema can describe.  Both of these can involve multiple control
> > values and finer resolution than the 100 steps offered by the current
> > "MB" schema.
> > 
> > The previous discussion went off in a few different directions [1], so
> > I want to focus back onto defining an extended schema description that
> > aims to cover the use cases that we know about or anticipate today, and
> > allows for future extension as needed.
> > 
> > (A separate discussion is needed on how new schemata interact with
> > previously-defined schemata (such as the MB percentage schema).  I
> > suggest we pause that discussion for now, in the interests of getting
> > the schema description nailed down.)
> 
> ok, but let's keep this as "open #1"
> 
> > Following on from the previous mail thread, I've tried to refine and
> > flesh out the proposal for schema descriptions a bit, as follows.
> > 
> > Proposal:
> > 
> >   * Split resource names and schema names in resctrlfs.
> > 
> >     Resources will be named for the unique, existing schema for each
> >     resource.
> 
> Are you referring to the implementation or how things are exposed to user
> space? I am trying to understand how the existing L3CODE/L3DATA schemata
> fit in ... they are presented to user space as two separate resources since
> they each have their own directory in "info" while internally they are 
> schema of the L3 resource.
Good question -- I didn't take into account here the fact that some
physical resources already have multiple schemata exposed to userspace.
I've probably overformalised here.  I'm not proposing to refactor the
arrangement of existing schemata and resources.

So we would continue to have
info/L3CODE/resource_schemata/L3CODE/ and
info/L3DATA/resource_schemata/L3DATA/.

I think that the decision to combine these under a single resctrl
resource internally is the most logical one, but I'm proposing just to
extend the info/ content, without unnecessary changes.

The current arrangement does have one shortcoming, which is that
software doesn't know (other than by built-in knowledge) that L3CODE
and L3DATA claim resource from the same hardware pool, so

	L3CODE:0=0001
	L3DATA:0=0001

implies that the transactions on the I-side and D-side contend for
cache lines (unless there are separate L3 I- and D-caches -- but I
don't think that's a thing on any relevant system...)

So, we might want some way to indicate that L3CODE and L3DATA are
linked.  But I think that CDP is a unique case where we can reasonably
expect some built-in userspace knowledge.

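For illustration, the built-in knowledge mentioned above amounts to something like the following minimal Python sketch. It assumes the existing "NAME:domain=mask;domain=mask" schemata line format and simply hard-codes the fact that L3CODE and L3DATA share one L3 pool. The function names are made up for this example.

```python
def parse_schemata_line(line):
    """Parse 'L3CODE:0=0001;1=00ff' into ('L3CODE', {0: 0x1, 1: 0xff})."""
    name, _, rest = line.strip().partition(":")
    domains = {}
    for item in rest.split(";"):
        dom, _, mask = item.strip().partition("=")
        domains[int(dom)] = int(mask, 16)  # masks are written in hex
    return name.strip(), domains


def cdp_shared_bits(lines):
    """Return {domain: common_bits} where the L3CODE and L3DATA masks overlap.

    Hard-codes the assumption that L3CODE and L3DATA allocate from one
    shared L3 pool; nothing in info/ currently states this.
    """
    parsed = dict(parse_schemata_line(ln) for ln in lines)
    code = parsed.get("L3CODE", {})
    data = parsed.get("L3DATA", {})
    shared = {}
    for dom in sorted(code.keys() & data.keys()):
        common = code[dom] & data[dom]
        if common:  # I-side and D-side contend for these cache portions
            shared[dom] = common
    return shared
```

With the two lines from the example above, cdp_shared_bits(["L3CODE:0=0001", "L3DATA:0=0001"]) returns {0: 1}. This means that on domain 0 the single allocated chunk is shared between code and data.
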
I don't currently plan to address this, but it could come later if we
think it's important.

> Just trying to understand if you are talking about reverting
> https://lore.kernel.org/all/20210728170637.25610-1-james.morse@arm.com/ ?
No...
> The current implementation appears to match this proposal so we may need to
> have special cases to keep CDP backwards compatible.
> 
> SMBA may also need some extra care ... especially if other architectures start
> to allocate memory bandwidth to CXL resource via their "MB" resource.
Perhaps.  I think it may be necessary to hack up an implementation of
these changes, to flush out things that don't quite fit.
>  
> >     The existing schema will keep its name (the same as the resource
> >     name), and new schemata defined for a resource will include that
> >     name as a prefix (at least, by default).
> > 
> >     So, for example, we will have an MB resource with a schema called
> >     MB (the schema that we have already).  But we may go on to define
> >     additional schemata for the MB resource, with names such as MB_MAX,
> >     etc.
> > 
> >   * Stop adding new schema description information in the top-level
> >     info/<resource>/ directory in resctrlfs.
> > 
> >     For backwards compatibility, we can keep the existing property
> >     files under the resource info directory to describe the previously
> >     defined resource, but we seem to need something richer going
> >     forward.
> > 
> >   * Add a hierarchy to list all the schemata for each resource, along
> >     with their properties.  So far, the proposal looks like this,
> >     taking the MB resource as an example:
> > 
> > 	info/
> > 	 └─ MB/
> > 	     └─ resource_schemata/
> > 	         ├─ MB/
> > 	         ├─ MB_MIN/
> > 	         ├─ MB_MAX/
> > 	         ┆
> > 
> >     Here, MB, MB_MIN and MB_MAX are all schemata for the "MB" resource.
> >     In this proposal, these are just dummy schema names for
> >     illustration purposes.  The important thing is that they all
> >     control aspects of the "MB" resource, and that there can be more
> >     than one of them.
> > 
> >     It may be appropriate to have a nested hierarchy, where some
> >     schemata are presented as children of other schemata if they
> >     affect the same hardware controls.  For now, let's put this issue
> >     on one side, and consider what properties should be advertised for
> >     each schema.
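A consumer of the proposed hierarchy could discover every schema of every resource with a simple directory walk. The sketch below assumes the info/<resource>/resource_schemata/<schema>/ layout proposed above, which does not exist in any current kernel, and resctrl mounted at /sys/fs/resctrl. list_schemata() is a hypothetical helper.

```python
import os


def list_schemata(info_dir):
    """Return {resource: sorted schema names} under the proposed layout.

    Resources without a resource_schemata/ directory (monitoring-only
    directories such as L3_MON, or a kernel without the extension) and
    plain files in info/ are skipped.
    """
    result = {}
    for resource in sorted(os.listdir(info_dir)):
        schemata_dir = os.path.join(info_dir, resource, "resource_schemata")
        if not os.path.isdir(schemata_dir):
            continue
        result[resource] = sorted(
            name for name in os.listdir(schemata_dir)
            if os.path.isdir(os.path.join(schemata_dir, name))
        )
    return result
```

On a system that implemented the MB example from the proposal, list_schemata("/sys/fs/resctrl/info") would be expected to include an entry like {"MB": ["MB", "MB_MAX", "MB_MIN"]}.
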
> 
> ok to put this aside but I think we should keep including it, "open #2" ?
Yes; I'm not abandoning this, but I wanted to focus on the schema
description, here.
> >   * Current properties that I think we might want are:
> > 
> > 	info/
> > 	 └─ SOME_RESOURCE/
> > 	     └─ resource_schemata/
> > 	         ├─ SOME_SCHEMA/
> > 	         ┆   ├─ type
> > 	             ├─ min
> > 	             ├─ max
> > 	             ├─ tolerance
> > 	             ├─ resolution
> > 	             ├─ scale
> > 	             └─ unit
> > 
> >     (I've tweaked the properties a bit since previous postings.
> >     "type" replaces "map"; "scale" is now the unit multiplier;
> >     "resolution" is now a scaling divisor -- details below.)
> > 
> >     I assume that we expose the properties in individual files, but we
> >     could also combine them into a single description file per schema,
> >     per resource or (possibly) a single global file.
> >     (I don't have a strong view on the best option.)
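If each property lives in its own file, as in the existing info/ directories, reading a schema description is one small read per file. The following is a sketch assuming plain-text property files in a per-schema directory. read_schema_properties() is hypothetical, and values are left as strings because their formats are not pinned down yet.

```python
import os


def read_schema_properties(schema_dir):
    """Read every regular file in schema_dir into {property_name: value}.

    Values are returned as stripped strings; how each property should be
    parsed (integer, fraction, flag list) is not settled by the proposal.
    """
    props = {}
    for name in sorted(os.listdir(schema_dir)):
        path = os.path.join(schema_dir, name)
        if os.path.isfile(path):  # skip any nested (child) schema directories
            with open(path) as f:
                props[name] = f.read().strip()
    return props
```

Individual files keep to the sysfs-style one-value-per-file convention. A single combined description file would instead let userspace read a consistent snapshot in one go.
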
> > 
> > 
> >     Either way, the following set of properties may be a reasonable
> >     place to start:
> > 
> > 
> >     type: the schema type, followed by optional flag specifiers:
> > 
> >       - "scalar": a single-valued numeric control
> > 
> >         A mandatory flag indicates how the control value written to
> >         the schemata file is converted to an amount of resource for
> >         hardware regulation.
> > 
> > 	The flag "linear" indicates a linear mapping.
> > 
> > 	In this case, the amount of resource E that is actually
> > 	allocated is derived from the control value C written to the
> > 	schemata file as follows:
> > 
> >     	E = C * scale * unit / resolution
> > 
> > 	Other flag values could be defined later, if we encounter
> > 	hardware with non-linear controls.
> > 
> >       - "bitmap": a bitmap control
> > 
> >         The optional flag "sparse" is present if the control accepts
> >         sparse bitmaps.
> > 
> > 	In this case, E = bitmap_weight(C) * scale * unit / resolution.
> > 
> > 	As before, each bit controls access to a specific chunk of
> > 	resource in the hardware, such as a group of cache lines.  All
> > 	chunks are equally sized.
> > 
> > 	(Different CTRL_MON groups may still contend within the
> > 	allocation E, when they have bits in common between their
> > 	bitmaps.)
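The two mappings described above can be written down directly. This is a minimal Python sketch that uses exact fractions, so a fine resolution introduces no rounding. The function names and the example property values are illustrative only, and E is returned as a multiple of the schema's unit.

```python
from fractions import Fraction


def parse_type(type_str):
    """Split a 'type' property such as 'scalar linear' into (kind, flags)."""
    kind, *flags = type_str.split()
    return kind, set(flags)


def allocated_amount(type_str, control, scale, resolution):
    """Return E, the allocated amount, as a multiple of the schema's unit.

    scalar linear: E = C * scale * unit / resolution
    bitmap:        E = bitmap_weight(C) * scale * unit / resolution
    """
    kind, flags = parse_type(type_str)
    if kind == "scalar":
        if "linear" not in flags:
            # The mapping flag is mandatory and only "linear" is defined so far.
            raise ValueError(f"unsupported scalar mapping: {sorted(flags)}")
        steps = control
    elif kind == "bitmap":
        # bitmap_weight(C): number of set bits; "sparse" does not change E.
        steps = bin(control).count("1")
    else:
        raise ValueError(f"unknown schema type {kind!r}")
    return Fraction(steps) * Fraction(scale) / Fraction(resolution)
```

Consider some made-up values for a 16-bit cache bitmap: scale=32, resolution=16 and unit "MiB". With these, allocated_amount("bitmap", 0x00f0, 32, 16) gives Fraction(8, 1), i.e. 8 MiB.
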
> 
> Would it not be simpler to have the files/properties depend on the
> schema type? It almost seems as though some of the properties are forced
> to have some meaning for bitmap when they do not seem to be needed. Instead,
> for a bitmap type there can be bitmap specific properties like, for example,
> bit_usage. This may also create more flexibility when there is a future
> mapping function needed that depends on some new property?
> 
> Reinette
Sure, there is no reason why the set of properties has to be identical
for different schema types.

It turned out that a single set of properties fitted better than I
expected, so I presented things that way to see what people thought
about it.

For bitmaps, there isn't a strong need to change the set of properties
already available in the top-level info/ directories.  These can be
adopted into the new info under resource_schemata/, but I might be
tempted to rename them to remove the "cbm" string so that the names are
applicable to all bitmap-style resources.  I might also rename the
min_cbm_bits property if we can think of a more intuitive name -- it's
not obvious how this should apply to sparse bitmaps.

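Here is a sketch of how the existing bitmap properties could be used to check a bitmap control value. These are cbm_mask, min_cbm_bits and sparse_masks, all present under info/<resource>/ today. It deliberately reads min_cbm_bits as a minimum number of set bits. For a contiguous mask that equals the run length, but for sparse bitmaps it is only one possible interpretation. validate_bitmap() is a hypothetical helper, not kernel code.

```python
def is_contiguous(mask):
    """True if the set bits of mask form one unbroken run (0 counts as contiguous)."""
    if mask == 0:
        return True
    low = mask & -mask  # isolate the lowest set bit
    # Adding the lowest bit carries through a single run and clears it entirely.
    return (mask + low) & mask == 0


def validate_bitmap(value, cbm_mask, min_bits, sparse):
    """Return None if value is acceptable, else a short reason string.

    Assumption: min_bits is applied as a popcount, even for sparse masks.
    """
    if value & ~cbm_mask:
        return "bits set outside cbm_mask"
    if not sparse and not is_contiguous(value):
        return "non-contiguous mask but sparse_masks is 0"
    if bin(value).count("1") < min_bits:
        return f"fewer than {min_bits} bits set"
    return None
```
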
Thinking about bit_usage, is that really per-schema?

If L3CODE and L3DATA are really allocating the same underlying
resource, I wonder whether their bit_usage should be combined,
somehow.

This might be one for later, though.

It doesn't look necessary to adopt all existing properties into the
extended schema description immediately -- if there are some that don't
quite fit, we could adopt them later on without breaking backwards
compatibility.

Do you see a risk there?

Cheers
---Dave