linux-kernel - Re: [RFC] fs/resctrl: Generic schema description

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <08078c25-87fc-43e6-b062-f9945edcee80@intel.com>
Date: Tue, 4 Nov 2025 14:26:16 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>
CC: <linux-kernel@...r.kernel.org>, Tony Luck <tony.luck@...el.com>, "James
 Morse" <james.morse@....com>, "Chen, Yu C" <yu.c.chen@...el.com>, "Thomas
 Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, "Borislav
 Petkov" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
 Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>, <x86@...nel.org>
Subject: Re: [RFC] fs/resctrl: Generic schema description

Hi Dave,

On 10/30/25 9:36 AM, Dave Martin wrote:
> Hi Reinette,
> 
> On Tue, Oct 28, 2025 at 04:17:05PM -0700, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 10/24/25 4:12 AM, Dave Martin wrote:
>>> Hi all,
>>>
>>> Going forward, a single resctrl resource (such as memory bandwidth) is
>>> likely to require multiple schemata, either because we want to add new
>>> schemata that provide finer control, or because the hardware has
>>> multiple controls, covering different aspects of resource allocation.
>>>
>>> The fit between MPAM's memory bandwidth controls and the resctrl MB
>>> schema is already awkward, and later Intel RDT features such as Region
>>> Aware Memory Bandwidth Allocation are already pushing past what the MB
>>> schema can describe.  Both of these can involve multiple control
>>> values and finer resolution than the 100 steps offered by the current
>>> "MB" schema.
>>>
>>> The previous discussion went off in a few different directions [1], so
>>> I want to focus back onto defining an extended schema description that
>>> aims to cover the use cases that we know about or anticipate today, and
>>> allows for future extension as needed.
>>>
>>> (A separate discussion is needed on how new schemata interact with
>>> previously-defined schemata (such as the MB percentage schema).  I
>>> suggest we pause that discussion for now, in the interests of getting
>>> the schema description nailed down.)
>>
>> ok, but let's keep this as "open #1"
>>
>>> Following on from the previous mail thread, I've tried to refine and
>>> flesh out the proposal for schema descriptions a bit, as follows.
>>>
>>> Proposal:
>>>
>>>   * Split resource names and schema names in resctrlfs.
>>>
>>>     Resources will be named for the unique, existing schema for each
>>>     resource.
>>
>> Are you referring to the implementation or how things are exposed to user
>> space? I am trying to understand how the existing L3CODE/L3DATA schemata
>> fit in ... they are presented to user space as two separate resources since
>> they each have their own directory in "info" while internally they are
>> schemata of the L3 resource.
> 
> Good question -- I didn't take into account here the fact that some
> physical resources already have multiple schemata exposed to userspace.
> 
> I've probably overformalised, here.  I'm not proposing to refactor the
> arrangement of existing schemata and resources.
> 
> So we would continue to have
> info/L3CODE/resource_schemata/L3CODE/ and
> info/L3DATA/resource_schemata/L3DATA/.
> 
> 
> I think that the decision to combine these under a single resctrl
> resource internally is the most logical one, but I'm proposing just to
> extend the info/ content, without unnecessary changes.

Thank you for confirming. This matches the way I was thinking about this work.

> 
> The current arrangement does have one shortcoming, which is that
> software doesn't know (other than by built-in knowledge) that L3CODE
> and L3DATA claim resource from the same hardware pool, so
> 
> 	L3CODE:0=0001
> 	L3DATA:0=0001
> 
> implies that the transactions on the I-side and D-side contend for
> cache lines (unless there are separate L3 I- and D-caches -- but I
> don't think that's a thing on any relevant system...)
> 
> So, we might want some way to indicate that L3CODE and L3DATA are
> linked.  But I think that CDP is a unique case where we can reasonably
> expect some built-in userspace knowledge.

I'll admit that it is not as obvious as this new interface would make it for new
schemata, but userspace is not entirely left to its own devices.
resctrl will ensure that these resources do not overlap when, for example,
a resource group is exclusive. For example, an L3CODE allocation in one
resource group cannot be created to overlap with an L3DATA allocation in
another when one of the resource groups is exclusive.
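A minimal userspace sketch of that rule, assuming each group's L3CODE and L3DATA masks for one domain have already been parsed from its schemata file. The struct and function names are hypothetical, not resctrl internals. The point is only that both masks index the same pool of cache portions, so a plain bitwise AND is enough to detect the overlap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-group view of one L3 domain with CDP enabled.
 * Both masks index the same pool of L3 cache portions.
 */
struct cdp_alloc {
	uint32_t code_cbm;	/* value written as L3CODE:<id>=... */
	uint32_t data_cbm;	/* value written as L3DATA:<id>=... */
	bool exclusive;		/* group mode is "exclusive" */
};

/* True if any portion claimed by a (code or data) is also claimed by b. */
bool cdp_masks_overlap(const struct cdp_alloc *a, const struct cdp_alloc *b)
{
	uint32_t a_used = a->code_cbm | a->data_cbm;
	uint32_t b_used = b->code_cbm | b->data_cbm;

	return (a_used & b_used) != 0;
}

/* Overlap is only forbidden when at least one of the groups is exclusive. */
bool cdp_allocs_compatible(const struct cdp_alloc *a, const struct cdp_alloc *b)
{
	if (!a->exclusive && !b->exclusive)
		return true;
	return !cdp_masks_overlap(a, b);
}
```

For example, an exclusive group with L3CODE:0=0001 and a shareable group with L3DATA:0=0001 are rejected, even though the two lines name different schemata.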

> 
> I didn't currently plan to address this, but it could come later if we
> think it's important.
> 
>> Just trying to understand if you are talking about reverting
>> https://lore.kernel.org/all/20210728170637.25610-1-james.morse@arm.com/ ?
> 
> No...
> 
>> The current implementation appears to match this proposal so we may need to
>> have special cases to keep CDP backwards compatible.
>>
>> SMBA may also need some extra care ... especially if other architectures start
>> to allocate memory bandwidth to CXL memory via their "MB" resource.
> 
> Perhaps.  I think it may be necessary to hack up an implementation of
> these changes, to flush out things that don't quite fit.

Have you considered how MPAM may want to deal with different memory "types"?
With SMBA there is a "CXL memory" resource while the MB resource has mostly
been "anything that misses L3". From a user space perspective it is not obvious
to me how users prefer to refer to different memory types.

> 
>>  
>>>     The existing schema will keep its name (the same as the resource
>>>     name), and new schemata defined for a resource will include that
>>>     name as a prefix (at least, by default).

We may have to be explicit about expectations regarding which schemata can be observed in
which area (schemata file vs. new info hierarchy). resctrl.rst currently contains:
	"schemata":
		A list of all the resources available to this group.
With the above in the existing documentation, resctrl may be forced to always keep
existing schemata/resources in the schemata file, and to be careful when considering
dropping them as mused about in https://lore.kernel.org/lkml/aPkEb4CkJHZVDt0V@agluck-desk3/

Theoretically, it may become possible in the future for the set of resources that a
resource group may allocate to vary between groups. Consider, for example, resources that
support different numbers of CLOSID/PARTID, where there is a desire to expose that to user
space instead of constraining all resource groups to the lowest CLOSID/PARTID count. In such
a scenario it should be clear to user space which resources it can allocate to a resource
group, so it is reasonable to expect the existing documentation of "schemata" as "A list of
all the resources available to this group." to be respected.

On the flip side, it may not be required that a new schema in the new info hierarchy
always appears in the schemata file. I think this after seeing that MPAM controls
can be enabled/disabled (like MPAMCFG_MBW_PROP.EN for proportional-stride
partitioning).

resctrl may thus support more partitioning controls than are exposed by the schemata
file, with the ability for user space to choose which partitioning controls to expose
in the schemata file to manage a resource. It may then turn out that, in addition to
(read-only) schema "properties", there may also be (writable) schema "controls" (bad name
since this would "control" a "partitioning control") through which user space can modify
the behavior of a partitioning control.

>>>
>>>     So, for example, we will have an MB resource with a schema called
>>>     MB (the schema that we have already).  But we may go on to define
>>>     additional schemata for the MB resource, with names such as MB_MAX,
>>>     etc.
>>>
>>>   * Stop adding new schema description information in the top-level
>>>     info/<resource>/ directory in resctrlfs.
>>>
>>>     For backwards compatibility, we can keep the existing property
>>>     files under the resource info directory to describe the previously
>>>     defined resource, but we seem to need something richer going
>>>     forward.

ack.

>>>
>>>   * Add a hierarchy to list all the schemata for each resource, along
>>>     with their properties.  So far, the proposal looks like this,
>>>     taking the MB resource as an example:
>>>
>>> 	info/
>>> 	 └─ MB/
>>> 	     └─ resource_schemata/
>>> 	         ├─ MB/
>>> 	         ├─ MB_MIN/
>>> 	         ├─ MB_MAX/
>>> 	         ┆
>>>
>>>     Here, MB, MB_MIN and MB_MAX are all schemata for the "MB" resource.
>>>     In this proposal, these are just dummy schema names for
>>>     illustration purposes.  The important thing is that they all
>>>     control aspects of the "MB" resource, and that there can be more
>>>     than one of them.
>>>
>>>     It may be appropriate to have a nested hierarchy, where some
>>>     schemata are presented as children of other schemata if they
>>>     affect the same hardware controls.  For now, let's put this issue
>>>     on one side, and consider what properties should be advertised for
>>>     each schema.
>>
>> ok to put this aside but I think we should keep including it, "open #2" ?
> 
> Yes; I'm not abandoning this, but I wanted to focus on the schema
> description, here.

Understood. There may be some connection with this work if there is a hierarchy,
since one schema's description may then be expressed in terms of another's. For example,
the relationships described via pseudocode in https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@e133380.arm.com/
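For illustration, a minimal sketch of how userspace might enumerate the schemata of one resource under the proposed info/<resource>/resource_schemata/ layout. That layout is only the RFC proposal (no kernel provides it), and the function name and callback are hypothetical. The root directory is a parameter (normally the resctrl mount point, e.g. /sys/fs/resctrl) so the sketch can be tried against any directory tree. Treating each subdirectory as one schema is also what would make a nested hierarchy possible later.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Enumerate the schema directories of one resource under the proposed
 * <root>/info/<resource>/resource_schemata/ layout (RFC only).
 * Calls cb (if non-NULL) once per schema name and returns the number of
 * schemata found, or -1 if the directory cannot be opened.
 */
int list_resource_schemata(const char *root, const char *resource,
			   void (*cb)(const char *schema, void *arg), void *arg)
{
	char path[4096];
	DIR *dir;
	struct dirent *de;
	int count = 0;

	if (snprintf(path, sizeof(path), "%s/info/%s/resource_schemata",
		     root, resource) >= (int)sizeof(path))
		return -1;

	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char sub[4096 + 256];
		struct stat st;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		/* Each schema is a subdirectory; skip any plain files. */
		if (snprintf(sub, sizeof(sub), "%s/%s", path, de->d_name) >=
		    (int)sizeof(sub))
			continue;
		if (stat(sub, &st) != 0 || !S_ISDIR(st.st_mode))
			continue;
		if (cb)
			cb(de->d_name, arg);
		count++;
	}
	closedir(dir);
	return count;
}
```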

As a sidenote (related to the '#' prefix discussion), while trying to understand how
this work may impact user expectations I did come across this in section
"Reading/writing the schemata file" of resctrl.rst:
	When writing you only need to specify those values which you wish to change.

This seems quite close to addressing the concern raised in
https://lore.kernel.org/lkml/aNv53UmFGDBL0z3O@e133380.arm.com/ :
	The reason why I think that this convention may be needed is that we
	never told (old) userspace what it was supposed to do with schemata 
	entries that it does not recognise.
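One way userspace can already cope with entries it does not recognise is to parse schemata lines by name and skip unknown ones. Below is a minimal sketch assuming the existing NAME:<domain>=<value>;<domain>=<value> line format. The function and struct names are hypothetical, and the caller supplies the list of known names. Names may be padded with leading spaces when the file is read back, so the sketch trims them. Values are kept as strings because their base depends on the schema.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One <domain>=<value> pair; value is left unparsed (hex CBM, decimal MB, ...). */
struct dom_val {
	unsigned long domain;
	char value[32];
};

/* Returns the number of pairs parsed, 0 for a skipped line, -1 if malformed. */
int parse_schemata_line(const char *line, const char *const *known,
			size_t nknown, struct dom_val *out, size_t max)
{
	const char *colon, *p;
	size_t name_len, i;
	int n = 0;

	while (isspace((unsigned char)*line))	/* names may be space-padded */
		line++;
	colon = strchr(line, ':');
	if (!colon)
		return -1;
	name_len = (size_t)(colon - line);

	for (i = 0; i < nknown; i++)
		if (strlen(known[i]) == name_len &&
		    !strncmp(known[i], line, name_len))
			break;
	if (i == nknown)
		return 0;	/* unrecognised schema: ignore the whole line */

	p = colon + 1;
	while (*p && *p != '\n') {
		char *end;
		size_t vlen;

		if ((size_t)n == max)
			return -1;
		out[n].domain = strtoul(p, &end, 10);
		if (end == p || *end != '=')
			return -1;
		p = end + 1;
		vlen = strcspn(p, ";\n");
		if (vlen == 0 || vlen >= sizeof(out[n].value))
			return -1;
		memcpy(out[n].value, p, vlen);
		out[n].value[vlen] = '\0';
		n++;
		p += vlen;
		if (*p == ';')
			p++;
	}
	return n;
}
```

With known names {"L3CODE", "L3DATA", "MB"}, a line such as "MB_MAX:0=4096" is skipped rather than rejected.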
 
>>>   * Current properties that I think we might want are:
>>>
>>> 	info/
>>> 	 └─ SOME_RESOURCE/
>>> 	     └─ resource_schemata/
>>> 	         ├─ SOME_SCHEMA/
>>> 	         ┆   ├─ type
>>> 	             ├─ min
>>> 	             ├─ max
>>> 	             ├─ tolerance
>>> 	             ├─ resolution
>>> 	             ├─ scale
>>> 	             └─ unit
>>>
>>>     (I've tweaked the properties a bit since previous postings.
>>>     "type" replaces "map"; "scale" is now the unit multiplier;
>>>     "resolution" is now a scaling divisor -- details below.)
>>>
>>>     I assume that we expose the properties in individual files, but we
>>>     could also combine them into a single description file per schema,
>>>     per resource or (possibly) a single global file.
>>>     (I don't have a strong view on the best option.)
>>>
>>>
>>>     Either way, the following set of properties may be a reasonable
>>>     place to start:
>>>
>>>
>>>     type: the schema type, followed by optional flag specifiers:
>>>
>>>       - "scalar": a single-valued numeric control
>>>
>>>         A mandatory flag indicates how the control value written to
>>>         the schemata file is converted to an amount of resource for
>>>         hardware regulation.
>>>
>>> 	The flag "linear" indicates a linear mapping.
>>>
>>> 	In this case, the amount of resource E that is actually
>>> 	allocated is derived from the control value C written to the
>>> 	schemata file as follows:
>>>
>>>     	E = C * scale * unit / resolution
>>>
>>> 	Other flag values could be defined later, if we encounter
>>> 	hardware with non-linear controls.
>>>
>>>       - "bitmap": a bitmap control
>>>
>>>         The optional flag "sparse" is present if the control accepts
>>>         sparse bitmaps.
>>>
>>> 	In this case, E = bitmap_weight(C) * scale * unit / resolution.
>>>
>>> 	As before, each bit controls access to a specific chunk of
>>> 	resource in the hardware, such as a group of cache lines.  All
>>> 	chunks are equally sized.
>>>
>>> 	(Different CTRL_MON groups may still contend within the
>>> 	allocation E, when they have bits in common between their
>>> 	bitmaps.)
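A minimal sketch of the two mappings above. It treats "unit" as the label E is expressed in, so numerically E = C * scale / resolution (scalar linear) or bitmap_weight(C) * scale / resolution (bitmap). The "type" file syntax is an assumption: a type name followed by space-separated flags. All names and values here are hypothetical. For a scalar schema with scale = 1, resolution = 8 and unit GB/s, writing C = 20 gives E = 2.5 GB/s. For a 12-bit L3 bitmap covering 24 MiB (scale = 24, resolution = 12, unit MiB), a mask of weight 4 gives E = 8 MiB. Note how resolution ends up doubling as the bitmap width in the bitmap case.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum schema_kind { SCHEMA_UNKNOWN, SCHEMA_SCALAR_LINEAR, SCHEMA_BITMAP };

struct schema_desc {
	enum schema_kind kind;
	bool sparse;		/* bitmap accepts non-contiguous masks */
	uint64_t scale;		/* unit multiplier */
	uint64_t resolution;	/* scaling divisor */
};

/* True if the word at s (length n) equals lit. */
static bool word_is(const char *s, size_t n, const char *lit)
{
	return n == strlen(lit) && strncmp(s, lit, n) == 0;
}

/*
 * Parse a "type" value such as "scalar linear" or "bitmap sparse".
 * Unknown flags are ignored; a scalar without a known mapping flag is
 * reported as SCHEMA_UNKNOWN. Only kind and sparse are updated.
 */
void parse_schema_type(const char *text, struct schema_desc *d)
{
	const char *p = text;
	bool first = true, scalar = false, bitmap = false;
	bool linear = false, sparse = false;

	for (;;) {
		size_t n;

		p += strspn(p, " \t\n");	/* skip separators */
		if (*p == '\0')
			break;
		n = strcspn(p, " \t\n");
		if (first) {
			scalar = word_is(p, n, "scalar");
			bitmap = word_is(p, n, "bitmap");
			first = false;
		} else if (word_is(p, n, "linear")) {
			linear = true;
		} else if (word_is(p, n, "sparse")) {
			sparse = true;
		}
		p += n;
	}
	d->kind = (scalar && linear) ? SCHEMA_SCALAR_LINEAR :
		  bitmap ? SCHEMA_BITMAP : SCHEMA_UNKNOWN;
	d->sparse = bitmap && sparse;
}

/* Population count, i.e. bitmap_weight() for a single 64-bit word. */
static unsigned int weight64(uint64_t v)
{
	unsigned int w = 0;

	for (; v; v &= v - 1)
		w++;
	return w;
}

/* E in the schema's unit, or -1.0 if the type or resolution is unusable. */
double schema_amount(const struct schema_desc *d, uint64_t c)
{
	uint64_t units;

	if (d->resolution == 0)
		return -1.0;
	if (d->kind == SCHEMA_SCALAR_LINEAR)
		units = c;
	else if (d->kind == SCHEMA_BITMAP)
		units = weight64(c);
	else
		return -1.0;
	return (double)units * (double)d->scale / (double)d->resolution;
}
```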
>>
>> Would it not be simpler to have the files/properties depend on the
>> schema type? It almost seems as though some of the properties are forced
>> to have some meaning for bitmap when they do not seem to be needed. Instead,
>> for a bitmap type there can be bitmap specific properties like, for example,
>> bit_usage. This may also create more flexibility when there is a future
>> mapping function needed that depends on some new property?
>>
>> Reinette
> 
> Sure, there is no reason why the set of properties has to be identical
> for different schema types.
> 
> It turned out that a single set of properties fitted better than I
> expected, so I presented things that way to see what people thought
> about it.
> 
> For bitmaps, there isn't a strong need to change the set of properties
> already available in the top-level info/ directories.  These can be
> adopted into the new info under resource_schemata/, but I might be
> tempted to rename them to remove "cbm" string so that the names are
> applicable to all bitmap-style resources.  I might also rename the
> min_cbm_bits property if we can think of a more intuitive name -- it's
> not obvious how this should apply to sparse bitmaps.
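To illustrate why min_cbm_bits is awkward for sparse bitmaps, here is a minimal sketch of validating a bitmap control value against bitmap-style properties. The function and parameter names are hypothetical. "len", "sparse" and "min_bits" stand in for the bitmap width, the proposed "sparse" flag and min_cbm_bits. For sparse masks, applying min_bits to the first run of set bits is only one possible interpretation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a bitmap control value: it must fit in len bits, be a single
 * contiguous run of ones unless sparse masks are allowed, and its first
 * run of ones must be at least min_bits long.
 */
bool bitmap_ctrl_valid(uint64_t val, unsigned int len, bool sparse,
		       unsigned int min_bits)
{
	unsigned int first, zero;

	if (len == 0 || len > 64)
		return false;
	if (len < 64 && (val >> len) != 0)
		return false;			/* bits beyond the bitmap width */
	if (val == 0)
		return min_bits == 0;

	/* Locate the first set bit and the first clear bit after it. */
	for (first = 0; !(val & (UINT64_C(1) << first)); first++)
		;
	for (zero = first; zero < len && (val & (UINT64_C(1) << zero)); zero++)
		;

	/* Non-sparse controls require a single contiguous run of ones. */
	if (!sparse && zero < len && (val >> zero) != 0)
		return false;

	return zero - first >= min_bits;
}
```

With a 12-bit bitmap, 0x505 is rejected unless sparse masks are allowed. Even then, it has no run longer than one bit, so any min_bits above 1 rejects it.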

yes, this is a good time to rename things.

> 
> 
> Thinking about bit_usage, is that really per-schema?

Good point. This is per resource.

This may create complexity if multiple controls are available for a resource. For
example, if there is a MB resource with both a proportional schema and a max then
it sounds like it may be possible to program the proportional schema with 100% while
setting the max to 50%. On the hardware side these values may be legal, albeit with
unpredictable performance, but it will be difficult for resctrl to visualize the
"bit_usage" of such an allocation.

> 
> If L3CODE and L3DATA are really allocating the same underlying
> resource, I wonder whether their bit_usage should be combined,
> somehow.

Related to earlier comment this is done internally by resctrl but not exposed to
user space. I earlier mentioned how exclusive groups take this into account, there
are also the bitmasks used when creating new resource groups. You will, for example,
find in __init_one_rdt_domain() that their bit usage is combined as below:

		if (resctrl_arch_get_cdp_enabled(r->rid))
			peer_ctl = resctrl_arch_get_config(r, d, i, peer_type);
		else
			peer_ctl = 0;
		ctrl_val = resctrl_arch_get_config(r, d, i, s->conf_type);
		used_b |= ctrl_val | peer_ctl;
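User space could compute the same combination today from each group's masks for one domain. The following is a rough sketch that mirrors the loop above, with hypothetical names. ctrl[] holds each CLOSID's own mask (L3CODE, or plain L3 without CDP) and peer[] holds its CDP peer (L3DATA). The peer is only counted when CDP is enabled.

```c
#include <stddef.h>
#include <stdint.h>

/* OR together every CLOSID's mask, plus its CDP peer when CDP is on. */
uint32_t cdp_used_bits(const uint32_t *ctrl, const uint32_t *peer,
		       size_t nclosid, int cdp_enabled)
{
	uint32_t used = 0;

	for (size_t i = 0; i < nclosid; i++) {
		uint32_t peer_ctl = cdp_enabled ? peer[i] : 0;

		used |= ctrl[i] | peer_ctl;
	}
	return used;
}

/* Bits of a len-bit (len <= 32) bitmap not claimed by any group. */
uint32_t cdp_free_bits(uint32_t used, unsigned int len)
{
	uint32_t full = len >= 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;

	return full & ~used;
}
```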

> 
> This might be one for later, though.
> 
> It doesn't look necessary to adopt all existing properties into the
> extended schema description immediately -- if there are some that don't
> quite fit, we could adopt them later on without breaking backwards
> compatibility.

It is not obvious to me that it will be simple to add a property to an
existing schema type. We may be forced to create a new schema type when needing to
do so.

I also think there may be more schema types that will eventually need to be
supported, for example MPAM's priority partitioning?

Reinette