Message-ID: <6e8d3645-cb0d-4bfe-a170-6306e3c60582@intel.com>
Date: Thu, 6 Nov 2025 09:45:59 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>
CC: <linux-kernel@...r.kernel.org>, Tony Luck <tony.luck@...el.com>, "James
Morse" <james.morse@....com>, "Chen, Yu C" <yu.c.chen@...el.com>, "Thomas
Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, "Borislav
Petkov" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>, <x86@...nel.org>,
Drew Fustini <dfustini@...libre.com>
Subject: Re: [RFC] fs/resctrl: Generic schema description
+Drew
On 11/4/25 2:26 PM, Reinette Chatre wrote:
> Hi Dave,
>
> On 10/30/25 9:36 AM, Dave Martin wrote:
>> Hi Reinette,
>>
>> On Tue, Oct 28, 2025 at 04:17:05PM -0700, Reinette Chatre wrote:
>>> Hi Dave,
>>>
>>> On 10/24/25 4:12 AM, Dave Martin wrote:
>>>> Hi all,
>>>>
>>>> Going forward, a single resctrl resource (such as memory bandwidth) is
>>>> likely to require multiple schemata, either because we want to add new
>>>> schemata that provide finer control, or because the hardware has
>>>> multiple controls, covering different aspects of resource allocation.
>>>>
>>>> The fit between MPAM's memory bandwidth controls and the resctrl MB
>>>> schema is already awkward, and later Intel RDT features such as Region
>>>> Aware Memory Bandwidth Allocation are already pushing past what the MB
>>>> schema can describe. Both of these can involve multiple control
>>>> values and finer resolution than the 100 steps offered by the current
>>>> "MB" schema.
>>>>
>>>> The previous discussion went off in a few different directions [1], so
>>>> I want to focus back onto defining an extended schema description that
>>>> aims to cover the use cases that we know about or anticipate today, and
>>>> allows for future extension as needed.
>>>>
>>>> (A separate discussion is needed on how new schemata interact with
>>>> previously-defined schemata (such as the MB percentage schema).
>>>> I suggest we pause that discussion for now, in the interests of getting
>>>> the schema description nailed down.)
>>>
>>> ok, but let's keep this as "open #1"
>>>
>>>> Following on from the previous mail thread, I've tried to refine and
>>>> flesh out the proposal for schema descriptions a bit, as follows.
>>>>
>>>> Proposal:
>>>>
>>>> * Split resource names and schema names in resctrlfs.
>>>>
>>>> Resources will be named for the unique, existing schema for each
>>>> resource.
>>>
>>> Are you referring to the implementation or how things are exposed to user
>>> space? I am trying to understand how the existing L3CODE/L3DATA schemata
>>> fit in ... they are presented to user space as two separate resources since
>>> they each have their own directory in "info" while internally they are
>>> schema of the L3 resource.
>>
>> Good question -- I didn't take into account here the fact that some
>> physical resources already have multiple schemata exposed to userspace.
>>
>> I've probably overformalised, here. I'm not proposing to refactor the
>> arrangement of existing schemata and resources.
>>
>> So we would continue to have
>> info/L3CODE/resource_schemata/L3CODE/ and
>> info/L3DATA/resource_schemata/L3DATA/.
>>
>>
>> I think that the decision to combine these under a single resctrl
>> resource internally is the most logical one, but I'm proposing just to
>> extend the info/ content, without unnecessary changes.
>
> Thank you for confirming. This matches the way I was thinking about this work.
>
>>
>> The current arrangement does have one shortcoming, which is that
>> software doesn't know (other than by built-in knowledge) that L3CODE
>> and L3DATA claim resource from the same hardware pool, so
>>
>> L3CODE:0=0001
>> L3DATA:0=0001
>>
>> implies that the transactions on the I-side and D-side contend for
>> cache lines (unless there are separate L3 I- and D-caches -- but I
>> don't think that's a thing on any relevant system...)
>>
>> So, we might want some way to indicate that L3CODE and L3DATA are
>> linked. But I think that CDP is a unique case where we can reasonably
>> expect some built-in userspace knowledge.
>
> I'll admit that it is not as obvious as this new interface would make it be
> for new schemata but userspace is not entirely left to its own devices.
> resctrl will ensure that these resources do not overlap when, for example,
> a resource group is exclusive. For example, an L3CODE allocation in one
> resource group cannot be created to overlap with an L3DATA allocation in
> another when one of the resource groups is exclusive.
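A minimal sketch of the overlap rule described above, for illustration only. It is not the resctrl implementation: struct group_cfg and cbm_overlaps_exclusive() are hypothetical names, and the check is simplified to plain 32-bit masks with CDP enabled. The point is that the CODE and DATA bitmaps of another group are treated as one pool when exclusivity is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of another resource group's CDP allocation. */
struct group_cfg {
	uint32_t code_cbm;  /* L3CODE bitmap */
	uint32_t data_cbm;  /* L3DATA bitmap */
	bool exclusive;     /* group is in exclusive mode */
};

/*
 * Return true if the candidate bitmap must be rejected: it shares a bit
 * with the CODE or DATA bitmap of another group, and either the candidate's
 * own group or that other group is exclusive.
 */
static bool cbm_overlaps_exclusive(uint32_t cbm, bool self_exclusive,
				   const struct group_cfg *others, size_t n)
{
	assert(others || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* CODE and DATA draw from the same cache, so combine them. */
		uint32_t pool = others[i].code_cbm | others[i].data_cbm;

		if ((cbm & pool) && (self_exclusive || others[i].exclusive))
			return true;
	}
	return false;
}
```

With this rule, an L3CODE request of 0x2 is refused if an exclusive group already holds 0x2 in L3DATA, even though the two schemata have different names.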
>
>>
>> I didn't currently plan to address this, but it could come later if we
>> think it's important.
>>
>>> Just trying to understand if you are talking about reverting
>>> https://lore.kernel.org/all/20210728170637.25610-1-james.morse@arm.com/ ?
>>
>> No...
>>
>>> The current implementation appears to match this proposal so we may need to
>>> have special cases to keep CDP backwards compatible.
>>>
>>> SMBA may also need some extra care ... especially if other architectures start
>>> to allocate memory bandwidth to CXL resource via their "MB" resource.
>>
>> Perhaps. I think it may be necessary to hack up an implementation of
>> these changes, to flush out things that don't quite fit.
>
> Have you considered how MPAM may want to deal with different memory "types"?
> With SMBA there is a "CXL memory" resource while the MB resource has mostly
> been "anything that misses L3". From a user space perspective it is not obvious
> to me how users prefer to refer to different memory types.
>
>>
>>>
>>>> The existing schema will keep its name (the same as the resource
>>>> name), and new schemata defined for a resource will include that
>>>> name as a prefix (at least, by default).
>
> We may have to be explicit on expectations wrt which schema can be observed in
> which area (schemata file vs new info hierarchy). resctrl.rst currently contains:
>     "schemata":
>             A list of all the resources available to this group.
> With the above in existing documentation resctrl may be forced to always keep
> existing schema/resource in the schemata file and be careful when considering to
> drop them as mused in https://lore.kernel.org/lkml/aPkEb4CkJHZVDt0V@agluck-desk3/
>
> Theoretically it may be possible in the future for it to vary which resources a
> resource group may allocate. Consider for example when resources support different
> numbers of CLOSID/PARTID and there is a desire to expose that to user space instead of
> constraining all resource groups to lowest CLOSID/PARTID. In such a scenario it should
> be clear to user space which resources it can allocate to a resource group so it is
> reasonable to expect the existing documentation for "schemata" being "A list of all
> the resources available to this group." to be respected.
>
> On the flip side, it may not be required that a new schema in new info hierarchy always
> appears in the schemata file. Reason I think this is after seeing in MPAM that
> controls could be enabled/disabled (like MPAMCFG_MBW_PROP.EN for proportional-stride
> partitioning).
>
> resctrl may thus have support for more partitioning controls than what is exposed by
> schemata file with ability for user space to choose which partitioning controls to expose
> in schemata file to use to manage a resource. It may then turn out that in addition to
> (read-only) schema "properties" there may also be (writable) schema "controls" (bad name
> since this would "control" a "partitioning control") where user space can modify behavior
> of a partitioning control.
>
>>>>
>>>> So, for example, we will have an MB resource with a schema called
>>>> MB (the schema that we have already). But we may go on to define
>>>> additional schemata for the MB resource, with names such as MB_MAX,
>>>> etc.
>>>>
>>>> * Stop adding new schema description information in the top-level
>>>> info/<resource>/ directory in resctrlfs.
>>>>
>>>> For backwards compatibility, we can keep the existing property
>>>> files under the resource info directory to describe the previously
>>>> defined resource, but we seem to need something richer going
>>>> forward.
>
> ack.
>
>>>>
>>>> * Add a hierarchy to list all the schemata for each resource, along
>>>> with their properties. So far, the proposal looks like this,
>>>> taking the MB resource as an example:
>>>>
>>>>     info/
>>>>      └─ MB/
>>>>          └─ resource_schemata/
>>>>              ├─ MB/
>>>>              ├─ MB_MIN/
>>>>              ├─ MB_MAX/
>>>>              ┆
>>>>
>>>> Here, MB, MB_MIN and MB_MAX are all schemata for the "MB" resource.
>>>> In this proposal, these are just dummy schema names for
>>>> illustration purposes. The important thing is that they all
>>>> control aspects of the "MB" resource, and that there can be more
>>>> than one of them.
>>>>
>>>> It may be appropriate to have a nested hierarchy, where some
>>>> schemata are presented as children of other schemata if they
>>>> affect the same hardware controls. For now, let's put this issue
>>>> on one side, and consider what properties should be advertised for
>>>> each schema.
>>>
>>> ok to put this aside but I think we should keep including it, "open #2" ?
>>
>> Yes; I'm not abandoning this, but I wanted to focus on the schema
>> description, here.
>
> Understood. There may be some connection with this work if there is a hierarchy
> since one schema's description may then be in terms of another. For example,
> the relationships described via pseudocode in https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@e133380.arm.com/
>
> As a sidenote (related to the '#' prefix discussion), while trying to understand how
> this work may impact user expectations I did come across this in section
> "Reading/writing the schemata file" of resctrl.rst:
>     When writing you only need to specify those values which you wish to change.
>
> This seems quite close to addressing the concern raised in
> https://lore.kernel.org/lkml/aNv53UmFGDBL0z3O@e133380.arm.com/ :
>     The reason why I think that this convention may be needed is that we
>     never told (old) userspace what it was supposed to do with schemata
>     entries that it does not recognise.
>
>>>> * Current properties that I think we might want are:
>>>>
>>>>     info/
>>>>      └─ SOME_RESOURCE/
>>>>          └─ resource_schemata/
>>>>              ├─ SOME_SCHEMA/
>>>>              ┆   ├─ type
>>>>                  ├─ min
>>>>                  ├─ max
>>>>                  ├─ tolerance
>>>>                  ├─ resolution
>>>>                  ├─ scale
>>>>                  └─ unit
>>>>
>>>> (I've tweaked the properties a bit since previous postings.
>>>> "type" replaces "map"; "scale" is now the unit multiplier;
>>>> "resolution" is now a scaling divisor -- details below.)
>>>>
>>>> I assume that we expose the properties in individual files, but we
>>>> could also combine them into a single description file per schema,
>>>> per resource or (possibly) a single global file.
>>>> (I don't have a strong view on the best option.)
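To make the one-file-per-property option concrete, here is a small sketch of how user space might build the path to a property file. The layout is the one proposed in this thread and does not exist in the kernel today; schema_prop_path() is a hypothetical helper, and /sys/fs/resctrl is assumed as the mount point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<mount>/info/<resource>/resource_schemata/<schema>/<prop>".
 * Returns 0 on success, -1 if a name contains '/' or the buffer is too small.
 * The directory layout is the proposed one, not an existing interface.
 */
static int schema_prop_path(char *buf, size_t len, const char *resource,
			    const char *schema, const char *prop)
{
	int n;

	assert(buf && resource && schema && prop);

	/* Names are single path components; reject embedded separators. */
	if (strchr(resource, '/') || strchr(schema, '/') || strchr(prop, '/'))
		return -1;

	n = snprintf(buf, len,
		     "/sys/fs/resctrl/info/%s/resource_schemata/%s/%s",
		     resource, schema, prop);

	/* snprintf() reports the untruncated length; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the MB example, schema_prop_path(buf, sizeof(buf), "MB", "MB_MAX", "resolution") would produce the path of the resolution property of the illustrative MB_MAX schema.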
>>>>
>>>>
>>>> Either way, the following set of properties may be a reasonable
>>>> place to start:
>>>>
>>>>
>>>> type: the schema type, followed by optional flag specifiers:
>>>>
>>>> - "scalar": a single-valued numeric control
>>>>
>>>> A mandatory flag indicates how the control value written to
>>>> the schemata file is converted to an amount of resource for
>>>> hardware regulation.
>>>>
>>>> The flag "linear" indicates a linear mapping.
>>>>
>>>> In this case, the amount of resource E that is actually
>>>> allocated is derived from the control value C written to the
>>>> schemata file as follows:
>>>>
>>>> E = C * scale * unit / resolution
>>>>
>>>> Other flag values could be defined later, if we encounter
>>>> hardware with non-linear controls.
>>>>
>>>> - "bitmap": a bitmap control
>>>>
>>>> The optional flag "sparse" is present if the control accepts
>>>> sparse bitmaps.
>>>>
>>>> In this case, E = bitmap_weight(C) * scale * unit / resolution.
>>>>
>>>> As before, each bit controls access to a specific chunk of
>>>> resource in the hardware, such as a group of cache lines. All
>>>> chunks are equally sized.
>>>>
>>>> (Different CTRL_MON groups may still contend within the
>>>> allocation E, when they have bits in common between their
>>>> bitmaps.)
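The two mappings quoted above can be sketched in C as follows. This only illustrates the arithmetic and is not resctrl code: struct schema_props and both helpers are hypothetical, the property values are assumed to have been read from the proposed info files already, and "unit" is modelled as a plain number of base units, which is an assumption. The GCC/Clang builtin __builtin_popcountll() stands in for the kernel's bitmap_weight().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the proposed per-schema properties. */
struct schema_props {
	uint64_t scale;      /* unit multiplier */
	uint64_t unit;       /* size of one unit, in base units (assumed numeric) */
	uint64_t resolution; /* scaling divisor, must be non-zero */
};

/* "scalar" with the "linear" flag: E = C * scale * unit / resolution */
static uint64_t scalar_linear_amount(const struct schema_props *p, uint64_t c)
{
	assert(p && p->resolution != 0);
	return c * p->scale * p->unit / p->resolution;
}

/* "bitmap": E = bitmap_weight(C) * scale * unit / resolution */
static uint64_t bitmap_amount(const struct schema_props *p, uint64_t c)
{
	assert(p && p->resolution != 0);
	return (uint64_t)__builtin_popcountll(c) * p->scale * p->unit /
	       p->resolution;
}
```

Multiplying before dividing keeps integer precision when the control value is not a multiple of the resolution; a real implementation would also need to consider overflow for large values.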
>>>
>>> Would it not be simpler to have the files/properties depend on the
>>> schema type? It almost seems as though some of the properties are forced
>>> to have some meaning for bitmap when they do not seem to be needed. Instead,
>>> for a bitmap type there can be bitmap specific properties like, for example,
>>> bit_usage. This may also create more flexibility when there is a future
>>> mapping function needed that depends on some new property?
>>>
>>> Reinette
>>
>> Sure, there is no reason why the set of properties has to be identical
>> for different schema types.
>>
>> It turned out that a single set of properties fitted better than I
>> expected, so I presented things that way to see what people thought
>> about it.
>>
>> For bitmaps, there isn't a strong need to change the set of properties
>> already available in the top-level info/ directories. These can be
>> adopted into the new info under resource_schemata/, but I might be
>> tempted to rename them to remove the "cbm" string so that the names are
>> applicable to all bitmap-style resources. I might also rename the
>> min_cbm_bits property if we can think of a more intuitive name -- it's
>> not obvious how this should apply to sparse bitmaps.
>
> yes, this is a good time to rename things.
>
>>
>>
>> Thinking about bit_usage, is that really per-schema?
>
> Good point. This is per resource.
>
> This may create complexity if multiple controls are available for a resource. For
> example, if there is a MB resource with both a proportional schema and a max then
> it sounds like it may be possible to program the proportional schema with 100% while
> setting the max to 50%. On the hardware side these values may be legal, albeit with
> unpredictable performance, but it will be difficult for resctrl to visualize the
> "bit_usage" of such an allocation.
>
>>
>> If L3CODE and L3DATA are really allocating the same underlying
>> resource, I wonder whether their bit_usage should be combined,
>> somehow.
>
> Related to earlier comment this is done internally by resctrl but not exposed to
> user space. I earlier mentioned how exclusive groups take this into account, there
> is also the bitmasks used when creating new resource groups. You will, for example,
> find in __init_one_rdt_domain() that their bit usage is combined as below:
>
> 	if (resctrl_arch_get_cdp_enabled(r->rid))
> 		peer_ctl = resctrl_arch_get_config(r, d, i, peer_type);
> 	else
> 		peer_ctl = 0;
> 	ctrl_val = resctrl_arch_get_config(r, d, i, s->conf_type);
> 	used_b |= ctrl_val | peer_ctl;
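The same accumulation can be shown as a self-contained sketch outside the kernel. Everything here is a stand-in: struct closid_cfg and used_bits() are hypothetical, and the array lookup replaces resctrl_arch_get_config(). It only demonstrates that, with CDP enabled, a bit counts as used if it is set in either the CODE or the DATA configuration of any CLOSID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CLOSID configuration of a CDP-capable cache. */
struct closid_cfg {
	uint32_t code_cbm; /* L3CODE bitmap */
	uint32_t data_cbm; /* L3DATA bitmap */
};

/*
 * Accumulate the bits in use for one schema (CODE if want_code, else DATA).
 * With CDP enabled, the peer schema's bits are folded in as well.
 */
static uint32_t used_bits(const struct closid_cfg *cfg, size_t n,
			  bool cdp_enabled, bool want_code)
{
	uint32_t used_b = 0;

	assert(cfg || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint32_t ctrl_val = want_code ? cfg[i].code_cbm : cfg[i].data_cbm;
		uint32_t peer_ctl = 0;

		if (cdp_enabled)
			peer_ctl = want_code ? cfg[i].data_cbm : cfg[i].code_cbm;

		used_b |= ctrl_val | peer_ctl;
	}
	return used_b;
}
```

With two CLOSIDs configured as {CODE 0x1, DATA 0x2} and {CODE 0x4, DATA 0x8}, the CODE schema sees 0xf in use when CDP is enabled and only 0x5 when the peer is ignored.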
>
>>
>> This might be one for later, though.
>>
>> It doesn't look necessary to adopt all existing properties into the
>> extended schema description immediately -- if there are some that don't
>> quite fit, we could adopt them later on without breaking backwards
>> compatibility.
>
> It is not obvious to me that it will be simple to add a property to an
> existing schema type. We may be forced to create new schema type when needing to
> do so.
>
> I also think there may be more schema types that will eventually need to be
> supported, for example MPAM's priority partitioning?
>
> Reinette