Message-ID: <aPtfMFfLV1l/RB0L@e133380.arm.com>
Date: Fri, 24 Oct 2025 12:12:48 +0100
From: Dave Martin <Dave.Martin@....com>
To: linux-kernel@...r.kernel.org
Cc: Tony Luck <tony.luck@...el.com>,
	Reinette Chatre <reinette.chatre@...el.com>,
	James Morse <james.morse@....com>,
	"Chen, Yu C" <yu.c.chen@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
	x86@...nel.org
Subject: [RFC] fs/resctrl: Generic schema description
Hi all,
Going forward, a single resctrl resource (such as memory bandwidth) is
likely to require multiple schemata, either because we want to add new
schemata that provide finer control, or because the hardware has
multiple controls, covering different aspects of resource allocation.
The fit between MPAM's memory bandwidth controls and the resctrl MB
schema is already awkward, and later Intel RDT features such as Region
Aware Memory Bandwidth Allocation are already pushing past what the MB
schema can describe.  Both of these can involve multiple control
values and finer resolution than the 100 steps offered by the current
"MB" schema.
The previous discussion went off in a few different directions [1], so
I want to focus back onto defining an extended schema description that
aims to cover the use cases that we know about or anticipate today, and
allows for future extension as needed.
(A separate discussion is needed on how new schemata interact with
previously-defined schemata (such as the MB percentage schema).  I
suggest we pause that discussion for now, in the interests of getting
the schema description nailed down.)
Following on from the previous mail thread, I've tried to refine and
flesh out the proposal for schema descriptions a bit, as follows.
Proposal:
  * Split resource names and schema names in resctrlfs.
    Resources will be named for the unique, existing schema for each
    resource.
    The existing schema will keep its name (the same as the resource
    name), and new schemata defined for a resource will include that
    name as a prefix (at least, by default).
    So, for example, we will have an MB resource with a schema called
    MB (the schema that we have already).  But we may go on to define
    additional schemata for the MB resource, with names such as MB_MAX,
    etc.
  * Stop adding new schema description information in the top-level
    info/<resource>/ directory in resctrlfs.
    For backwards compatibility, we can keep the existing property
    files under the resource info directory to describe the previously
    defined resource, but we seem to need something richer going
    forward.
  * Add a hierarchy to list all the schemata for each resource, along
    with their properties.  So far, the proposal looks like this,
    taking the MB resource as an example:
	info/
	 └─ MB/
	     └─ resource_schemata/
	         ├─ MB/
	         ├─ MB_MIN/
	         ├─ MB_MAX/
	         ┆
    Here, MB, MB_MIN and MB_MAX are all schemata for the "MB" resource.
    In this proposal, these are just dummy schema names for
    illustration purposes.  The important thing is that they all
    control aspects of the "MB" resource, and that there can be more
    than one of them.
    It may be appropriate to have a nested hierarchy, where some
    schemata are presented as children of other schemata if they
    affect the same hardware controls.  For now, let's put this issue
    on one side, and consider what properties should be advertised for
    each schema.
  * Current properties that I think we might want are:
	info/
	 └─ SOME_RESOURCE/
	     └─ resource_schemata/
	         ├─ SOME_SCHEMA/
	         ┆   ├─ type
	             ├─ min
	             ├─ max
	             ├─ tolerance
	             ├─ resolution
	             ├─ scale
	             └─ unit
    (I've tweaked the properties a bit since previous postings.
    "type" replaces "map"; "scale" is now the unit multiplier;
    "resolution" is now a scaling divisor -- details below.)
    I assume that we expose the properties in individual files, but we
    could also combine them into a single description file per schema,
    per resource or (possibly) a single global file.
    (I don't have a strong view on the best option.)
    Either way, the following set of properties may be a reasonable
    place to start:
    type: the schema type, followed by optional flag specifiers:
      - "scalar": a single-valued numeric control
        A mandatory flag indicates how the control value written to
        the schemata file is converted to an amount of resource for
        hardware regulation.
	The flag "linear" indicates a linear mapping.
	In this case, the amount of resource E that is actually
	allocated is derived from the control value C written to the
	schemata file as follows:
    	E = C * scale * unit / resolution
	Other flag values could be defined later, if we encounter
	hardware with non-linear controls.
      - "bitmap": a bitmap control
        The optional flag "sparse" is present if the control accepts
        sparse bitmaps.
	In this case, E = bitmap_weight(C) * scale * unit / resolution.
	As before, each bit controls access to a specific chunk of
	resource in the hardware, such as a group of cache lines.  All
	chunks are equally sized.
	(Different CTRL_MON groups may still contend within the
	allocation E, when they have bits in common between their
	bitmaps.)
    min:
      - For a scalar schema, the minimum value that can be written to
        the control when writing the schemata file.
      - For a bitmap schema, a bitmap of the minimum weight that the
        schema accepts: if an empty bitmap is accepted, this can be 0.
        Otherwise, if bitmaps with a single bit set are acceptable,
        this can just have the lowest-order bit set.
	Most commonly, the value will probably be "1".
	For bitmap schemata, we might report this in hex.  In the
	interest of generic parsing, we could include a "0x" prefix if
	so.
    max:
      - For a scalar schema, the maximum value that can be written to
        the control when writing the schemata file.
      - For a bitmap schema, the mask with all bits set.
        Possibly reported in hex for bitmap schemata (as for "min").
    tolerance:
        (See below for discussion on this.)
      - "0": the control is exact
      
      - "1": the effective control value is within ±1 of the control
        value written to the schemata file.  (Similarly, positive "n" ->
        ±n.)
        A negative value could be used to indicate that the tolerance
        is unknown.  (Possibly we could also just omit the property,
        though it seems better to warn userspace explicitly if we
        don't know.)
	Tests might make use of this parameter in order to determine
	how picky to be about exact measurement results.
    resolution:
      - For a proportional scalar schema: the number of divisions that
        the whole resource is divided into.  (See below for
        "proportional scalar schema".)
	Typically, this will be the same as the "max" value.
      - For an absolute scalar schema: the divisor applied to the
        control value.
      - For a bitmap schema: the size of the bitmap in bits.
    scale:
      - For a scalar schema: the scale-up multiplier applied to
        "unit".
      - For a bitmap schema: probably "1".
    unit:
      - The base unit of the quantity measured by the control value.
        The special unit "all" denotes a proportional schema.  In this
        case, the resource is a finite, physical thing such as a cache
        or maxed-out data throughput of a memory controller.  The
        entire physical resource is available for allocation, and the
        control value indicates what proportion of it is allocated.
	Bitmap schemata will probably all be proportional and use the
	unit "all".  (This applies to cache bitmaps, at least.)
	Absolute schemata will require specification of the base unit
	here, say, "MBps".  The "scale" parameter can be used to avoid
	proliferation of unit strings:
	For example, {scale=1000, unit="MBps"} would be equivalent to
	{scale=1, unit="GBps"}.
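As a rough illustration of how userspace might consume these properties, here is a minimal C sketch.  It assumes the values have already been read from the per-schema files; struct schema_desc, parse_value(), weight(), contiguous() and effective_amount() are hypothetical names made up for this example, not existing resctrl interfaces.  Values with a "0x" prefix are parsed as hex (as floated above for bitmap "min"/"max"), and everything else as decimal, so that a leading zero is not misread as octal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical in-memory form of the per-schema properties proposed
 * above.  Names are illustrative only, not an existing resctrl API.
 */
enum schema_type { SCHEMA_SCALAR_LINEAR, SCHEMA_BITMAP };

struct schema_desc {
	enum schema_type type;
	int sparse;                 /* bitmap "sparse" flag */
	unsigned long min, max;     /* for bitmaps, these are masks */
	unsigned long resolution;   /* divisor; bitmap size in bits */
	unsigned long scale;        /* multiplier applied to unit */
	const char *unit;           /* "all" => proportional schema */
};

/* Parse a property value: "0x..." as hex, anything else as decimal. */
static int parse_value(const char *s, unsigned long *out)
{
	char *end;
	int base = 10;

	if (strncmp(s, "0x", 2) == 0) {
		s += 2;
		base = 16;
	}
	*out = strtoul(s, &end, base);
	/* Accept a trailing newline, as read back from a sysfs-style file. */
	return (end != s && (*end == '\0' || *end == '\n')) ? 0 : -1;
}

/* Number of set bits; a userspace stand-in for bitmap_weight(). */
static unsigned int weight(unsigned long c)
{
	unsigned int n = 0;

	for (; c; c &= c - 1)
		n++;
	return n;
}

/* Nonzero if the set bits of c form a single contiguous run (or c == 0). */
static int contiguous(unsigned long c)
{
	if (c == 0)
		return 1;
	while (!(c & 1))
		c >>= 1;
	return (c & (c + 1)) == 0;
}

/*
 * Effective amount E, in multiples of d->unit, for control value c:
 *   scalar "linear": E = c * scale / resolution
 *   bitmap:          E = weight(c) * scale / resolution
 * With unit "all", E is the fraction of the whole resource.
 * Returns -1.0 if the schema would reject c.
 */
static double effective_amount(const struct schema_desc *d, unsigned long c)
{
	unsigned long n;

	assert(d->resolution != 0);
	if (d->type == SCHEMA_BITMAP) {
		if ((c & ~d->max) || weight(c) < weight(d->min))
			return -1.0;
		if (!d->sparse && !contiguous(c))
			return -1.0;
		n = weight(c);
	} else {
		if (c < d->min || c > d->max)
			return -1.0;
		n = c;
	}
	return (double)n * d->scale / d->resolution;
}
```

For example, a proportional scalar schema with {min=1, max=100, resolution=100, scale=1, unit="all"} gives E = 0.5 of the resource for C = 50, while a non-sparse 12-bit cache bitmap with {min=0x1, max=0xfff, resolution=12, scale=1, unit="all"} gives E = 4/12 of the cache for C = 0x0f0.  An absolute schema with {min=1, max=80, resolution=8, scale=1, unit="GBps"} gives 2.5 GBps for C = 20.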
Note on the "tolerance" parameter:
This is a new addition.  On the MPAM side, the hardware has a choice
about how to interpret the control value in some edge-case situations.
We may not reasonably be able to probe for this, so it may be useful
to warn software that there is an uncertainty margin.
We might also be able to use the "tolerance" parameter to accommodate
the rounding behaviour of the existing "MB" schema (otherwise, we
might want a special "type" for this schema, if it doesn't comply
closely enough).
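Along the same lines, a test might use "tolerance" roughly as in the following C sketch.  check_tolerance() and tolerance_bounds() are hypothetical helpers; the bounds assume a linear scalar schema, and clamping the lower bound at zero is an assumption of this sketch, not part of the proposal.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical test helper: classify an effective control value
 * against the value written to the schemata file, using the proposed
 * "tolerance" property:
 *   0    -> must match exactly
 *   n>0  -> must be within +/- n
 *   <0   -> tolerance unknown; the result cannot be judged
 */
enum tol_result { TOL_PASS, TOL_FAIL, TOL_UNKNOWN };

static enum tol_result check_tolerance(long written, long effective,
				       long tolerance)
{
	if (tolerance < 0)
		return TOL_UNKNOWN;
	return labs(effective - written) <= tolerance ? TOL_PASS : TOL_FAIL;
}

/*
 * Range of effective amounts (in multiples of unit) that a linear
 * scalar schema may deliver for control value c with known tolerance
 * n >= 0: [c - n, c + n] * scale / resolution.  The lower bound is
 * clamped at zero (an assumption of this sketch).
 */
static void tolerance_bounds(long c, long n, unsigned long scale,
			     unsigned long resolution, double *lo, double *hi)
{
	long low = c - n > 0 ? c - n : 0;

	assert(resolution != 0 && n >= 0);
	*lo = (double)low * scale / resolution;
	*hi = (double)(c + n) * scale / resolution;
}
```

A test that gets TOL_UNKNOWN back could, for instance, report the measurement without failing, which matches the idea of warning userspace explicitly rather than silently omitting the property.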
If we want to deploy resctrl under virtualisation, resctrl on the host
could dynamically affect the actual amount of resource that is
available for allocation inside a VM.
Whether or not we ever want to do that, it might be useful to have a
way to warn software that the effective control values hitting the
hardware may not be entirely predictable.
Thoughts?
Cheers
---Dave
[1] Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch
https://lore.kernel.org/lkml/aNFliMZTTUiXyZzd@e133380.arm.com/