linux-kernel - Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch


Message-ID: <aNv/BxibVkXHkxam@e133380.arm.com>
Date: Tue, 30 Sep 2025 17:02:15 +0100
From: Dave Martin <Dave.Martin@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: linux-kernel@...r.kernel.org,
	Reinette Chatre <reinette.chatre@...el.com>,
	James Morse <james.morse@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
	x86@...nel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
 per-arch

Hi Tony,

On Mon, Sep 29, 2025 at 09:37:41AM -0700, Luck, Tony wrote:
> On Mon, Sep 29, 2025 at 02:56:19PM +0100, Dave Martin wrote:
> > Hi Tony,
> > 
> > Thanks for taking a look at this -- comments below.
> > 
> > [...]
> > 
> > On Thu, Sep 25, 2025 at 03:58:35PM -0700, Luck, Tony wrote:
> > > On Mon, Sep 22, 2025 at 04:04:40PM +0100, Dave Martin wrote:
> > 
> > [...]
> > 
> > > > Would something like the following work?  A read from schemata might
> > > > produce something like this:
> > > > 
> > > > MB: 0=50, 1=50
> > > > # MB_HW: 0=32, 1=32
> > > > # MB_MIN: 0=31, 1=31
> > > > # MB_MAX: 0=32, 1=32
> > 
> > [...]
> > 
> > > > I'd be interested in people's thoughts on it, though.
> > > 
> > > Applying this to Intel's upcoming region-aware memory bandwidth
> > > that supports 255 steps and h/w min/max limits.
> > 
> > Following the MPAM example, would you also expect:
> > 
> > 	scale: 255
> > 	unit: 100pc
> > 
> > ...?
> 
> Yes. 255 (or whatever "Q" value is provided in the ACPI table)
> corresponds to no throttling, so 100% bandwidth.
> 
> > 
> > > We would have info files with "min = 1, max = 255" and a schemata
> > > file that looks like this to legacy apps:

[...]

> > > MB: 0=4;1=75

[...]

> > > with the legacy line updated with the rounded value of the MB_HW
> > > supplied by the user. 10/255 = 3.921% ... so call it "4".
> > 
> > I'm suggesting that this always be rounded up, so that you have a
> > guarantee that the steps are no smaller than the reported value.
> 
> Round up, rather than round-to-nearest, makes sense. Though perhaps
> only cosmetic as I would be surprised if anyone has a mix of tools
> looking at the legacy schemata lines while programming using the
> direct h/w controls.

Ack
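
For concreteness, the two rounding modes discussed above can be sketched
as follows.  This is only an illustration in Python, not resctrl code:
the helper names are made up, and it assumes a hardware control value
"hw" on a scale where "scale" (255 in the example above) means 100%.

```python
def percent_round_nearest(hw: int, scale: int = 255) -> int:
    """Legacy percentage, rounded to nearest (integer arithmetic only)."""
    return (hw * 100 + scale // 2) // scale


def percent_round_up(hw: int, scale: int = 255) -> int:
    """Legacy percentage, rounded up so it never under-reports hw/scale."""
    return (hw * 100 + scale - 1) // scale


if __name__ == "__main__":
    # Hypothetical sample values on the assumed 255-step scale.
    for hw in (1, 10, 128, 255):
        print(hw, percent_round_nearest(hw), percent_round_up(hw))
```

With scale = 255, hw = 10 reports 4 under both modes (10/255 = 3.921%),
but hw = 1 reports 0 when rounding to nearest and 1 when rounding up,
and hw = 128 reports 50 versus 51.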

[...]

> > Do you have concerns about existing software choking on the #-prefixed
> > lines?
> 
> Do they even need a # prefix? We already mix lines for multiple
> resources in the schemata file with a separate prefix for each resource.
> The schemata file also allows writes to just update one resource (or
> one domain in a single resource). The schemata file started with just
> "L3". Then we added "L2", "MB", and "SMBA" with no concern that the
> initial "L3" manipulating tools would be confused.

The "#" thing is for backwards compatibility with old userspace that
might blindly "paste back" unknown entries when writing the schemata
file.

(See also my reply to Reinette [1].)
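
To illustrate the concern, here is a rough model of an old tool that
reads schemata, changes the one entry it understands and pastes every
other line back verbatim.  It is a sketch in Python under a stated
assumption: that "#"-prefixed lines would be skipped when the file is
written.  Both function names are hypothetical, and the second one only
models that assumed rule; it is not existing resctrl behaviour.

```python
def legacy_rewrite(schemata_text: str, new_mb: str) -> str:
    """Model an old tool: update the MB line, paste everything else back."""
    out = []
    for line in schemata_text.splitlines():
        if line.strip().startswith("MB:"):
            out.append(f"MB: {new_mb}")
        else:
            out.append(line)  # unknown entry, pasted back blindly
    return "\n".join(out) + "\n"


def lines_acted_on(written_text: str) -> list[str]:
    """Assumed write-side rule: skip blank and '#'-prefixed lines."""
    return [
        line.strip()
        for line in written_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


if __name__ == "__main__":
    # Read-side text taken from the example earlier in this thread.
    read_back = (
        "MB: 0=50, 1=50\n"
        "# MB_HW: 0=32, 1=32\n"
        "# MB_MIN: 0=31, 1=31\n"
        "# MB_MAX: 0=32, 1=32\n"
    )
    written = legacy_rewrite(read_back, "0=75, 1=50")
    print(lines_acted_on(written))
```

Under that assumption only the updated MB line takes effect, so the
stale MB_HW, MB_MIN and MB_MAX values the tool pasted back cannot
override it.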

> > > Then use some other tag naming for other regions. Remote DDR,
> > > local CXL, remote CXL are the ones we think are next in the h/w
> > > memory sequence. But the "region" concept would allow for other
> > > options as other memory technologies come into use.
> > 
> > Would it be reasonable just to have a set of these schema instances, per
> > region, so:
> > 
> > MB_HW: ... // implicitly region 0
> > MB_HW_1: ...
> > MB_HW_2: ...
> 
> Chen Yu is currently looking at putting the word "TIER" into the
> name, since there's some precedent for describing memory in "tiers".
> 
> Whatever naming scheme is used, the important part is how will users
> find out what each schemata line actually means/controls.

Agreed.  That's a problem, but a separate one.

[...]

> > Or, did you have something else in mind?
> > 
> > My thinking is that we avoid adding complexity in the schemata file if
> > we treat mapping these schema instances onto the hardware topology as
> > an orthogonal problem.  So long as we have unique names in the schemata
> > file, we can describe elsewhere what they relate to in the hardware.
> 
> Yes, exactly this.

OK, that's reassuring.

Cheers
---Dave

[1] https://lore.kernel.org/lkml/aNv53UmFGDBL0z3O@e133380.arm.com/