linux-kernel - Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch

Message-ID: <aNqQAy8nOkLRYx4F@e133380.arm.com>
Date: Mon, 29 Sep 2025 14:56:19 +0100
From: Dave Martin <Dave.Martin@....com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: linux-kernel@...r.kernel.org,
	Reinette Chatre <reinette.chatre@...el.com>,
	James Morse <james.morse@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
	x86@...nel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
 per-arch

Hi Tony,

Thanks for taking a look at this -- comments below.

[...]

On Thu, Sep 25, 2025 at 03:58:35PM -0700, Luck, Tony wrote:
> On Mon, Sep 22, 2025 at 04:04:40PM +0100, Dave Martin wrote:

[...]

> > Would something like the following work?  A read from schemata might
> > produce something like this:
> > 
> > MB: 0=50, 1=50
> > # MB_HW: 0=32, 1=32
> > # MB_MIN: 0=31, 1=31
> > # MB_MAX: 0=32, 1=32

[...]

> > I'd be interested in people's thoughts on it, though.
> 
> Applying this to Intel upcoming region aware memory bandwidth
> that supports 255 steps and h/w min/max limits.

Following the MPAM example, would you also expect:

	scale: 255
	unit: 100pc

...?

> We would have info files with "min = 1, max = 255" and a schemata
> file that looks like this to legacy apps:
> 
> MB: 0=50;1=75
> #MB_HW: 0=128;1=191
> #MB_MIN: 0=128;1=191
> #MB_MAX: 0=128;1=191
> 
> But a newer app that is aware of the extensions can write:
> 
> # cat > schemata << 'EOF'
> MB_HW: 0=10
> MB_MIN: 0=10
> MB_MAX: 0=64
> EOF
> 
> which then reads back as:
> MB: 0=4;1=75
> #MB_HW: 0=10;1=191
> #MB_MIN: 0=10;1=191
> #MB_MAX: 0=64;1=191
> 
> with the legacy line updated with the rounded value of the MB_HW
> supplied by the user. 10/255 = 3.921% ... so call it "4".

I'm suggesting that this always be rounded up, so that you have a
guarantee that the steps are no smaller than the reported value.

(In this case, round-up and round-to-nearest give the same answer
anyway, though!)
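
As a rough illustration of the arithmetic only (not kernel code): the
sketch below assumes a scale of 255 steps mapping onto 100%, and the
function names are made up for this example.

```python
# Hypothetical helpers: convert a hardware step count to a legacy
# percentage. The default scale of 255 is an assumption taken from
# the "255 steps" figure quoted above.

def percent_round_up(hw_value: int, scale: int = 255) -> int:
    """Smallest integer percentage that is >= hw_value / scale."""
    return (hw_value * 100 + scale - 1) // scale


def percent_round_nearest(hw_value: int, scale: int = 255) -> int:
    """Integer percentage closest to hw_value / scale."""
    return (hw_value * 100 + scale // 2) // scale


if __name__ == "__main__":
    for hw in (10, 64, 128, 191):
        print(hw, percent_round_up(hw), percent_round_nearest(hw))
```

Under these assumptions 10 comes out as 4 either way, but 128 would
read back as 51 when rounded up and 50 when rounded to nearest, so the
two policies do not always agree.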

> 
> The region aware h/w supports separate bandwidth controls for each
> region. We could hope (or perhaps update the spec to define) that
> region0 is always node-local DDR memory and keep the "MB" tag for
> that.

Do you have concerns about existing software choking on the #-prefixed
lines?
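
To make the question concrete, here is a sketch of a userspace reader
that would cope: it treats any line starting with '#' as a comment.
This is a hypothetical parser written for illustration, not a
description of what existing tools actually do; it also assumes the
"id=value;id=value" domain syntax shown in the quoted example.

```python
# Hypothetical legacy-style schemata reader that ignores '#' lines.

def parse_legacy_schemata(text: str) -> dict[str, dict[int, int]]:
    """Return {schema_name: {domain_id: value}}, skipping '#' lines."""
    result: dict[str, dict[int, int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or an extension line hidden as a comment
        name, _, body = line.partition(":")
        domains: dict[int, int] = {}
        for entry in body.split(";"):
            dom, _, val = entry.strip().partition("=")
            domains[int(dom)] = int(val)
        result[name.strip()] = domains
    return result


if __name__ == "__main__":
    sample = (
        "MB: 0=50;1=75\n"
        "#MB_HW: 0=128;1=191\n"
        "#MB_MIN: 0=128;1=191\n"
        "#MB_MAX: 0=128;1=191\n"
    )
    print(parse_legacy_schemata(sample))
```

A reader that instead tries to split every line on ':' and '=' without
checking for '#' first would end up with a schema named "#MB_HW", which
is the kind of failure the question is about.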

> Then use some other tag naming for other regions. Remote DDR,
> local CXL, remote CXL are the ones we think are next in the h/w
> memory sequence. But the "region" concept would allow for other
> options as other memory technologies come into use.

Would it be reasonable just to have a set of these schema instances, per
region, so:

MB_HW: ... // implicitly region 0
MB_HW_1: ...
MB_HW_2: ...

etc.

Or, did you have something else in mind?

My thinking is that we avoid adding complexity in the schemata file if
we treat mapping these schema instances onto the hardware topology as
an orthogonal problem.  So long as we have unique names in the schemata
file, we can describe elsewhere what they relate to in the hardware.
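
A sketch of what that separation could look like from a tool's point
of view, assuming the "_<n>" suffix convention floated above (with no
suffix meaning region 0). The helper name and the region table are
invented for this example; where the real description would live is
exactly the open question.

```python
# Hypothetical: split a schema name into (base name, region index).

def split_schema_name(name: str) -> tuple[str, int]:
    """'MB_HW_2' -> ('MB_HW', 2); 'MB_HW' -> ('MB_HW', 0)."""
    head, sep, tail = name.rpartition("_")
    if sep and head and tail.isdigit():
        return head, int(tail)
    return name, 0  # no numeric suffix: implicitly region 0


# Made-up region descriptions, kept outside the schemata file. The
# ordering follows the memory sequence mentioned in the quoted text.
REGION_DESCRIPTIONS = {
    0: "node-local DDR",
    1: "remote DDR",
    2: "local CXL",
    3: "remote CXL",
}

if __name__ == "__main__":
    for schema in ("MB_HW", "MB_HW_1", "MB_HW_2"):
        base, region = split_schema_name(schema)
        print(schema, base, region, REGION_DESCRIPTIONS.get(region, "unknown"))
```

The schemata file then only has to guarantee unique names; what each
region index means can change without touching the parser.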

Cheers
---Dave