Message-ID: <aPf0OKwDZ4XbmVRB@agluck-desk3>
Date: Tue, 21 Oct 2025 13:59:36 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: Dave Martin <Dave.Martin@....com>
CC: Reinette Chatre <reinette.chatre@...el.com>,
	<linux-kernel@...r.kernel.org>, James Morse <james.morse@....com>, "Thomas
 Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, "Borislav
 Petkov" <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
 Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>, <x86@...nel.org>,
	<linux-doc@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
 per-arch

Hi Dave,

On Tue, Oct 21, 2025 at 03:37:35PM +0100, Dave Martin wrote:
> Hi Tony,
> 
> On Mon, Oct 20, 2025 at 09:31:18AM -0700, Luck, Tony wrote:
> > On Mon, Oct 20, 2025 at 04:50:38PM +0100, Dave Martin wrote:
> > > Hi Reinette,
> > > 
> > > On Fri, Oct 17, 2025 at 08:59:45AM -0700, Reinette Chatre wrote:
> 
> [...]
> 
> > > > By extension I assume that software that understands a schema that is introduced
> > > > after the "relationship" format is established can be expected to understand the
> > > > format and thus these new schemata do not require the '#' prefix. Even if
> > > > a new schema is introduced with a single control it can be followed by a new child
> > > > control without a '#' prefix a couple of kernel releases later. By this point it
> > > > should hopefully be understood by user space that it should not write entries it does
> > > > not understand.
> > > 
> > > Generally, yes.
> > > 
> > > I think that boils down to: "OK, previously you could just tweak bits
> > > of the whole schemata file you read and write the whole thing back,
> > > and the effect would be what you intuitively expected.  But in future
> > > different schemata in the file may not be independent of one another.
> > > We'll warn you which things might not be independent, but we may not
> > > describe exactly how they affect each other."
> > 
> > Changes to the schemata file are currently "staged" and then applied.
> > There's some filesystem level error/sanity checking during the parsing
> > phase, but maybe for MB some parts can also be delayed, and re-ordered
> > when architecture code applies the changes.
> > 
> > E.g. while filesystem code could check min <= opt <= max. Architecture
> > code would be responsible to write the values to h/w in a sane manner
> > (assuming architecture cares about transient effects when things don't
> > conform to the ordering).
> > 
> > E.g. User requests moving from min,opt,max = 10,20,30 to 40,50,60
> > Regardless of the order those requests appeared in the write(2) syscall
> > architecture bumps max to 60, then opt to 50, and finally min to 40.
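
A minimal sketch of the reordering idea in the 10,20,30 to 40,50,60
example: pick, at each step, any pending control whose new value keeps
min <= opt <= max true. The function name and the tuple layout are
hypothetical; this is a user-space model of the ordering only, not
resctrl or architecture code.

```python
def safe_write_order(current, target):
    """Return [(name, value), ...] so min <= opt <= max holds after each write.

    `current` and `target` are (min, opt, max) tuples. Hypothetical helper.
    """
    names = ("min", "opt", "max")
    state = dict(zip(names, current))
    goal = dict(zip(names, target))
    for label, vals in (("current", state), ("target", goal)):
        if not vals["min"] <= vals["opt"] <= vals["max"]:
            raise ValueError(f"{label} violates min <= opt <= max")

    # Controls whose value does not change need no write at all.
    pending = [n for n in names if state[n] != goal[n]]
    order = []
    while pending:
        for name in pending:
            trial = dict(state, **{name: goal[name]})
            if trial["min"] <= trial["opt"] <= trial["max"]:
                state = trial
                order.append((name, goal[name]))
                pending.remove(name)
                break  # restart the scan with the updated state
        else:
            # Not expected when both endpoints satisfy the invariant.
            raise RuntimeError("no safe step found")
    return order
```

For the example above this yields max, then opt, then min; for the
reverse move (40,50,60 back to 10,20,30) it yields min, then opt, then
max.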
> 
> This could indeed be sorted out during staging, but I'm not
> sure that we can/should rely on it.
> 
> If we treat the data coming from a single write() as a transaction, and
> stage the whole thing before executing it, that's fine.  But I think
> this has to be viewed as an optimisation rather than guaranteed
> semantics.
> 
> 
> We told userspace that schemata is an S_IFREG regular file, so we have
> to accept a write() boundary anywhere in the stream.
> 
> (In fact, resctrl chokes if a write boundary occurs in the middle of a
> line.  In practice, stdio buffering and similar means that this issue
> turns out to be difficult to hit, except with shell scripts that try to
> emit a line piecemeal -- I have a partial fix for that knocking around,
> but this throws up other problems, so I gave up for the time being.)

Is this worth the pain and complexity? Maybe just document the reality
of the implementation since day 1 of resctrl that each write(2) must
contain one or more lines, each terminated with "\n".
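
A minimal sketch of what that rule would mean for user space: build the
complete, newline-terminated update first and hand it over in a single
write(2). The helper name write_schemata_update is hypothetical, and the
path in the usage note assumes resctrl is mounted at /sys/fs/resctrl.

```python
import os


def write_schemata_update(path, lines):
    """Send one or more complete schemata lines in a single write(2).

    Hypothetical helper: `lines` are given without the trailing newline.
    """
    # Reject empty entries and embedded newlines so that one list entry
    # can never turn into a partial or extra line.
    for line in lines:
        if not line or "\n" in line:
            raise ValueError(f"bad schemata line: {line!r}")
    payload = "".join(line + "\n" for line in lines).encode("ascii")

    # os.write() issues one write(2) call, so no stdio buffering can
    # split the payload in the middle of a line.
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.write(fd, payload)
    finally:
        os.close(fd)
    if written != len(payload):
        raise OSError("short write: the update was not accepted whole")
    return written
```

Usage would look like
write_schemata_update("/sys/fs/resctrl/schemata", ["L3:2=0xff"]).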

There are already so many ways that the schemata file does not behave
like a regular S_IFREG file. E.g. accepting a write to just update
one domain in a resource: # echo L3:2=0xff > schemata

So describe schemata in terms of writing "update commands" rather
than "Lines"?
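
To illustrate the "update command" view, here is a rough sketch that
splits one command into a resource name and the domains it touches. The
function name is hypothetical, values are kept as text because their
meaning depends on the resource, and this is not the kernel's parser.

```python
def parse_update(line):
    """Split 'RES:dom=val;dom=val' into (resource, {domain_id: value_text})."""
    resource, sep, rest = line.strip().partition(":")
    resource = resource.strip()
    if not sep or not resource or not rest:
        raise ValueError(f"not an update command: {line!r}")

    domains = {}
    for item in rest.split(";"):
        dom, eq, value = item.partition("=")
        dom, value = dom.strip(), value.strip()
        if not eq or not dom.isdigit() or not value:
            raise ValueError(f"bad domain entry: {item!r}")
        domains[int(dom)] = value  # only the domains named are updated
    return resource, domains
```

A command such as "L3:2=0xff" names exactly one domain, so nothing else
in the resource is mentioned, let alone rewritten.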

> 
> We also cannot currently rely on userspace closing the fd between
> "transactions".  We never told userspace to do that, previously.  We
> could make a new requirement, but it feels unexpected/unreasonable (?)
> 
> > > 
> > > "So, from now on, only write the things that you actually want to set."
> > > 
> > > Does that sound about right?
> > 
> > Users might still use their favorite editor on the schemata file and
> > so write everything, while only changing a subset. So if we don't go
> > for the full two-phase update I describe above this would be:
> > 
> >   "only *change* the things that you actually want to set".

I misremembered where the check for "did the user change the value"
happened. I thought it was during parsing, but it is actually in
resctrl_arch_update_domains() after all input parsing is complete
and resctrl is applying changes. So unless we change things to work
the way I hallucinated, then ordering does matter the way you
described.
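
A toy model of that behaviour, for illustration only: everything is
staged first, and the "did the value change" test happens when the
staged values are applied. The names and the dict layout are
hypothetical and do not mirror the real resctrl_arch_update_domains().

```python
def apply_staged(current, staged):
    """Apply staged values to `current`; return the writes that reach h/w.

    Both dicts map (resource, domain_id) -> value. Hypothetical model.
    """
    hw_writes = []
    for key, new_value in staged.items():
        if current.get(key) == new_value:
            continue  # unchanged entries are filtered out at apply time
        current[key] = new_value
        hw_writes.append((key, new_value))
    return hw_writes
```

In this model, writing back an unchanged entry costs nothing, but every
changed entry is applied in the order it was staged.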
> 
> [...]
> 
> > -Tony
> 
> This works if the schemata file is output in the right order (and the
> user doesn't change the order):
> 
> # cat schemata
> MB:0=100;1=100
> # MB_HW:0=1024;1=1024
> 
> ->
> 
> # cat <<EOF >schemata
> MB:0=100;1=100
> MB_HW:0=512;1=512
> EOF
> 
> ... though it may still be inefficient, if the lines are not staged
> together.  The hardware memory bandwidth controls may get programmed
> twice, here -- though the final result is probably what was intended.
> 
> I'd still prefer that we tell people that they should be doing this:
> # cat <<EOF >schemata
> MB_HW:0=512;1=512
> EOF
> 
> ...if they are really trying to set MB_HW and don't care about the
> effect on MB?

I'm starting to worry about this co-existence of old/new syntax for
Intel region aware. Life seems simple if there is only one MB_HW
connected to the legacy "MB". Updates to either will make both
appear with new values when the schemata is read. E.g.

# cat schemata
MB:0=100
#MB_HW=255

# echo MB:0=50 > schemata

# cat schemata
MB:0=50
#MB_HW=127
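
The numbers in this example are consistent with a simple linear scale
that truncates, sketched below. The 255 full-scale value and the
rounding direction are assumptions read off the example (100 -> 255,
50 -> 127); the real conversion would be up to the architecture.

```python
MB_HW_MAX = 255  # assumed full-scale value, taken from "#MB_HW=255" above


def mb_percent_to_hw(percent):
    """Map a legacy MB percentage to the assumed 0..255 MB_HW scale."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in 0..100")
    return percent * MB_HW_MAX // 100  # integer division truncates
```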

But Intel will have several MB_HW controls, one for each region.
[Schemata names TBD, but I'll just call them 0, 1, 2, 3 here]

# cat schemata
MB:0=100
#MB_HW0=255
#MB_HW1=255
#MB_HW2=255
#MB_HW3=255

If the user sets just one of the HW controls:

# echo MB_HW1=64 > schemata

what should resctrl display for the legacy "MB:" line?

> 
> Cheers
> ---Dave

-Tony
