Message-ID: <aNqEwhUZd+prWdfK@e133380.arm.com>
Date: Mon, 29 Sep 2025 14:08:18 +0100
From: Dave Martin <Dave.Martin@....com>
To: Reinette Chatre <reinette.chatre@...el.com>,
"Luck, Tony" <tony.luck@...el.com>
Cc: linux-kernel@...r.kernel.org, James Morse <james.morse@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>,
x86@...nel.org, linux-doc@...r.kernel.org
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
per-arch
Hi Reinette, Tony,
On Thu, Sep 25, 2025 at 03:18:51PM -0700, Reinette Chatre wrote:
> Hi Tony,
>
> On 9/25/25 2:35 PM, Luck, Tony wrote:
[...]
> > Director Technology (Intel® RDT) Architecture Specification"
> >
> > https://cdrdv2.intel.com/v1/dl/getContent/789566
> >
> > describes the upcoming region aware memory bandwidth allocation
> > controls as being a number from "1" to "Q" (enumerated in an ACPI
> > table). First implementation looks like Q == 255 which means a
> > granularity of 0.392%. The spec has headroom to allow Q == 511.
That does look like it would benefit from exposing the hardware field
without rounding (as with MPAM).
Is the relationship between this value and the expected memory system
throughput actually defined anywhere?
If the expected throughput is exactly proportional to this value, or a
reasonable approximation to this, then that is simple -- but I can't
see it actually stated.
When a spec suggests a need to divide by (2^N - 1), I do wonder whether
that is what they _really_ meant (and whether hardware will just do the
obvious cheap approximation in defiance of the spec).
> >
> > I don't expect users to need that granularity at the high bandwidth
> > end of the range, but I do expect them to care for highly throttled
> > background/batch jobs to make sure they can't affect performance of
> > the high priority jobs.
A case where it _might_ matter is where there is a non-trivial number
of jobs, and an attempt is made to share bandwidth among them.
Although it may not matter exactly how much bandwidth is given to each
job, the rounding errors may accumulate so that they add up to
significantly more than or less than 100% in total. This feels
undesirable.
Rounding off the value in the interface effectively makes it impossible
for portable software to avoid this problem...
> > I'd hate to have to round all low bandwidth controls to 1% steps.
+1! (No pun intended.)
> This is the limitation if choosing to expose this feature as an MB resource
> and seems to be the same problem that Dave is facing. For finer granularity
> allocations I expect that we would need a new schema/resource backed by new
> properties as proposed by Dave in
> https://lore.kernel.org/lkml/aNFliMZTTUiXyZzd@e133380.arm.com/
> This will require updates to user space (that will anyway be needed if wedging
> another non-ABI input into MB).
>
> Reinette
Ack; while we could add decimal places to bandwidth_gran as reported to
userspace, we don't know that software isn't going to choke on that.
Plus, we would need to add precision to the control values too --
it's no good advertising 0.5% granularity when the MB schema only
accepts/reports integers.
Software that parses anything as (potentially) a real number might work
transparently, but we didn't warn users that they might need to do
that...
Cheers
---Dave