Message-ID: <aB4woCcnPC5Mz7cf@e133380.arm.com>
Date: Fri, 9 May 2025 17:43:12 +0100
From: Dave Martin <Dave.Martin@....com>
To: Peter Newman <peternewman@...gle.com>
Cc: Reinette Chatre <reinette.chatre@...el.com>,
"Luck, Tony" <tony.luck@...el.com>,
Fenghua Yu <fenghuay@...dia.com>,
Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>,
Drew Fustini <dfustini@...libre.com>,
Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>,
Chen Yu <yu.c.chen@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v4 13/31] fs/resctrl: Add support for additional monitor
event display formats
Hi,
On Fri, May 09, 2025 at 04:46:30PM +0200, Peter Newman wrote:
> Hi Dave,
>
> On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@....com> wrote:
[...]
> > For example: scaling memory bandwidth percentages for MPAM is a
> > nuisance because the hardware uses fixed-point values scaled by a power
> > of 2, not by 100: the two scales can never match up anywhere except at
> > multiples of 25%, leading to irregular increments when rounded to an
> > integer percentage value and uncertainty about what the bandwidth_gran
> > parameter means. Round-trip conversions between the two
> > representations become error-prone due to repeated rounding -- this
> > proved quite fiddly to get right. Precision beyond 1% increments may
> > also be available in the hardware, but is not accessible through the
> > resctrl interface.
>
> Google users got annoyed with these rounding errors very quickly and
> asked me to change the MBA interface to the raw, fixed-point value
> used by the MPAM register interface. (but at least shifted down, since
> the MBW_MIN/MAX fields are left-justified)
That's interesting.
Do you find a need to do things like step the bandwidth allocation for
a control group? So, as part of a tuning regime, the bandwidth value
is read out, stepped to the next distinct hardware value and written
back in?
That kind of thing does not map in a convenient way onto the current
interface, although fire-and-forget programming of a predetermined
percentage works fine.
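As a rough illustration of why stepping is awkward through an integer
percentage, here is a small sketch for a 6-bit control (64 hardware
values). The nearest-value rounding rule and all function names are
assumptions made for illustration only; they are not the conversion the
kernel actually performs.

```python
from fractions import Fraction

HW_STEPS = 64  # 6-bit field, matching the example discussed in this thread


def percent_to_raw(percent: int) -> int:
    """Nearest hardware value for an integer percentage (assumed rounding)."""
    return max(1, round(Fraction(percent * HW_STEPS, 100)))


def raw_to_percent(raw: int) -> int:
    """Integer percentage reported for a hardware value (assumed rounding)."""
    return round(Fraction(raw * 100, HW_STEPS))


def exact_percentages() -> list[int]:
    """Integer percentages that land exactly on a hardware value."""
    return [p for p in range(1, 101) if (p * HW_STEPS) % 100 == 0]


def readback_percentages() -> list[int]:
    """Percentage each of the 64 hardware values reads back as, in order."""
    return [raw_to_percent(raw) for raw in range(1, HW_STEPS + 1)]
```

Under these assumptions only 25, 50, 75 and 100 match a hardware value
exactly, consecutive hardware values read back with gaps of either 1 or
2 percentage points, and writing 4 or 5 selects the same hardware value,
so "read, add one, write back" does not reliably move to the next
distinct setting.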
Extending my model outline, a 6-bit MPAM MBW_PART implementation might
be described by:
	min:		1
	max:		64
	step size:	1
	multiplier:	1
	divisor:	64
How easy / difficult do you think it would be for userspace to work
with this, if resctrlfs were to expose the raw control (minus the
ignored bits) with that metadata?
Needless to say, the max and divisor values would depend on the
hardware and possibly other factors. They would be fixed for the
lifetime of a single resctrl instance at the very least.
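To make the question concrete, here is a minimal sketch of how a
userspace tool might consume such metadata. Everything in it is
hypothetical: resctrl does not expose these parameters today, and the
class, field and method names are invented for illustration, following
the outline above.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ControlScale:
    """Hypothetical per-control metadata (names are illustrative only)."""
    min_value: int
    max_value: int
    step: int
    multiplier: int
    divisor: int

    def to_fraction(self, raw: int) -> Fraction:
        """Exact proportion represented by a raw control value."""
        return Fraction(raw * self.multiplier, self.divisor)

    def step_up(self, raw: int) -> int:
        """Next distinct hardware value, saturating at max_value."""
        return min(raw + self.step, self.max_value)

    def step_down(self, raw: int) -> int:
        """Previous distinct hardware value, saturating at min_value."""
        return max(raw - self.step, self.min_value)

    def from_fraction(self, target: Fraction) -> int:
        """Largest valid raw value whose proportion does not exceed target."""
        ideal = target * self.divisor / self.multiplier
        steps = math.floor((ideal - self.min_value) / self.step)
        raw = self.min_value + steps * self.step
        return max(self.min_value, min(raw, self.max_value))


# The 6-bit MBW_PART example from the outline above.
MBW_PART_6BIT = ControlScale(min_value=1, max_value=64, step=1,
                             multiplier=1, divisor=64)
```

With this, a tuning loop could read the raw value, call step_up() or
step_down(), and write the result back without any rounding, while
to_fraction() still gives an exact proportion for display (for example,
raw value 16 is exactly 1/4).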
> > For backwards compatibility we probably shouldn't change that
> > particular interface, but if we can avoid new instances of the same
> > kind of problem then that would be a benefit: i.e., explicitly tell
> > userspace how to scale a given parameter.
>
> MBA is not programmed by percentage on AMD, so I'm not sure why this
> is considered necessary for backwards compatibility.
I presumed that scripts (or pre-tuned data fed through them) are in
practice pretty platform-specific, so it would upset people if the
interface changed between kernel versions, at least on a given hardware
family.
The divergence between AMD and Intel in this area is unfortunate, but
absolute and proportional bandwidth measures do not really seem to be
interchangeable -- so a truly unified interface may not be easy to
achieve either.
Having two control names in the interface might work, say:
	MBP: proportion of total available memory bandwidth (%)
	MBA: absolute memory bandwidth (B/s)
Then just expose the one that the hardware implements natively (while
still exposing MB as a backwards compatible alias if necessary).
Cheers
---Dave