linux-kernel - Re: [PATCH v4 13/31] fs/resctrl: Add support for additional monitor event display formats

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALPaoCjzrGMTEYmTpH=9o_=N24apE0U057p6Mt6Knt9PoyFmzw@mail.gmail.com>
Date: Fri, 9 May 2025 16:46:30 +0200
From: Peter Newman <peternewman@...gle.com>
To: Dave Martin <Dave.Martin@....com>
Cc: Reinette Chatre <reinette.chatre@...el.com>, "Luck, Tony" <tony.luck@...el.com>, 
	Fenghua Yu <fenghuay@...dia.com>, 
	Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>, James Morse <james.morse@....com>, 
	Babu Moger <babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, 
	Anil Keshavamurthy <anil.s.keshavamurthy@...el.com>, Chen Yu <yu.c.chen@...el.com>, x86@...nel.org, 
	linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [PATCH v4 13/31] fs/resctrl: Add support for additional monitor
 event display formats

Hi Dave,

On Fri, May 9, 2025 at 1:29 PM Dave Martin <Dave.Martin@....com> wrote:
>
> Hi,
>
> (Backtrace retained for context -- see my comment at the end.)
>
> Cheers
> ---Dave
>
> [...]
>
> On Thu, May 08, 2025 at 04:45:21PM -0700, Reinette Chatre wrote:
> > Hi Tony,
> >
> > On 5/8/25 1:28 PM, Luck, Tony wrote:
> > > On Thu, May 08, 2025 at 08:49:56AM -0700, Reinette Chatre wrote:
> > >> On 4/28/25 5:33 PM, Tony Luck wrote:
> > >>> Resctrl was written with the assumption that all monitor events
> > >>> can be displayed as unsigned decimal integers.
> > >>>
> > >>> Some telemetry events provide greater precision where architecture code
> > >>> uses a fixed point format with 18 binary places.
> > >>>
> > >>> Add a "display_format" field to struct mon_evt which can specify
> > >>> that the value for the event be displayed as an integer for legacy
> > >>> events, or as a floating point value with six decimal places converted
> > >>> from the fixed point format received from architecture code.
> > >>
> > >> There was no discussion on this during the previous version.
> > >> While this version addresses the issue of architecture changing the
> > >> format it does not address the issue of how to handle different
> > >> architecture formats. With this change any architecture that may
> > >> want to support any of these events will be required to translate
> > >> whatever format it uses into the one Intel uses to be translated
> > >> again into format for user space. Do you think this is reasonable?
> > >>
> > >> Alternatively, resctrl could add additional file that contains the
> > >> format so that if an architecture in the future needs to present data
> > >> differently, an interface will exist to guide userspace how to parse it.
> > >> Creation of such user interface cannot be delayed until the time
> > >> it is needed since then these formats would be ABI.
> > >
> > > What if resctrl filesystem allows architecture to supply the number
> > > of binary places for fixed point values when enabling an event?
> >
> > This sounds good. I do not think we are in a position to come up with
> > an ideal solution. That would require assumptions of what another
> > architecture may or may not do and thus we do not have complete information.
> >
> > >
> > > That would allow h/w implementations to pick an appropriate precision
> > > for each new event. Different implementations of the same event
> > > (e.g. "core_energy") may pick different precision across architectures
> > > or between generations of the same architecture.
> > >
> > > File system code can then do:
> > >
> > >     if (binary_places == 0)
> > >             display as integer
> > >     else
> > >             convert to floating point (with one decimal place per
> > >             three binary places)
> >
> > I do not think this problem needs to be solved in this work but there needs
> > to be a plan for how other architectures can be supported. When similar
> > enabling needs to be done for that hypothetical architecture then it can
> > be implemented ... if it is still valid based on what that architecture actually
> > supports.
> > It may be sufficient for the "plan" (as above) to be in comments.
> >
> > >
> > > Existing events are all integers and won't change (it would be weird
> > > for an architecture to report "mbm_local_bytes" with a fixed point
> > > rather than integer value).
> > >
> > > New events may report in either integer or floating point format
> > > with varying amounts of precision. But I'm not sure that would be
> >
> > Partly this will depend on the unit of measurement that should form part of
> > the definition of the event. For example, events reporting cycles or ticks
> > should only be integer, no?
> >
> > > a burden for writing tools that can run on different architectures.
> >
> > Maybe just a comment in the docs then ... and now I see that you did
> > so already. My apologies, I did not get to the last four patches.
> >
> > Reinette
>
> Just a thought, but I think that while it's not possible to be fully
> generic, a parameter model along the lines of
>
>         quantity = raw_value * ((double)multiplier / divisor) * BASE_UNIT
>
> would cover most things that we have or can reasonably foresee,
> including memory bandwidth control values.
> raw_value, multiplier and divisor would all be integers.
>
> Since raw_integer can be the value used by the hardware, its precision
> can probably be fixed at 1, though we could still report it explicitly.
>
> Fundamental base units would be things like "byte", "bytes per second"
> and "none" (i.e., dimensionless quantities).  (Are there others?)
>
>
> Since we cannot guess for certain what userspace wants to do with the
> values, it feels better to let userspace do any scaling calculations
> itself, rather than trying to prettify the interface.
>
> For example: scaling memory bandwidth percentages for MPAM is a
> nuisance because the hardware uses fixed-point values scaled by a power
> of 2, not by 100: the two scales can never match up anywhere except at
> multiples of 25%, leading to irregular increments when rounded to an
> integer percentage value and uncertainty about what the bandwidth_gran
> parameter means.  Round-trip conversions between the two
> representations become error-prone due to repeated rounding -- this
> proved quite fiddly to get right.  Precision beyond 1% increments may
> also be available in the hardware, but is not accessible through the
> resctrl interface.

Google users got annoyed with these rounding errors very quickly and
asked me to change the MBA interface to the raw, fixed-point value
used by the MPAM register interface. (but at least shifted down, since
the MBW_MIN/MAX fields are left-justified)

>
> For backwards compatibility we probably shouldn't change that
> particular interface, but if we can avoid new instances of the same
> kind of problem then that would be a benefit: i.e., explicitly tell
> userspace how to scale a given parameter.

MBA is not programmed by percentage on AMD, so I'm not sure why this
is considered necessary for backwards compatibility.

-Peter