[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <0131dd9a-327c-4f3f-a9ca-a5126c464e79@intel.com>
Date: Thu, 20 Nov 2025 14:01:36 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>
CC: <linux-kernel@...r.kernel.org>, Tony Luck <tony.luck@...el.com>, "James
Morse" <james.morse@....com>, Ben Horgan <ben.horgan@....com>, Thomas
Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav
Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter
Anvin" <hpa@...or.com>, Jonathan Corbet <corbet@....net>, <x86@...nel.org>,
<linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2] x86,fs/resctrl: Factor MBA parse-time conversion to be
per-arch
Hi Dave,
On 11/20/25 9:42 AM, Dave Martin wrote:
> On Thu, Nov 20, 2025 at 08:46:24AM -0800, Reinette Chatre wrote:
>> Hi Dave,
>>
>> On 10/31/25 8:41 AM, Dave Martin wrote:
>>> The control value parser for the MB resource currently coerces the
>>> memory bandwidth percentage value from userspace to be an exact
>>> multiple of the rdt_resource::resctrl_membw::bw_gran parameter.
>>>
>>> On MPAM systems, this results in somewhat worse-than-worst-case
>>> rounding, since the bandwidth granularity advertised to resctrl by the
>>> MPAM driver is in general only an approximation to the actual hardware
>>> granularity on these systems, and the hardware bandwidth allocation
>>> control value is not natively a percentage -- necessitating a further
>>> conversion in the resctrl_arch_update_domains() path, regardless of the
>>> conversion done at parse time.
>>>
>>> Allow the arch to provide its own parse-time conversion that is
>>> appropriate for the hardware, and move the existing conversion to x86.
>>> This will avoid accumulated error from rounding the value twice on MPAM
>>> systems.
>>>
>>> Clarify the documentation, but avoid overly exact promises.
>>>
>>> Clamping to bw_min and bw_max still feels generic: leave it in the core
>>> code, for now.
>>>
>>> No functional change.
>>
>> Please use max line length available. Changelogs have to conform before merge
>> and having patch ready saves this work.
>
> Fixed. Sorry, old habits die hard.
>
> I won't bother with the tearoff text, though, it that's OK -- that
> won't go into git.
ok with me. I am not familiar with precedent in this regard to predict x86 maintainers'
preference.
>
> [...]
>
>>> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
>>> index b7f35b07876a..fbbcf5421346 100644
>>> --- a/Documentation/filesystems/resctrl.rst
>>> +++ b/Documentation/filesystems/resctrl.rst
>
> [...]
>
>>> @@ -737,8 +736,10 @@ The minimum bandwidth percentage value for each cpu model is predefined
>>> and can be looked up through "info/MB/min_bandwidth". The bandwidth
>>> granularity that is allocated is also dependent on the cpu model and can
>>> be looked up at "info/MB/bandwidth_gran". The available bandwidth
>>> -control steps are: min_bw + N * bw_gran. Intermediate values are rounded
>>> -to the next control step available on the hardware.
>>> +control steps are, approximately, min_bw + N * bw_gran. The steps may
>>> +appear irregular due to rounding to an exact percentage: bw_gran is the
>>> +maximum interval between the percentage values corresponding to any two
>>> +adjacent steps in the hardware.
>>>
>>> The bandwidth throttling is a core specific mechanism on some of Intel
>>> SKUs. Using a high bandwidth and a low bandwidth setting on two threads
>>
>> The documentation changes look good to me. Thank you.
>
> OK, sounds good.
>
> [...]
>
>>> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
>>> index 0d0ef54fc4de..26e3f7c88c86 100644
>>> --- a/fs/resctrl/ctrlmondata.c
>>> +++ b/fs/resctrl/ctrlmondata.c
>>> @@ -35,8 +35,8 @@ typedef int (ctrlval_parser_t)(struct rdt_parse_data *data,
>>> /*
>>> * Check whether MBA bandwidth percentage value is correct. The value is
>>> * checked against the minimum and max bandwidth values specified by the
>>> - * hardware. The allocated bandwidth percentage is rounded to the next
>>> - * control step available on the hardware.
>>> + * hardware. The allocated bandwidth percentage is converted as
>>> + * appropriate for consumption by the specific hardware driver.
>>
>> The text looks good but adjusting the right margin mid-paragraph seems unnecessary?
>>
>
> I was trying to follow the prevailing line length; but I guess the
> lines existing lines were already shortened by flowing the text, so I
> misjudged it.
>
> Fixed locally.
>
> Is there a view on how to flow new text? (not so applicable here)
>
> Keeping the lines a bit short (say, 72 chars) means that minor edits
> and typo fixes can be applied without extraneous diff noise most of the
> time. I find reviewing an entire re-flowed paragraph annoying when
> there is really just a one-word change buried in the middle of it
> somewhere.
>
> Equally, I try to avoid reflowing text for minor edits unless
> absolutely necessary; poking a few chars over 80 seems tolerable in
> that situation, but I prefer it not to go too far...
80 is not a strict rule. Latest comment from Boris in this regard can be
found in link below (made after he went through and reformatted most changelogs
in that resctrl series):
https://lore.kernel.org/lkml/20250916105447.GCaMlB976WLxHHeNMD@fat_crate.local/
>
>>> */
>>> static bool bw_validate(char *buf, u32 *data, struct rdt_resource *r)
>>> {
>>> @@ -69,7 +69,7 @@ static bool bw_validate(char *buf, u32 *data, struct rdt_resource *r)
>>> return false;
>>> }
>>>
>>> - *data = roundup(bw, (unsigned long)r->membw.bw_gran);
>>> + *data = resctrl_arch_preconvert_bw(bw, r);
>>> return true;
>>> }
>>>
>>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>>> index a7d92718b653..1fb6e2118b61 100644
>>> --- a/include/linux/resctrl.h
>>> +++ b/include/linux/resctrl.h
>>> @@ -485,6 +485,20 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
>>> */
>>> int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
>>>
>>> +/*
>>> + * Convert a bandwidth control value to the appropriate form for
>>> + * consumption by the hardware driver for resource r.
>>
>> When being as descriptive, please switch to proper kernel-doc instead. There are
>> a couple of examples for reference in this header file.
>
> This comment was pretty trivial to begin with, but grew.
>
>>> + *
>>> + * For example, it simplifies the x86 RDT implementation to round the
>>> + * value to a suitable step here and then treat the resulting value as
>>> + * opaque when programming the hardware MSRs later on.
>>
>> This "For example" can be dropped.
>
> Done.
>
>>> + *
>>> + * Architectures for which this pre-conversion hook is not useful
>>> + * should supply an implementation of this function that just returns
>>> + * val unmodified.
>>> + */
>
> I've fleshed this out a little, as follows:
>
> --8<--
>
> /**
> * resctrl_arch_preconvert_bw() - Convert a bandwidth control value to the
> * appropriate form for consumption by the
> * hardware driver for resource r.
This should be a short description. For example: "Prepare bandwidth value for arch use"
Longer description follows the argument descriptions.
> * @val: Control value written to the schemata file by userspace.
This can be more specific with: "Bandwidth control value ..."
> * @r: Resource whose schema was written.
This is where longer description is expected. For more details you can refer to
Documentation/doc-guide/kernel-doc.rst
> *
> * Return: The converted value.
> *
> * It simplifies the x86 RDT implementation to round the value to a suitable
> * step at parse time and then treat the resulting value as opaque when
> * programming the hardware MSRs later on. In this situation, this hook is the
> * correct place to perform the conversion.
My apologies for not being specific. I meant that the entire x86 specific "For example"
paragraph can be dropped. Unless there are some quirks the architecture should watch out for
(which should be avoided) the focus should just be on what is expected from
architecture and/or insight to architecture how resctrl interacts with the call.
For example (using some of your text from other thread),
Convert the user provided bandwidth control value to an appropriate form for
consumption by the hardware driver for resource @r. Converted value is stored in
rdt_ctrl_domain::staged_config[] for later consumption by resctrl_arch_update_domains().
Is not called when MBA software controller is enabled.
> *
> * Architectures for which this pre-conversion hook is not useful should supply
> * an implementation of this function that just returns @val unmodified.
> */
> u32 resctrl_arch_preconvert_bw(u32 val, const struct rdt_resource *r);
>
> -->8--
>
Reinette
Powered by blists - more mailing lists