lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45b96e99-ac5e-4546-b58b-f4d062d2f823@intel.com>
Date: Thu, 20 Nov 2025 08:46:24 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>, <linux-kernel@...r.kernel.org>
CC: Tony Luck <tony.luck@...el.com>, James Morse <james.morse@....com>, "Ben
 Horgan" <ben.horgan@....com>, Thomas Gleixner <tglx@...utronix.de>, "Ingo
 Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, "Jonathan
 Corbet" <corbet@....net>, <x86@...nel.org>, <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2] x86,fs/resctrl: Factor MBA parse-time conversion to be
 per-arch

Hi Dave,

On 10/31/25 8:41 AM, Dave Martin wrote:
> The control value parser for the MB resource currently coerces the
> memory bandwidth percentage value from userspace to be an exact
> multiple of the rdt_resource::resctrl_membw::bw_gran parameter.
> 
> On MPAM systems, this results in somewhat worse-than-worst-case
> rounding, since the bandwidth granularity advertised to resctrl by the
> MPAM driver is in general only an approximation to the actual hardware
> granularity on these systems, and the hardware bandwidth allocation
> control value is not natively a percentage -- necessitating a further
> conversion in the resctrl_arch_update_domains() path, regardless of the
> conversion done at parse time.
> 
> Allow the arch to provide its own parse-time conversion that is
> appropriate for the hardware, and move the existing conversion to x86.
> This will avoid accumulated error from rounding the value twice on MPAM
> systems.
> 
> Clarify the documentation, but avoid overly exact promises.
> 
> Clamping to bw_min and bw_max still feels generic: leave it in the core
> code, for now.
> 
> No functional change.

Please use max line length available. Changelogs have to conform before merge
and having patch ready saves this work.

> 
> Signed-off-by: Dave Martin <Dave.Martin@....com>
> 
> ---

...

> diff --git a/Documentation/filesystems/resctrl.rst b/Documentation/filesystems/resctrl.rst
> index b7f35b07876a..fbbcf5421346 100644
> --- a/Documentation/filesystems/resctrl.rst
> +++ b/Documentation/filesystems/resctrl.rst
> @@ -144,12 +144,11 @@ with respect to allocation:
>  		user can request.
>  
>  "bandwidth_gran":
> -		The granularity in which the memory bandwidth
> -		percentage is allocated. The allocated
> -		b/w percentage is rounded off to the next
> -		control step available on the hardware. The
> -		available bandwidth control steps are:
> -		min_bandwidth + N * bandwidth_gran.
> +		The approximate granularity in which the memory bandwidth
> +		percentage is allocated. The allocated bandwidth percentage
> +		is rounded up to the next control step available on the
> +		hardware. The available hardware steps are no larger than
> +		this value.
>  
>  "delay_linear":
>  		Indicates if the delay scale is linear or
> @@ -737,8 +736,10 @@ The minimum bandwidth percentage value for each cpu model is predefined
>  and can be looked up through "info/MB/min_bandwidth". The bandwidth
>  granularity that is allocated is also dependent on the cpu model and can
>  be looked up at "info/MB/bandwidth_gran". The available bandwidth
> -control steps are: min_bw + N * bw_gran. Intermediate values are rounded
> -to the next control step available on the hardware.
> +control steps are, approximately, min_bw + N * bw_gran.  The steps may
> +appear irregular due to rounding to an exact percentage: bw_gran is the
> +maximum interval between the percentage values corresponding to any two
> +adjacent steps in the hardware.
>  
>  The bandwidth throttling is a core specific mechanism on some of Intel
>  SKUs. Using a high bandwidth and a low bandwidth setting on two threads

The documentation changes look good to me. Thank you.

> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 1189c0df4ad7..dc05a50e80fb 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -16,9 +16,15 @@
>  #define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
>  
>  #include <linux/cpu.h>
> +#include <linux/math.h>
>  
>  #include "internal.h"
>  
> +u32 resctrl_arch_preconvert_bw(u32 val, const struct rdt_resource *r)
> +{
> +	return roundup(val, (unsigned long)r->membw.bw_gran);
> +}
> +
>  int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_ctrl_domain *d,
>  			    u32 closid, enum resctrl_conf_type t, u32 cfg_val)
>  {
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index 0d0ef54fc4de..26e3f7c88c86 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -35,8 +35,8 @@ typedef int (ctrlval_parser_t)(struct rdt_parse_data *data,
>  /*
>   * Check whether MBA bandwidth percentage value is correct. The value is
>   * checked against the minimum and max bandwidth values specified by the
> - * hardware. The allocated bandwidth percentage is rounded to the next
> - * control step available on the hardware.
> + * hardware. The allocated bandwidth percentage is converted as
> + * appropriate for consumption by the specific hardware driver.

The text looks good but adjusting the right margin mid-paragraph seems unnecessary?

>   */
>  static bool bw_validate(char *buf, u32 *data, struct rdt_resource *r)
>  {
> @@ -69,7 +69,7 @@ static bool bw_validate(char *buf, u32 *data, struct rdt_resource *r)
>  		return false;
>  	}
>  
> -	*data = roundup(bw, (unsigned long)r->membw.bw_gran);
> +	*data = resctrl_arch_preconvert_bw(bw, r);
>  	return true;
>  }
>  
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index a7d92718b653..1fb6e2118b61 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -485,6 +485,20 @@ bool resctrl_arch_mbm_cntr_assign_enabled(struct rdt_resource *r);
>   */
>  int resctrl_arch_mbm_cntr_assign_set(struct rdt_resource *r, bool enable);
>  
> +/*
> + * Convert a bandwidth control value to the appropriate form for
> + * consumption by the hardware driver for resource r.

When being as descriptive, please switch to proper kernel-doc instead. There are
a couple of examples for reference in this header file.

> + *
> + * For example, it simplifies the x86 RDT implementation to round the
> + * value to a suitable step here and then treat the resulting value as
> + * opaque when programming the hardware MSRs later on.

This "For example" can be dropped.

> + *
> + * Architectures for which this pre-conversion hook is not useful
> + * should supply an implementation of this function that just returns
> + * val unmodified.
> + */
> +u32 resctrl_arch_preconvert_bw(u32 val, const struct rdt_resource *r);
> +
>  /*
>   * Update the ctrl_val and apply this config right now.
>   * Must be called on one of the domain's CPUs.
> 
> base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ