lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 14 May 2020 11:45:49 -0500
From:   Babu Moger <babu.moger@....com>
To:     Reinette Chatre <reinette.chatre@...el.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "fenghua.yu@...el.com" <fenghua.yu@...el.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>
Cc:     "kuo-lang.tseng@...el.com" <kuo-lang.tseng@...el.com>,
        "ravi.v.shankar@...el.com" <ravi.v.shankar@...el.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH V3 1/4] x86/resctrl: Enable user to view and select thread
 throttling mode

Hi Reinnette,

The patches did not apply on my tree. I got the latest tree today. You
might want to check again.
Hunk #1 FAILED at 29.
1 out of 7 hunks FAILED -- saving rejects to file
arch/x86/kernel/cpu/resctrl/rdtgroup.c.rej


> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@...el.com>
> Sent: Wednesday, May 6, 2020 6:50 PM
> To: tglx@...utronix.de; fenghua.yu@...el.com; bp@...en8.de;
> tony.luck@...el.com
> Cc: kuo-lang.tseng@...el.com; ravi.v.shankar@...el.com; mingo@...hat.com;
> Moger, Babu <Babu.Moger@....com>; hpa@...or.com; x86@...nel.org;
> linux-kernel@...r.kernel.org; Reinette Chatre <reinette.chatre@...el.com>
> Subject: [PATCH V3 1/4] x86/resctrl: Enable user to view and select thread
> throttling mode
> 
> Intel Memory Bandwidth Allocation (MBA) control is provided per
> processor core. At the same time different CLOS, configured with different
> bandwidth percentages, can be assigned to the hardware threads
> sharing a core. In the original implementation of MBA the maximum throttling
> of the per-thread CLOS is allocated to the core. Specifically, the lower
> bandwidth percentage is allocated to the core.
> 
> Newer systems can be configured to allocate either maximum or
> minimum throttling of the per-thread CLOS values to the core.
> 
> Introduce a new resctrl file, "thread_throttle_mode", on Intel systems
> that exposes to the user how per-thread values are allocated to
> a core. On systems that support the original MBA implementation the
> file will always display "max". On systems that can be configured
> the possible values are "min" or "max" that the user can modify by
> writing these same words to the file.
> 
> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
> ---
> Changes since V2:
> - Rebase on top of recently merged series "x86/resctrl: Support wider
>   MBM counters". Small change needed to take into account
>   asm/resctrl_sched.h -> asm/resctrl.h name change.
> - Fix rST formatting of documentation (resctrl_ui.rst) describing
>   new "thread_throttle_mode" resctrl file.
> 
>  Documentation/x86/resctrl_ui.rst       |  19 ++-
>  arch/x86/kernel/cpu/resctrl/core.c     |  32 +++++
>  arch/x86/kernel/cpu/resctrl/internal.h |  13 ++
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 188 ++++++++++++++++++++++++-
>  4 files changed, 249 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/x86/resctrl_ui.rst
> b/Documentation/x86/resctrl_ui.rst
> index 5368cedfb530..861ee2816470 100644
> --- a/Documentation/x86/resctrl_ui.rst
> +++ b/Documentation/x86/resctrl_ui.rst
> @@ -138,6 +138,19 @@ with respect to allocation:
>  		non-linear. This field is purely informational
>  		only.
> 
> +"thread_throttle_mode":
> +		Indicator (on some CPU models control) on Intel systems
> +		of how tasks running on threads of a physical core are
> +		throttled in cases where they request different memory
> +		bandwidth percentages:
> +
> +		"min":
> +			the largest percentage is applied
> +			to all threads
> +		"max":
> +			the smallest percentage is applied
> +			to all threads
> +
>  If RDT monitoring is available there will be an "L3_MON" directory
>  with the following files:
> 
> @@ -364,8 +377,10 @@ to the next control step available on the hardware.
> 
>  The bandwidth throttling is a core specific mechanism on some of Intel
>  SKUs. Using a high bandwidth and a low bandwidth setting on two threads
> -sharing a core will result in both threads being throttled to use the
> -low bandwidth. The fact that Memory bandwidth allocation(MBA) is a core
> +sharing a core may result in both threads being throttled to use the
> +low bandwidth (see "thread_throttle_mode").
> +
> +The fact that Memory bandwidth allocation(MBA) may be a core
>  specific mechanism where as memory bandwidth monitoring(MBM) is done at
>  the package level may lead to confusion when users try to apply control
>  via the MBA and then monitor the bandwidth to see if the controls are
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> index 12f967c6b603..1bc686777069 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -250,6 +250,30 @@ static inline bool rdt_get_mb_table(struct rdt_resource
> *r)
>  	return false;
>  }
> 
> +/*
> + * Model-specific test to determine if platform where memory bandwidth
> + * control is applied to a core can be configured to apply either the
> + * maximum or minimum of the per-thread delay values.
> + * By default, platforms where memory bandwidth control is applied to a
> + * core will select the maximum delay value of the per-thread CLOS.
> + *
> + * NOTE: delay value programmed to hardware is inverse of bandwidth
> + * percentage configured via user interface.
> + */
> +bool mba_cfg_supports_min_max_intel(void)
> +{
> +	switch (boot_cpu_data.x86_model) {
> +	case INTEL_FAM6_ATOM_TREMONT_D:
> +	case INTEL_FAM6_ICELAKE_X:
> +	case INTEL_FAM6_ICELAKE_D:
> +		return true;
> +	default:
> +		return false;
> +	}
> +
> +	return false;
> +}

I see that you are calling this function multiple times. Why don't you
make it as a property in rdt_resource. Set it only once during the
init(may be in get_mem_config_intel). Then you can use it wherever
required. This also probably help James to make everything architecture
independent. What do you think?

I assume that this property is probably not part of CPUID.

> +
>  static bool __get_mem_config_intel(struct rdt_resource *r)
>  {
>  	union cpuid_0x10_3_eax eax;
> @@ -270,6 +294,11 @@ static bool __get_mem_config_intel(struct
> rdt_resource *r)
>  	}
>  	r->data_width = 3;
> 
> +	if (mba_cfg_supports_min_max_intel())
> +		thread_throttle_mode_init_intel_rw();
> +	else
> +		thread_throttle_mode_init_intel_ro();
> +
>  	r->alloc_capable = true;
>  	r->alloc_enabled = true;
> 
> @@ -580,6 +609,9 @@ static void domain_add_cpu(int cpu, struct rdt_resource
> *r)
> 
>  	rdt_domain_reconfigure_cdp(r);
> 
> +	if (mba_cfg_supports_min_max_intel())
> +		wrmsrl(MSR_MBA_CFG, mba_cfg_msr);
> +
>  	if (r->alloc_capable && domain_setup_ctrlval(r, d)) {
>  		kfree(d);
>  		return;
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> b/arch/x86/kernel/cpu/resctrl/internal.h
> index f20a47d120b1..c5f4cb91009b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -9,6 +9,7 @@
> 
>  #define MSR_IA32_L3_QOS_CFG		0xc81
>  #define MSR_IA32_L2_QOS_CFG		0xc82
> +#define MSR_MBA_CFG			0xc84
>  #define MSR_IA32_L3_CBM_BASE		0xc90
>  #define MSR_IA32_L2_CBM_BASE		0xd10
>  #define MSR_IA32_MBA_THRTL_BASE		0xd50
> @@ -21,6 +22,9 @@
> 
>  #define L2_QOS_CDP_ENABLE		0x01ULL
> 
> +#define MBA_THROTTLE_MODE_MIN		0x01ULL
> +#define MBA_THROTTLE_MODE_MAX		0x00ULL
> +
>  /*
>   * Event IDs are used to program IA32_QM_EVTSEL before reading event
>   * counter from IA32_QM_CTR
> @@ -38,6 +42,8 @@
>  #define MBA_MAX_MBPS			U32_MAX
>  #define MAX_MBA_BW_AMD			0x800
> 
> +#define MBA_THROTTLE_MODE_MASK		BIT_ULL(0)
> +
>  #define RMID_VAL_ERROR			BIT_ULL(63)
>  #define RMID_VAL_UNAVAIL		BIT_ULL(62)
>  /*
> @@ -47,6 +53,10 @@
>   */
>  #define MBM_CNTR_WIDTH_OFFSET_MAX (62 - MBM_CNTR_WIDTH_BASE)
> 
> +/*
> + * MSR_MBA_CFG cache
> + */
> +extern u64 mba_cfg_msr;
> 
>  struct rdt_fs_context {
>  	struct kernfs_fs_context	kfc;
> @@ -611,5 +621,8 @@ void __check_limbo(struct rdt_domain *d, bool
> force_free);
>  bool cbm_validate_intel(char *buf, u32 *data, struct rdt_resource *r);
>  bool cbm_validate_amd(char *buf, u32 *data, struct rdt_resource *r);
>  void rdt_domain_reconfigure_cdp(struct rdt_resource *r);
> +bool mba_cfg_supports_min_max_intel(void);
> +void thread_throttle_mode_init_intel_rw(void);
> +void thread_throttle_mode_init_intel_ro(void);
> 
>  #endif /* _ASM_X86_RESCTRL_INTERNAL_H */
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index d7cb5ab0d1f0..6a9408060ac4 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -29,6 +29,7 @@
> 
>  #include <uapi/linux/magic.h>
> 
> +#include <asm/intel-family.h>
>  #include <asm/resctrl.h>
>  #include "internal.h"
> 
> @@ -38,6 +39,7 @@ DEFINE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
>  static struct kernfs_root *rdt_root;
>  struct rdtgroup rdtgroup_default;
>  LIST_HEAD(rdt_all_groups);
> +u64 mba_cfg_msr;
> 
>  /* Kernel fs node for "info" directory under root */
>  static struct kernfs_node *kn_info;
> @@ -1017,6 +1019,134 @@ static int max_threshold_occ_show(struct
> kernfs_open_file *of,
>  	return 0;
>  }
> 
> +/*
> + * As documented in the Intel SDM, on systems supporting the original MBA
> + * implementation the delay value allocated to a core is always the maximum
> + * of the delay values assigned to the hardware threads sharing the core.
> + *
> + * Some systems support a model-specific MSR with which this default
> + * behavior can be changed. On these systems the core can be allocated
> + * with either the minimum or maximum delay value assigned to its hardware
> + * threads.
> + *
> + * NOTE: The hardware deals with memory delay values that may be
> programmed
> + * from zero (implying zero delay, and full bandwidth available) to the
> + * maximum specified in CPUID. The software interface deals with memory
> + * bandwidth percentages that are the inverse of the delay values (100%
> + * memory bandwidth from user perspective is zero MBA delay from hardware
> + * perspective). When maximum throttling is active the core is allocated
> + * with the maximum delay value that from the software interface will be
> + * the minimum of the bandwidth percentages assigned to the hardware
> threads
> + * sharing the core.
> + */
> +static int rdt_thread_throttle_mode_show(struct kernfs_open_file *of,
> +					 struct seq_file *seq, void *v)
> +{
> +	unsigned int throttle_mode = 0;
> +
> +	if (mba_cfg_supports_min_max_intel())
> +		throttle_mode = mba_cfg_msr &
> MBA_THROTTLE_MODE_MASK;
> +
> +	seq_puts(seq,
> +		 throttle_mode == MBA_THROTTLE_MODE_MIN ? "min\n" :
> "max\n");
> +
> +	return 0;
> +}
> +
> +static void update_mba_cfg(void *data)
> +{
> +	u64 *mba_cfg = data;
> +
> +	wrmsrl(MSR_MBA_CFG, *mba_cfg);
> +}
> +
> +/*
> + * The model-specific MBA configuration MSR has package scope. Making a
> + * system-wide MBA configuration change thus needs to modify the MSR on
> one
> + * CPU from each package.
> + */
> +static int rdt_system_mba_cfg_set(u64 mba_cfg)
> +{
> +	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA];
> +	cpumask_var_t cpu_mask;
> +	struct rdt_domain *d;
> +
> +	if (list_is_singular(&r->domains)) {
> +		wrmsrl(MSR_MBA_CFG, mba_cfg);
> +		goto out;
> +	}
> +
> +	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL)) {
> +		rdt_last_cmd_puts("Memory allocation error\n");
> +		return -ENOMEM;
> +	}
> +
> +	list_for_each_entry(d, &r->domains, list)
> +		cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
> +
> +	on_each_cpu_mask(cpu_mask, update_mba_cfg, &mba_cfg, 1);
> +
> +	free_cpumask_var(cpu_mask);
> +out:
> +	mba_cfg_msr = mba_cfg;
> +	return 0;
> +}
> +
> +/*
> + * See NOTE associated with rdt_thread_throttle_mode_show() for
> + * details of the min/max interpretation.
> + */
> +static ssize_t rdt_thread_throttle_mode_write(struct kernfs_open_file *of,
> +					      char *buf, size_t nbytes,
> +					      loff_t off)
> +{
> +	u64 mba_cfg;
> +	int ret = 0;
> +
> +	if (nbytes == 0)
> +		return -EINVAL;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	/*
> +	 * Additional check.
> +	 * This function should not be associated with the user space file
> +	 * on systems that do not support configuration.
> +	 */
> +	if (!mba_cfg_supports_min_max_intel()) {
> +		rdt_last_cmd_puts("Platform does not support mode
> changes\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	mba_cfg = mba_cfg_msr & MBA_THROTTLE_MODE_MASK;
> +
> +	if ((sysfs_streq(buf, "min") && mba_cfg ==
> MBA_THROTTLE_MODE_MIN) ||
> +	    (sysfs_streq(buf, "max") && mba_cfg ==
> MBA_THROTTLE_MODE_MAX))
> +		goto out;
> +
> +	if (sysfs_streq(buf, "min")) {
> +		mba_cfg = MBA_THROTTLE_MODE_MIN;
> +	} else if (sysfs_streq(buf, "max")) {
> +		mba_cfg = MBA_THROTTLE_MODE_MAX;
> +	} else {
> +		rdt_last_cmd_puts("Unknown or unsupported mode\n");
> +		ret = -EINVAL;
> +		goto out;
> +	}
> +
> +	mba_cfg = (mba_cfg_msr & ~MBA_THROTTLE_MODE_MASK) |
> mba_cfg;
> +	ret = rdt_system_mba_cfg_set(mba_cfg);
> +
> +out:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +	return ret ?: nbytes;
> +}
> +
>  static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
>  				       char *buf, size_t nbytes, loff_t off)
>  {
> @@ -1512,6 +1642,16 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= rdt_delay_linear_show,
>  		.fflags		= RF_CTRL_INFO | RFTYPE_RES_MB,
>  	},
> +	/*
> +	 * Platform specific which (if any) capabilities are provided by
> +	 * thread_throttle_mode. Defer some initialization to platform
> +	 * discovery.
> +	 */
> +	{
> +		.name		= "thread_throttle_mode",
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= rdt_thread_throttle_mode_show,
> +	},
>  	{
>  		.name		= "max_threshold_occupancy",
>  		.mode		= 0644,
> @@ -1571,6 +1711,47 @@ static struct rftype res_common_files[] = {
> 
>  };
> 
> +static struct rftype *rdtgroup_rftype_by_name(const char *name)
> +{
> +	struct rftype *rfts, *rft;
> +	int len;
> +
> +	rfts = res_common_files;
> +	len = ARRAY_SIZE(res_common_files);
> +
> +	for (rft = rfts; rft < rfts + len; rft++) {
> +		if (!strcmp(rft->name, name))
> +			return rft;
> +	}
> +
> +	return NULL;
> +}
> +
> +void __init thread_throttle_mode_init_intel_rw(void)
> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_rftype_by_name("thread_throttle_mode");
> +	if (!rft)
> +		return;
> +
> +	rft->mode = 0644;
> +	rft->write = rdt_thread_throttle_mode_write;
> +	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
> +}
> +
> +void __init thread_throttle_mode_init_intel_ro(void)
> +{
> +	struct rftype *rft;
> +
> +	rft = rdtgroup_rftype_by_name("thread_throttle_mode");
> +	if (!rft)
> +		return;
> +
> +	rft->mode = 0444;
> +	rft->fflags = RF_CTRL_INFO | RFTYPE_RES_MB;
> +}
> +
>  static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)
>  {
>  	struct rftype *rfts, *rft;
> @@ -1582,7 +1763,7 @@ static int rdtgroup_add_files(struct kernfs_node *kn,
> unsigned long fflags)
>  	lockdep_assert_held(&rdtgroup_mutex);
> 
>  	for (rft = rfts; rft < rfts + len; rft++) {
> -		if ((fflags & rft->fflags) == rft->fflags) {
> +		if (rft->fflags && ((fflags & rft->fflags) == rft->fflags)) {
>  			ret = rdtgroup_add_file(kn, rft);
>  			if (ret)
>  				goto error;
> @@ -2239,6 +2420,11 @@ static int reset_all_ctrls(struct rdt_resource *r)
>  	smp_call_function_many(cpu_mask, rdt_ctrl_update, &msr_param, 1);
>  	put_cpu();
> 
> +	if (mba_cfg_supports_min_max_intel()) {
> +		mba_cfg_msr = 0;
> +		on_each_cpu_mask(cpu_mask, update_mba_cfg,
> &mba_cfg_msr, 1);
> +	}
> +
>  	free_cpumask_var(cpu_mask);
> 
>  	return 0;
> --
> 2.21.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ