Message-ID: <143ff530-ed4c-47c2-abb7-80ddf2adc0de@intel.com>
Date: Fri, 21 Mar 2025 15:58:05 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Babu Moger <babu.moger@....com>, <tglx@...utronix.de>, <mingo@...hat.com>,
	<bp@...en8.de>, <dave.hansen@...ux.intel.com>
CC: <x86@...nel.org>, <hpa@...or.com>, <akpm@...ux-foundation.org>,
	<paulmck@...nel.org>, <thuth@...hat.com>, <rostedt@...dmis.org>,
	<xiongwei.song@...driver.com>, <pawan.kumar.gupta@...ux.intel.com>,
	<jpoimboe@...nel.org>, <daniel.sneddon@...ux.intel.com>,
	<thomas.lendacky@....com>, <perry.yuan@....com>, <sandipan.das@....com>,
	<kai.huang@...el.com>, <seanjc@...gle.com>, <xin3.li@...el.com>,
	<ebiggers@...gle.com>, <andrew.cooper3@...rix.com>,
	<mario.limonciello@....com>, <tan.shaopeng@...itsu.com>,
	<james.morse@....com>, <tony.luck@...el.com>, <peternewman@...gle.com>,
	<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<eranian@...gle.com>, <corbet@....net>
Subject: Re: [PATCH v3 5/7] x86/resctrl: Add interface to enable/disable
 io_alloc feature

Hi Babu,

On 1/30/25 1:20 PM, Babu Moger wrote:
> The io_alloc feature in resctrl is a mechanism that enables direct
> insertion of data from I/O devices into the L3 cache.
> 
> On AMD systems, io_alloc feature is backed by SDCIAE (L3 Smart Data Cache
> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI
> lines to be placed into the L3 cache partitions identified by the
> highest-supported L3_MASK_n register as reported by CPUID
> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will
> be allocated into the L3 cache partitions determined by the bitmask in
> the L3_MASK_15 register.
> 
> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID
> allocated for the instruction cache.

You can append "L3CODE" to the above to help provide context on which resource
is referred to as "instruction cache".

> 
> Introduce interface to enable/disable "io_alloc" feature on user input.
> 
> Signed-off-by: Babu Moger <babu.moger@....com>
> ---
> v3: Rewrote the change to make it generic.
>     Rewrote the documentation in resctrl.rst to be generic and added
>     AMD feature details in the end.
>     Added the check to verify MAX CLOSID availability on the system.
>     Added CDP check to make sure io_alloc is configured in CDP_CODE.
>     Added resctrl_io_alloc_closid_free() to free the io_alloc CLOSID.
>     Added errors in few cases when CLOSID allocation fails.
>     Fixes splat reported when info/L3/bit_usage is accessed when io_alloc
>     is enabled.
>     https://lore.kernel.org/lkml/SJ1PR11MB60837B532254E7B23BC27E84FC052@SJ1PR11MB6083.namprd11.prod.outlook.com/
> 
> v2: Renamed the feature to "io_alloc".
>     Added generic texts for the feature in commit log and resctrl.rst doc.
>     Added resctrl_io_alloc_init_cat() to initialize io_alloc to default
>     values when enabled.
>     Fixed io_alloc show functionality to display only on L3 resource.
> ---
>  Documentation/arch/x86/resctrl.rst     |  34 ++++++
>  arch/x86/kernel/cpu/resctrl/core.c     |   2 +
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 144 +++++++++++++++++++++++++
>  3 files changed, 180 insertions(+)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 6768fc1fad16..1b67e31d626c 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -135,6 +135,40 @@ related to allocation:
>  			"1":
>  			      Non-contiguous 1s value in CBM is supported.
>  
> +"io_alloc":
> +		The "io_alloc" feature in resctrl enables system software to

Since this is already resctrl documentation, "feature in resctrl" could be dropped so that
it is just:
		"io_alloc" enables system software ...

> +		configure the portion of the L3 cache allocated for I/O traffic.
> +		By directly caching data from I/O devices rather than first storing
> +		the I/O data in DRAM, reduces the demands on DRAM bandwidth and
> +		reduces latency to the processor consuming the I/O data.

hmmm ... it looks like "SDCIAE" was deleted from earlier (marketing?) text and the
resulting text was left as-is without re-checking whether it is still coherent.
I do not think the feature needs to be motivated/marketed here; perhaps the last
sentence can just be dropped?

> +
> +		The feature routes the I/O traffic via specific CLOSID reserved
> +		for io_alloc feature. By configuring the CBM (Capacity Bit Mask)
> +		for the CLOSID users can control the L3 portions available for
> +		I/O traffic. When enabled, CLOSID reserved for the io_alloc will
> +		not be available to the resctrl group.

Although the above reflects how SDCIAE is implemented, it may not be true for how
another architecture supports this. hmmm ... this sounds familiar, and looking back it
is the same thing I mentioned in the V2 feedback; in fact, in V2 I pointed to V1 feedback
that said this as well.
If you insist on this text then please change the tone to indicate that the
behavior is optional. For example, "An architecture may support io_alloc by reserving
a CLOSID to configure the ..."

> +		::
> +
> +		  # cat /sys/fs/resctrl/info/L3/io_alloc
> +		  0

Please refer to the cover letter about the proposal to use enabled/disabled/not supported instead.
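
Something along these lines may work (untested sketch; the exact strings, and whether the
file remains visible when the resource is not io_alloc capable, are assumptions to be
settled in that discussion):

static int resctrl_io_alloc_show(struct kernfs_open_file *of,
				 struct seq_file *seq, void *v)
{
	struct resctrl_schema *s = of->kn->parent->priv;
	struct rdt_resource *r = s->res;

	/* Sketch only: report a state string instead of a raw 0/1 */
	if (!r->cache.io_alloc_capable)
		seq_puts(seq, "not supported\n");
	else if (resctrl_arch_get_io_alloc_enabled(r))
		seq_puts(seq, "enabled\n");
	else
		seq_puts(seq, "disabled\n");

	return 0;
}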

> +
> +		  "0":
> +		      io_alloc feature is not enabled.
> +		  "1":
> +		      io_alloc feature is enabled, allowing users to manage
> +		      the portions of the L3 cache allocated for the I/O device.
> +
> +		Feature can be enabled/disabled by writing to the interface.
> +		Example::
> +
> +			# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
> +
> +		On AMD systems, the io_alloc feature is supported by the L3 Smart
> +		Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
> +		io_alloc is determined by the highest CLOSID supported by the resource.
> +		When CDP is enabled, io_alloc routes I/O traffic using the highest
> +		CLOSID allocated for the instruction cache.
> +
>  Memory bandwidth(MB) subdirectory contains the following files
>  with respect to allocation:
>  
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 1ebdb2dcc009..88bc95c14ea8 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -309,6 +309,8 @@ static void rdt_get_cdp_config(int level)
>  static void rdt_set_io_alloc_capable(struct rdt_resource *r)
>  {
>  	r->cache.io_alloc_capable = true;
> +	resctrl_file_fflags_init("io_alloc",
> +				 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
>  }

Some MPAM changes landed since you created this work. After the fs/arch split the
architecture code should have no insight into the resctrl file flags. Please refer to
the MPAM changes for how this can be managed. You can look at thread_throttle_mode_init();
similar to that, resctrl can use the io_alloc_capable flag to make the files visible.
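
Roughly something like this on the resctrl fs side (sketch only; the helper names are a
guess and may not match what the MPAM series ends up with):

/*
 * Sketch only: let resctrl, not the architecture code, set up the file
 * flags, keyed off the io_alloc_capable flag, similar to how
 * thread_throttle_mode_init() handles "thread_throttle_mode".
 */
static void __init io_alloc_init(void)
{
	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);

	if (r->cache.io_alloc_capable)
		resctrl_file_fflags_init("io_alloc",
					 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
}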

>  
>  static void rdt_get_cdp_l3_config(void)
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index c5a0a31c3a85..37295dd14abe 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -62,6 +62,7 @@ static char last_cmd_status_buf[512];
>  
>  static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
>  static void rdtgroup_destroy_root(void);
> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
>  
>  struct dentry *debugfs_resctrl;
>  
> @@ -180,6 +181,19 @@ void closid_free(int closid)
>  	__set_bit(closid, &closid_free_map);
>  }
>  
> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid)
> +{
> +	if (__test_and_clear_bit(io_alloc_closid, &closid_free_map))
> +		return io_alloc_closid;
> +	else
> +		return -ENOSPC;
> +}
> +
> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid)
> +{
> +	closid_free(io_alloc_closid);
> +}
> +
>  /**
>   * closid_allocated - test if provided closid is in use
>   * @closid: closid to be tested
> @@ -995,6 +1009,33 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>  	return 0;
>  }
>  
> +/*
> + * io_alloc feature uses max CLOSID to route the IO traffic.
> + * Get the max CLOSID and verify if the CLOSID is available.
> + */
> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r,
> +				       struct resctrl_schema *s)
> +{
> +	int num_closids = resctrl_arch_get_num_closid(r);
> +
> +	/*
> +	 * The number of CLOSIDs is determined based on the minimum
> +	 * supported across all resources (in closid_init). It is stored

closid_init -> closid_init()

> +	 * in s->num_closids. Also, if CDP is enabled number of CLOSIDs
> +	 * are halved. To enable io_alloc feature, the number of CLOSIDs
> +	 * must match the maximum CLOSID supported by the resource.
> +	 */
> +	if (resctrl_arch_get_cdp_enabled(r->rid))
> +		num_closids /= 2;
> +
> +	if (s->num_closid != num_closids) {

Considering the following from schemata_list_add():
	s->num_closid = resctrl_arch_get_num_closid(r);

... the above "if (s->num_closid != num_closids)" just compares the value to itself, no?

This function does not actually take all resources into account with the above
comparison. I think what you may need here is a comparison with closid_free_map_len?

As I understand it, it is still possible to use io_alloc when the resource's max CLOSID
is not within closid_free_map; this is just not done in order to simplify the implementation.
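
That is, something like (untested sketch, keeping the rest of the function as-is):

	/*
	 * Sketch only: the resource's highest CLOSID is only usable for
	 * io_alloc if it falls within the CLOSID range resctrl manages
	 * (closids_supported(), i.e. closid_free_map_len).
	 */
	if (num_closids > closids_supported()) {
		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
		return -ENOSPC;
	}

	return num_closids - 1;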

> +		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
> +		return -ENOSPC;
> +	}
> +
> +	return num_closids - 1;
> +}
> +
>  /*
>   * rdt_bit_usage_show - Display current usage of resources
>   *
> @@ -1038,6 +1079,14 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>  		for (i = 0; i < closids_supported(); i++) {
>  			if (!closid_allocated(i))
>  				continue;
> +			/*
> +			 * If io_alloc is enabled, the CLOSID will be
> +			 * allocated but will not be associated with any
> +			 * groups. Skip in that case.

This defeats the purpose of "bit_usage", which gives user space insight
into how the cache is allocated. Instead of ignoring the portions of cache
used for I/O, this should show the user that these portions are
used by/shared with hardware.
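
For example (untested sketch, assuming io_alloc_closid is looked up once before the loop):

	/*
	 * Sketch only: account the io_alloc CBM as hardware use so that
	 * bit_usage shows these bits as "H" instead of hiding them.
	 */
	if (resctrl_arch_get_io_alloc_enabled(r) && i == io_alloc_closid) {
		hw_shareable |= resctrl_arch_get_config(r, dom, i, s->conf_type);
		continue;
	}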

> +			 */
> +			if (i == resctrl_io_alloc_closid_get(r, s) &&
> +			    resctrl_arch_get_io_alloc_enabled(r))
> +				continue;
>  			ctrl_val = resctrl_arch_get_config(r, dom, i,
>  							   s->conf_type);
>  			mode = rdtgroup_mode_by_closid(i);
> @@ -1830,6 +1879,94 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>  	return 0;
>  }
>  
> +static int resctrl_io_alloc_show(struct kernfs_open_file *of,
> +				 struct seq_file *seq, void *v)
> +{
> +	struct resctrl_schema *s = of->kn->parent->priv;
> +	struct rdt_resource *r = s->res;
> +
> +	seq_printf(seq, "%x\n", resctrl_arch_get_io_alloc_enabled(r));
> +	return 0;
> +}
> +
> +/*
> + * Initialize io_alloc CLOSID cache resource with default CBM values.
> + */
> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r,
> +				     struct resctrl_schema *s, u32 closid)
> +{
> +	int ret;
> +
> +	rdt_staged_configs_clear();
> +
> +	ret = rdtgroup_init_cat(s, closid);
> +	if (ret < 0)
> +		goto out_init_cat;
> +
> +	ret = resctrl_arch_update_domains(r, closid);
> +
> +out_init_cat:
> +	rdt_staged_configs_clear();
> +	return ret;
> +}
> +
> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
> +				      size_t nbytes, loff_t off)
> +{
> +	struct resctrl_schema *s = of->kn->parent->priv;
> +	struct rdt_resource *r = s->res;
> +	u32 io_alloc_closid;
> +	bool enable;
> +	int ret;
> +
> +	if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
> +		rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n");

rdt_last_cmd_puts() starts with lockdep_assert_held(&rdtgroup_mutex); also,
rdt_last_cmd_clear() is expected before first use.


> +		return -EINVAL;

Could ENODEV be used instead?
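
Combining the two comments above, something like (sketch only; the kstrtobool() parse can
remain where it is):

	cpus_read_lock();
	mutex_lock(&rdtgroup_mutex);
	rdt_last_cmd_clear();

	if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
		rdt_last_cmd_puts("io_alloc is not supported on the resource\n");
		ret = -ENODEV;		/* or keep -EINVAL */
		goto out_io_alloc;	/* unlocks both */
	}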

> +	}
> +
> +	ret = kstrtobool(buf, &enable);
> +	if (ret)
> +		return ret;
> +
> +	cpus_read_lock();
> +	mutex_lock(&rdtgroup_mutex);
> +
> +	rdt_last_cmd_clear();
> +
> +	io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
> +	if (io_alloc_closid < 0) {

Could you please add an informative message to last_cmd_status? It may be
possible for the user to remedy this and retry.

> +		ret = -EINVAL;
> +		goto out_io_alloc;
> +	}
> +
> +	if (resctrl_arch_get_io_alloc_enabled(r) != enable) {
> +		if (enable) {
> +			ret = resctrl_io_alloc_closid_alloc(io_alloc_closid);
> +			if (ret < 0) {
> +				rdt_last_cmd_puts("CLOSID for io_alloc is not available\n");

If the CLOSID is not available then it may be possible for the user to remedy this by
removing a resource group and retrying this operation. Since the CLOSID is not useful to
user space (and is x86 architecture specific) this could be improved to give user space
guidance about which resource group (by name, not CLOSID) is preventing this from succeeding.
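
For example, something like this (untested sketch; hypothetical helper, and the message
text is just an idea) could be called instead of the plain rdt_last_cmd_puts() on that
failure path:

/*
 * Sketch only: report which resource group currently owns the CLOSID
 * needed by io_alloc. Caller holds rdtgroup_mutex.
 */
static void resctrl_io_alloc_closid_busy(u32 io_alloc_closid)
{
	struct rdtgroup *rdtgrp;

	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
		if (rdtgrp->closid == io_alloc_closid) {
			rdt_last_cmd_printf("CLOSID in use by group \"%s\", remove it and retry\n",
					    rdtgrp->kn->name);
			return;
		}
	}

	rdt_last_cmd_puts("CLOSID for io_alloc is not available\n");
}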

(This sounded familiar; it looks like I provided the same feedback on V2, to which you
responded "Yes. We can do that.")

> +				goto out_io_alloc;
> +			}
> +			ret = resctrl_io_alloc_init_cat(r, s, io_alloc_closid);
> +			if (ret) {
> +				rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
> +				resctrl_io_alloc_closid_free(io_alloc_closid);
> +				goto out_io_alloc;
> +			}
> +
> +		} else {
> +			resctrl_io_alloc_closid_free(io_alloc_closid);
> +		}
> +
> +		ret = resctrl_arch_io_alloc_enable(r, enable);
> +	}
> +
> +out_io_alloc:
> +	mutex_unlock(&rdtgroup_mutex);
> +	cpus_read_unlock();
> +
> +	return ret ?: nbytes;
> +}
> +
>  /* rdtgroup information files for one cache resource. */
>  static struct rftype res_common_files[] = {
>  	{
> @@ -1982,6 +2119,13 @@ static struct rftype res_common_files[] = {
>  		.seq_show	= rdtgroup_schemata_show,
>  		.fflags		= RFTYPE_CTRL_BASE,
>  	},
> +	{
> +		.name		= "io_alloc",
> +		.mode		= 0644,
> +		.kf_ops		= &rdtgroup_kf_single_ops,
> +		.seq_show	= resctrl_io_alloc_show,
> +		.write		= resctrl_io_alloc_write,
> +	},
>  	{
>  		.name		= "mba_MBps_event",
>  		.mode		= 0644,

Reinette
