Message-ID: <41d612f1-cee1-4682-960d-61ab14d31f89@amd.com>
Date: Mon, 7 Apr 2025 15:17:44 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, tglx@...utronix.de,
 mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, akpm@...ux-foundation.org,
 paulmck@...nel.org, thuth@...hat.com, rostedt@...dmis.org,
 xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com,
 jpoimboe@...nel.org, daniel.sneddon@...ux.intel.com,
 thomas.lendacky@....com, perry.yuan@....com, sandipan.das@....com,
 kai.huang@...el.com, seanjc@...gle.com, xin3.li@...el.com,
 ebiggers@...gle.com, andrew.cooper3@...rix.com, mario.limonciello@....com,
 tan.shaopeng@...itsu.com, james.morse@....com, tony.luck@...el.com,
 peternewman@...gle.com, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, eranian@...gle.com, corbet@....net
Subject: Re: [PATCH v3 5/7] x86/resctrl: Add interface to enable/disable
 io_alloc feature

Hi Reinette,

On 3/21/25 17:58, Reinette Chatre wrote:
> Hi Babu,
> 
> On 1/30/25 1:20 PM, Babu Moger wrote:
>> The io_alloc feature in resctrl is a mechanism that enables direct
>> insertion of data from I/O devices into the L3 cache.
>>
>> On AMD systems, io_alloc feature is backed by SDCIAE (L3 Smart Data Cache
>> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI
>> lines to be placed into the L3 cache partitions identified by the
>> highest-supported L3_MASK_n register as reported by CPUID
>> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will
>> be allocated into the L3 cache partitions determined by the bitmask in
>> the L3_MASK_15 register.
>>
>> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID
>> allocated for the instruction cache.
> 
> You can append a "L3CODE" to the above to help provide context on what resource
> is referred to as "instruction cache".

Sure.

> 
>>
>> Introduce interface to enable/disable "io_alloc" feature on user input.
>>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> ---
>> v3: Rewrote the change to make it generic.
>>     Rewrote the documentation in resctrl.rst to be generic and added
>>     AMD feature details in the end.
>>     Added a check to verify MAX CLOSID availability on the system.
>>     Added CDP check to make sure io_alloc is configured in CDP_CODE.
>>     Added resctrl_io_alloc_closid_free() to free the io_alloc CLOSID.
>>     Added errors in a few cases when CLOSID allocation fails.
>>     Fixed a splat reported when info/L3/bit_usage is accessed while io_alloc
>>     is enabled.
>>     https://lore.kernel.org/lkml/SJ1PR11MB60837B532254E7B23BC27E84FC052@SJ1PR11MB6083.namprd11.prod.outlook.com/
>>
>> v2: Renamed the feature to "io_alloc".
>>     Added generic texts for the feature in commit log and resctrl.rst doc.
>>     Added resctrl_io_alloc_init_cat() to initialize io_alloc to default
>>     values when enabled.
>>     Fixed io_alloc show functionality to display only on the L3 resource.
>> ---
>>  Documentation/arch/x86/resctrl.rst     |  34 ++++++
>>  arch/x86/kernel/cpu/resctrl/core.c     |   2 +
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 144 +++++++++++++++++++++++++
>>  3 files changed, 180 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 6768fc1fad16..1b67e31d626c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -135,6 +135,40 @@ related to allocation:
>>  			"1":
>>  			      Non-contiguous 1s value in CBM is supported.
>>  
>> +"io_alloc":
>> +		The "io_alloc" feature in resctrl enables system software to
> 
> Since this is already resctrl documentation "feature in resctrl" could be dropped to
> be just:
> 		"io_alloc" enables system software ...

Sure.

> 	
> 
>> +		configure the portion of the L3 cache allocated for I/O traffic.
>> +		By directly caching data from I/O devices rather than first storing
>> +		the I/O data in DRAM, reduces the demands on DRAM bandwidth and
>> +		reduces latency to the processor consuming the I/O data.
> 
> hmmm ... looks like "SDCIAE" was deleted from earlier used (marketing?) text and
> the resulting text left as-is without re-checking whether it is still coherent.
> I do not think it is needed to motivate/market the feature here, perhaps last
> sentence can just be dropped?

Yes. I will drop the last sentence.

> 
>> +
>> +		The feature routes the I/O traffic via specific CLOSID reserved
>> +		for io_alloc feature. By configuring the CBM (Capacity Bit Mask)
>> +		for the CLOSID users can control the L3 portions available for
>> +		I/O traffic. When enabled, CLOSID reserved for the io_alloc will
>> +		not be available to the resctrl group.
> 
> Although the above reflects how SDCIAE is implemented it may not be true for how
> another architecture may support this. hmmm ... this sounds familiar and looking back it
> is the same thing I mentioned in V2 feedback, actually, in V2 I pointed to V1 feedback
> that said this also.
> If you insist on this text then please change the tone that indicates the
> behavior is optional. For example, "An architecture may support io_alloc by reserving
> a CLOSID to configure the ..."

Yes. Sure.

> 
>> +		::
>> +
>> +		  # cat /sys/fs/resctrl/info/L3/io_alloc
>> +		  0
> 
> Please refer to cover-letter about proposal to use enabled/disabled/not supported instead.

Yes. Got it.
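
To make sure I read the proposal right, the interface would then report the
strings instead of 0/1, something like (exact strings per the cover-letter
proposal, adjust as needed):

		  # cat /sys/fs/resctrl/info/L3/io_alloc
		  disabled
		  # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
		  # cat /sys/fs/resctrl/info/L3/io_alloc
		  enabled

with "not supported" reported when the resource is not io_alloc capable.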

> 
>> +
>> +		  "0":
>> +		      io_alloc feature is not enabled.
>> +		  "1":
>> +		      io_alloc feature is enabled, allowing users to manage
>> +		      the portions of the L3 cache allocated for the I/O device.
>> +
>> +		The feature can be enabled/disabled by writing to the interface.
>> +		Example::
>> +
>> +			# echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>> +
>> +		On AMD systems, the io_alloc feature is supported by the L3 Smart
>> +		Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> +		io_alloc is determined by the highest CLOSID supported by the resource.
>> +		When CDP is enabled, io_alloc routes I/O traffic using the highest
>> +		CLOSID allocated for the instruction cache.
>> +
>>  Memory bandwidth(MB) subdirectory contains the following files
>>  with respect to allocation:
>>  
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1ebdb2dcc009..88bc95c14ea8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -309,6 +309,8 @@ static void rdt_get_cdp_config(int level)
>>  static void rdt_set_io_alloc_capable(struct rdt_resource *r)
>>  {
>>  	r->cache.io_alloc_capable = true;
>> +	resctrl_file_fflags_init("io_alloc",
>> +				 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
>>  }
> 
> Some MPAM changes landed since you created this work. After the fs/arch split the
> architecture code should have no insight into the resctrl file flags. Please refer to
> the MPAM changes on how this can be managed. You can refer to thread_throttle_mode_init()
> and similar to that resctrl can use the io_alloc_capable flag to make the files visible.

Yes. This needs change.
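
Something along these lines (untested sketch, function name made up), with
resctrl fs code keying the file visibility off io_alloc_capable similar to
thread_throttle_mode_init(), so the architecture code no longer touches the
file flags:

static void io_alloc_init(void)
{
	struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);

	/* Only expose info/L3/io_alloc when the resource reports the capability. */
	if (r->cache.io_alloc_capable)
		resctrl_file_fflags_init("io_alloc",
					 RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
}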

> 
>>  
>>  static void rdt_get_cdp_l3_config(void)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index c5a0a31c3a85..37295dd14abe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -62,6 +62,7 @@ static char last_cmd_status_buf[512];
>>  
>>  static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
>>  static void rdtgroup_destroy_root(void);
>> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
>>  
>>  struct dentry *debugfs_resctrl;
>>  
>> @@ -180,6 +181,19 @@ void closid_free(int closid)
>>  	__set_bit(closid, &closid_free_map);
>>  }
>>  
>> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid)
>> +{
>> +	if (__test_and_clear_bit(io_alloc_closid, &closid_free_map))
>> +		return io_alloc_closid;
>> +	else
>> +		return -ENOSPC;
>> +}
>> +
>> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid)
>> +{
>> +	closid_free(io_alloc_closid);
>> +}
>> +
>>  /**
>>   * closid_allocated - test if provided closid is in use
>>   * @closid: closid to be tested
>> @@ -995,6 +1009,33 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>>  	return 0;
>>  }
>>  
>> +/*
>> + * io_alloc feature uses max CLOSID to route the IO traffic.
>> + * Get the max CLOSID and verify if the CLOSID is available.
>> + */
>> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r,
>> +				       struct resctrl_schema *s)
>> +{
>> +	int num_closids = resctrl_arch_get_num_closid(r);
>> +
>> +	/*
>> +	 * The number of CLOSIDs is determined based on the minimum
>> +	 * supported across all resources (in closid_init). It is stored
> 
> closid_init -> closid_init()

Sure.

> 
>> +	 * in s->num_closids. Also, if CDP is enabled number of CLOSIDs
>> +	 * are halved. To enable io_alloc feature, the number of CLOSIDs
>> +	 * must match the maximum CLOSID supported by the resource.
>> +	 */
>> +	if (resctrl_arch_get_cdp_enabled(r->rid))
>> +		num_closids /= 2;
>> +
>> +	if (s->num_closid != num_closids) {
> 
> Considering from schemata_list_add():
> 	s->num_closid = resctrl_arch_get_num_closid(r);
> 
> ... the above "if (s->num_closid != num_closids)" just compares the value to itself, no?
> 
> This function does not actually take all resources into account with the above
> comparison. I think what you may need here is a comparison with closid_free_map_len?

Yea. I need to change the logic here.

The max supported CLOSID on the resource and closid_free_map_len should
match. Will fix it.
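
i.e., something like (untested):

	/*
	 * io_alloc uses the resource's highest CLOSID, which is only usable
	 * when no other resource limits the effective number of CLOSIDs
	 * below it (closid_init() stores that minimum in closid_free_map_len).
	 */
	if (num_closids != closid_free_map_len) {
		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
		return -ENOSPC;
	}

	return num_closids - 1;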


> 
> As I understand it is still possible to use io_alloc when the resource's max CLOSID
> is not within closid_free_map, this is just not done to simplify the implementation.

That is correct.

> 
>> +		rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
>> +		return -ENOSPC;
>> +	}
>> +
>> +	return num_closids - 1;
>> +}
>> +
>>  /*
>>   * rdt_bit_usage_show - Display current usage of resources
>>   *
>> @@ -1038,6 +1079,14 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>>  		for (i = 0; i < closids_supported(); i++) {
>>  			if (!closid_allocated(i))
>>  				continue;
>> +			/*
>> +			 * If io_alloc is enabled, the CLOSID will be
>> +			 * allocated but will not be associated with any
>> +			 * groups. Skip in that case.
> 
> This defeats the purpose of "bit_usage" that gives insight to user space
> on how the cache is allocated. Instead of ignoring portions of cache
> used for I/O this should display to the user that these portions are
> used by/shared with hardware.

Yes. Will change it.
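
Perhaps along these lines in rdt_bit_usage_show() (untested), folding the
io_alloc CLOSID's CBM into hw_shareable so those bits are reported as used by
hardware instead of being skipped:

			if (resctrl_arch_get_io_alloc_enabled(r) &&
			    i == resctrl_io_alloc_closid_get(r, s)) {
				/* Report the I/O portions rather than hiding them. */
				hw_shareable |= resctrl_arch_get_config(r, dom, i,
									s->conf_type);
				continue;
			}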

> 
>> +			 */
>> +			if (i == resctrl_io_alloc_closid_get(r, s) &&
>> +			    resctrl_arch_get_io_alloc_enabled(r))
>> +				continue;
>>  			ctrl_val = resctrl_arch_get_config(r, dom, i,
>>  							   s->conf_type);
>>  			mode = rdtgroup_mode_by_closid(i);
>> @@ -1830,6 +1879,94 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>>  	return 0;
>>  }
>>  
>> +static int resctrl_io_alloc_show(struct kernfs_open_file *of,
>> +				 struct seq_file *seq, void *v)
>> +{
>> +	struct resctrl_schema *s = of->kn->parent->priv;
>> +	struct rdt_resource *r = s->res;
>> +
>> +	seq_printf(seq, "%x\n", resctrl_arch_get_io_alloc_enabled(r));
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Initialize io_alloc CLOSID cache resource with default CBM values.
>> + */
>> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r,
>> +				     struct resctrl_schema *s, u32 closid)
>> +{
>> +	int ret;
>> +
>> +	rdt_staged_configs_clear();
>> +
>> +	ret = rdtgroup_init_cat(s, closid);
>> +	if (ret < 0)
>> +		goto out_init_cat;
>> +
>> +	ret = resctrl_arch_update_domains(r, closid);
>> +
>> +out_init_cat:
>> +	rdt_staged_configs_clear();
>> +	return ret;
>> +}
>> +
>> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
>> +				      size_t nbytes, loff_t off)
>> +{
>> +	struct resctrl_schema *s = of->kn->parent->priv;
>> +	struct rdt_resource *r = s->res;
>> +	u32 io_alloc_closid;
>> +	bool enable;
>> +	int ret;
>> +
>> +	if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
>> +		rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n");
> 
> rdt_last_cmd_puts() starts with lockdep_assert_held(&rdtgroup_mutex), also expect
> rdt_last_cmd_clear() before first use.

Yes.

> 
> 
>> +		return -EINVAL;
> 
> Could ENODEV be used instead?

Sure.
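
Will rework the start of resctrl_io_alloc_write() roughly like this (untested
sketch), so the message helpers run with rdtgroup_mutex held and the
unsupported case returns -ENODEV:

	cpus_read_lock();
	mutex_lock(&rdtgroup_mutex);

	rdt_last_cmd_clear();

	if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
		rdt_last_cmd_puts("io_alloc is not supported on the resource\n");
		ret = -ENODEV;
		goto out_io_alloc;
	}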

> 
>> +	}
>> +
>> +	ret = kstrtobool(buf, &enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	cpus_read_lock();
>> +	mutex_lock(&rdtgroup_mutex);
>> +
>> +	rdt_last_cmd_clear();
>> +
>> +	io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
>> +	if (io_alloc_closid < 0) {
> 
> Could you please add an informative message in last_cmd_status? It may be
> possible for user to remedy this and retry.

Yes.

> 
>> +		ret = -EINVAL;
>> +		goto out_io_alloc;
>> +	}
>> +
>> +	if (resctrl_arch_get_io_alloc_enabled(r) != enable) {
>> +		if (enable) {
>> +			ret = resctrl_io_alloc_closid_alloc(io_alloc_closid);
>> +			if (ret < 0) {
>> +				rdt_last_cmd_puts("CLOSID for io_alloc is not available\n");
> 
> If the CLOSID is not available then it may be possible for the user to remedy this by
> removing a resource group and retrying this operation. Since the CLOSID is not useful to user space
> (and x86 architecture-specific) this could be improved to give guidance to user
> space about which resource group (by name, not CLOSID) is preventing this from succeeding.

Sure.

> 
> (this sounded familiar, looks like I provided the same feedback to V2, to which you
> responded "Yes. We can do that.")

Yes. Clearly, I missed that.
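
One way (untested; helper name is made up) could be to look up the group that
currently owns the needed CLOSID and name it in the error message:

	/* Hypothetical helper: find the resource group that holds @closid. */
	static struct rdtgroup *rdtgroup_find_by_closid(u32 closid)
	{
		struct rdtgroup *rdtgrp;

		list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
			if (rdtgrp->closid == closid)
				return rdtgrp;
		}
		return NULL;
	}

and in the enable path:

	rdtgrp = rdtgroup_find_by_closid(io_alloc_closid);
	rdt_last_cmd_printf("CLOSID for io_alloc is in use by group %s\n",
			    rdtgrp ? rdtgrp->kn->name : "unknown");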

> 
>> +				goto out_io_alloc;
>> +			}
>> +			ret = resctrl_io_alloc_init_cat(r, s, io_alloc_closid);
>> +			if (ret) {
>> +				rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
>> +				resctrl_io_alloc_closid_free(io_alloc_closid);
>> +				goto out_io_alloc;
>> +			}
>> +
>> +		} else {
>> +			resctrl_io_alloc_closid_free(io_alloc_closid);
>> +		}
>> +
>> +		ret = resctrl_arch_io_alloc_enable(r, enable);
>> +	}
>> +
>> +out_io_alloc:
>> +	mutex_unlock(&rdtgroup_mutex);
>> +	cpus_read_unlock();
>> +
>> +	return ret ?: nbytes;
>> +}
>> +
>>  /* rdtgroup information files for one cache resource. */
>>  static struct rftype res_common_files[] = {
>>  	{
>> @@ -1982,6 +2119,13 @@ static struct rftype res_common_files[] = {
>>  		.seq_show	= rdtgroup_schemata_show,
>>  		.fflags		= RFTYPE_CTRL_BASE,
>>  	},
>> +	{
>> +		.name		= "io_alloc",
>> +		.mode		= 0644,
>> +		.kf_ops		= &rdtgroup_kf_single_ops,
>> +		.seq_show	= resctrl_io_alloc_show,
>> +		.write		= resctrl_io_alloc_write,
>> +	},
>>  	{
>>  		.name		= "mba_MBps_event",
>>  		.mode		= 0644,
> 
> Reinette
> 

-- 
Thanks
Babu Moger
