[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <41d612f1-cee1-4682-960d-61ab14d31f89@amd.com>
Date: Mon, 7 Apr 2025 15:17:44 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, akpm@...ux-foundation.org,
paulmck@...nel.org, thuth@...hat.com, rostedt@...dmis.org,
xiongwei.song@...driver.com, pawan.kumar.gupta@...ux.intel.com,
jpoimboe@...nel.org, daniel.sneddon@...ux.intel.com,
thomas.lendacky@....com, perry.yuan@....com, sandipan.das@....com,
kai.huang@...el.com, seanjc@...gle.com, xin3.li@...el.com,
ebiggers@...gle.com, andrew.cooper3@...rix.com, mario.limonciello@....com,
tan.shaopeng@...itsu.com, james.morse@....com, tony.luck@...el.com,
peternewman@...gle.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com, corbet@....net
Subject: Re: [PATCH v3 5/7] x86/resctrl: Add interface to enable/disable
io_alloc feature
Hi Reinette,
On 3/21/25 17:58, Reinette Chatre wrote:
> Hi Babu,
>
> On 1/30/25 1:20 PM, Babu Moger wrote:
>> The io_alloc feature in resctrl is a mechanism that enables direct
>> insertion of data from I/O devices into the L3 cache.
>>
>> On AMD systems, io_alloc feature is backed by SDCIAE (L3 Smart Data Cache
>> Injection Allocation Enforcement). When enabled, SDCIAE forces all SDCI
>> lines to be placed into the L3 cache partitions identified by the
>> highest-supported L3_MASK_n register as reported by CPUID
>> Fn0000_0010_EDX_x1.MAX_COS. For example, if MAX_COS=15, SDCI lines will
>> be allocated into the L3 cache partitions determined by the bitmask in
>> the L3_MASK_15 register.
>>
>> When CDP is enabled, io_alloc routes I/O traffic using the highest CLOSID
>> allocated for the instruction cache.
>
> You can append a "L3CODE" to the above to help provide context on what resource
> is referred to as "instruction cache".
Sure.
>
>>
>> Introduce interface to enable/disable "io_alloc" feature on user input.
>>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> ---
>> v3: Rewrote the change to make it generic.
>> Rewrote the documentation in resctrl.rst to be generic and added
>> AMD feature details in the end.
>> Added the check to verify if MAX CLOSID availability on the system.
>> Added CDP check to make sure io_alloc is configured in CDP_CODE.
>> Added resctrl_io_alloc_closid_free() to free the io_alloc CLOSID.
>> Added errors in few cases when CLOSID allocation fails.
>> Fixes splat reported when info/L3/bit_usage is accesed when io_alloc
>> is enabled.
>> https://lore.kernel.org/lkml/SJ1PR11MB60837B532254E7B23BC27E84FC052@SJ1PR11MB6083.namprd11.prod.outlook.com/
>>
>> v2: Renamed the feature to "io_alloc".
>> Added generic texts for the feature in commit log and resctrl.rst doc.
>> Added resctrl_io_alloc_init_cat() to initialize io_alloc to default
>> values when enabled.
>> Fixed io_alloc show functinality to display only on L3 resource.
>> ---
>> Documentation/arch/x86/resctrl.rst | 34 ++++++
>> arch/x86/kernel/cpu/resctrl/core.c | 2 +
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 144 +++++++++++++++++++++++++
>> 3 files changed, 180 insertions(+)
>>
>> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
>> index 6768fc1fad16..1b67e31d626c 100644
>> --- a/Documentation/arch/x86/resctrl.rst
>> +++ b/Documentation/arch/x86/resctrl.rst
>> @@ -135,6 +135,40 @@ related to allocation:
>> "1":
>> Non-contiguous 1s value in CBM is supported.
>>
>> +"io_alloc":
>> + The "io_alloc" feature in resctrl enables system software to
>
> Since this is already resctrl documentation "feature in resctrl" could be dropped to
> be just:
> "io_alloc" enables system software ...
Sure.
>
>
>> + configure the portion of the L3 cache allocated for I/O traffic.
>> + By directly caching data from I/O devices rather than first storing
>> + the I/O data in DRAM, reduces the demands on DRAM bandwidth and
>> + reduces latency to the processor consuming the I/O data.
>
> hmmm ... looks like "SDCIAE" was deleted from earlier used (marketing?) text and
> resulting text left as-is without re-checking if resulting text is still coherent.
> I do not think it is needed to motivate/market the feature here, perhaps last
> sentence can just be dropped?
Yes. I will drop the last sentence.
>
>> +
>> + The feature routes the I/O traffic via specific CLOSID reserved
>> + for io_alloc feature. By configuring the CBM (Capacity Bit Mask)
>> + for the CLOSID users can control the L3 portions available for
>> + I/O traffic. When enabled, CLOSID reserved for the io_alloc will
>> + not be available to the resctrl group.
>
> Although the above reflects how SDCIAE is implemented it may not be true for how
> another architecture may support this. hmmm ... this sounds familiar and looking back it
> is the same thing I mentioned in V2 feedback, actually, in V2 I pointed to V1 feedback
> that said this also.
> If you insist on this text then please change the tone that indicates the
> behavior is optional. For example, "An architecture may support io_alloc by reserving
> a CLOSID to configure the ..."
Yes. Sure.
>
>> + ::
>> +
>> + # cat /sys/fs/resctrl/info/L3/io_alloc
>> + 0
>
> Please refer to cover-letter about proposal to use enabled/disabled/not supported instead.
Yes. Got it.
>
>> +
>> + "0":
>> + io_alloc feature is not enabled.
>> + "1":
>> + io_alloc feature is enabled, allowing users to manage
>> + the portions of the L3 cache allocated for the I/O device.
>> +
>> + Feature can be enabled/disabled by writing to the interface.
>> + Example::
>> +
>> + # echo 1 > /sys/fs/resctrl/info/L3/io_alloc
>> +
>> + On AMD systems, the io_alloc feature is supported by the L3 Smart
>> + Data Cache Injection Allocation Enforcement (SDCIAE). The CLOSID for
>> + io_alloc is determined by the highest CLOSID supported by the resource.
>> + When CDP is enabled, io_alloc routes I/O traffic using the highest
>> + CLOSID allocated for the instruction cache.
>> +
>> Memory bandwidth(MB) subdirectory contains the following files
>> with respect to allocation:
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1ebdb2dcc009..88bc95c14ea8 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -309,6 +309,8 @@ static void rdt_get_cdp_config(int level)
>> static void rdt_set_io_alloc_capable(struct rdt_resource *r)
>> {
>> r->cache.io_alloc_capable = true;
>> + resctrl_file_fflags_init("io_alloc",
>> + RFTYPE_CTRL_INFO | RFTYPE_RES_CACHE);
>> }
>
> Some MPAM changes landed since you created this work. After the fs/arch split the
> architecture code should have no insight into the resctrl file flags. Please refer to
> the MPAM changes on how this can be managed. You can refer to thread_throttle_mode_init()
> and similar to that resctrl can use the io_alloc_capable flag to make the files visible.
Yes. This needs change.
>
>>
>> static void rdt_get_cdp_l3_config(void)
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index c5a0a31c3a85..37295dd14abe 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -62,6 +62,7 @@ static char last_cmd_status_buf[512];
>>
>> static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
>> static void rdtgroup_destroy_root(void);
>> +static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid);
>>
>> struct dentry *debugfs_resctrl;
>>
>> @@ -180,6 +181,19 @@ void closid_free(int closid)
>> __set_bit(closid, &closid_free_map);
>> }
>>
>> +static int resctrl_io_alloc_closid_alloc(u32 io_alloc_closid)
>> +{
>> + if (__test_and_clear_bit(io_alloc_closid, &closid_free_map))
>> + return io_alloc_closid;
>> + else
>> + return -ENOSPC;
>> +}
>> +
>> +static void resctrl_io_alloc_closid_free(u32 io_alloc_closid)
>> +{
>> + closid_free(io_alloc_closid);
>> +}
>> +
>> /**
>> * closid_allocated - test if provided closid is in use
>> * @closid: closid to be tested
>> @@ -995,6 +1009,33 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
>> return 0;
>> }
>>
>> +/*
>> + * io_alloc feature uses max CLOSID to route the IO traffic.
>> + * Get the max CLOSID and verify if the CLOSID is available.
>> + */
>> +static int resctrl_io_alloc_closid_get(struct rdt_resource *r,
>> + struct resctrl_schema *s)
>> +{
>> + int num_closids = resctrl_arch_get_num_closid(r);
>> +
>> + /*
>> + * The number of CLOSIDs is determined based on the minimum
>> + * supported across all resources (in closid_init). It is stored
>
> closid_init -> closid_init()
Sure.
>
>> + * in s->num_closids. Also, if CDP is enabled number of CLOSIDs
>> + * are halved. To enable io_alloc feature, the number of CLOSIDs
>> + * must match the maximum CLOSID supported by the resource.
>> + */
>> + if (resctrl_arch_get_cdp_enabled(r->rid))
>> + num_closids /= 2;
>> +
>> + if (s->num_closid != num_closids) {
>
> Considering from schemata_list_add():
> s->num_closid = resctrl_arch_get_num_closid(r);
>
> ... the above "if (s->num_closid != num_closids)" just compares the value to itself, no?
>
> This function does not actually take all resources into account with the above
> comparison. I think what you may need here is a comparison with closid_free_map_len?
Yea. I need to change the logic here.
The max supported CLOSID on the resource and closid_free_map_len should
match. Will fix it.
>
> As I understand it is still possible to use io_alloc when the resource's max CLOSID
> is not within closid_free_map, this is just not done simplify implementation.
That is correct.
>
>> + rdt_last_cmd_puts("Max CLOSID to support io_alloc is not available\n");
>> + return -ENOSPC;
>> + }
>> +
>> + return num_closids - 1;
>> +}
>> +
>> /*
>> * rdt_bit_usage_show - Display current usage of resources
>> *
>> @@ -1038,6 +1079,14 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
>> for (i = 0; i < closids_supported(); i++) {
>> if (!closid_allocated(i))
>> continue;
>> + /*
>> + * If io_alloc is enabled, the CLOSID will be
>> + * allocated but will not be associated with any
>> + * groups. Skip in that case.
>
> This defeats the purpose of "bit_usage" that gives insight to user space
> on how the cache is allocated. Instead of ignoring portions of cache
> used for I/O this should display to the user that these portions are
> used by/shared with hardware.
Yes. Will change it.
>
>> + */
>> + if (i == resctrl_io_alloc_closid_get(r, s) &&
>> + resctrl_arch_get_io_alloc_enabled(r))
>> + continue;
>> ctrl_val = resctrl_arch_get_config(r, dom, i,
>> s->conf_type);
>> mode = rdtgroup_mode_by_closid(i);
>> @@ -1830,6 +1879,94 @@ int resctrl_arch_io_alloc_enable(struct rdt_resource *r, bool enable)
>> return 0;
>> }
>>
>> +static int resctrl_io_alloc_show(struct kernfs_open_file *of,
>> + struct seq_file *seq, void *v)
>> +{
>> + struct resctrl_schema *s = of->kn->parent->priv;
>> + struct rdt_resource *r = s->res;
>> +
>> + seq_printf(seq, "%x\n", resctrl_arch_get_io_alloc_enabled(r));
>> + return 0;
>> +}
>> +
>> +/*
>> + * Initialize io_alloc CLOSID cache resource with default CBM values.
>> + */
>> +static int resctrl_io_alloc_init_cat(struct rdt_resource *r,
>> + struct resctrl_schema *s, u32 closid)
>> +{
>> + int ret;
>> +
>> + rdt_staged_configs_clear();
>> +
>> + ret = rdtgroup_init_cat(s, closid);
>> + if (ret < 0)
>> + goto out_init_cat;
>> +
>> + ret = resctrl_arch_update_domains(r, closid);
>> +
>> +out_init_cat:
>> + rdt_staged_configs_clear();
>> + return ret;
>> +}
>> +
>> +static ssize_t resctrl_io_alloc_write(struct kernfs_open_file *of, char *buf,
>> + size_t nbytes, loff_t off)
>> +{
>> + struct resctrl_schema *s = of->kn->parent->priv;
>> + struct rdt_resource *r = s->res;
>> + u32 io_alloc_closid;
>> + bool enable;
>> + int ret;
>> +
>> + if (!r->cache.io_alloc_capable || s->conf_type == CDP_DATA) {
>> + rdt_last_cmd_puts("io_alloc feature is not supported on the resource\n");
>
> rdt_last_cmd_puts() starts with lockdep_assert_held(&rdtgroup_mutex), also expect
> rdt_last_cmd_clear() before first use.
Yes.
>
>
>> + return -EINVAL;
>
> Could ENODEV be used instead?
Sure.
>
>> + }
>> +
>> + ret = kstrtobool(buf, &enable);
>> + if (ret)
>> + return ret;
>> +
>> + cpus_read_lock();
>> + mutex_lock(&rdtgroup_mutex);
>> +
>> + rdt_last_cmd_clear();
>> +
>> + io_alloc_closid = resctrl_io_alloc_closid_get(r, s);
>> + if (io_alloc_closid < 0) {
>
> Could you please add an informative message in last_cmd_status? It may be
> possible for user to remedy this and retry.
Yes.
>
>> + ret = -EINVAL;
>> + goto out_io_alloc;
>> + }
>> +
>> + if (resctrl_arch_get_io_alloc_enabled(r) != enable) {
>> + if (enable) {
>> + ret = resctrl_io_alloc_closid_alloc(io_alloc_closid);
>> + if (ret < 0) {
>> + rdt_last_cmd_puts("CLOSID for io_alloc is not available\n");
>
> If the CLOSID is not available then it may be possible for the user to remedy this by
> removing a resource group and retry this operation. Since CLOSID is not useful to user space
> (and x86 architecture specific) this could be improved to give guidance to user
> space about which resource group (by name, not CLOSID) is preventing this from succeeding.
Sure.
>
> (this sounded familiar, looks like I provided the same feedback to V2, to which you
> responded "Yes. We can do that.")
Yes. Clearly, I missed that.
>
>> + goto out_io_alloc;
>> + }
>> + ret = resctrl_io_alloc_init_cat(r, s, io_alloc_closid);
>> + if (ret) {
>> + rdt_last_cmd_puts("Failed to initialize io_alloc allocations\n");
>> + resctrl_io_alloc_closid_free(io_alloc_closid);
>> + goto out_io_alloc;
>> + }
>> +
>> + } else {
>> + resctrl_io_alloc_closid_free(io_alloc_closid);
>> + }
>> +
>> + ret = resctrl_arch_io_alloc_enable(r, enable);
>> + }
>> +
>> +out_io_alloc:
>> + mutex_unlock(&rdtgroup_mutex);
>> + cpus_read_unlock();
>> +
>> + return ret ?: nbytes;
>> +}
>> +
>> /* rdtgroup information files for one cache resource. */
>> static struct rftype res_common_files[] = {
>> {
>> @@ -1982,6 +2119,13 @@ static struct rftype res_common_files[] = {
>> .seq_show = rdtgroup_schemata_show,
>> .fflags = RFTYPE_CTRL_BASE,
>> },
>> + {
>> + .name = "io_alloc",
>> + .mode = 0644,
>> + .kf_ops = &rdtgroup_kf_single_ops,
>> + .seq_show = resctrl_io_alloc_show,
>> + .write = resctrl_io_alloc_write,
>> + },
>> {
>> .name = "mba_MBps_event",
>> .mode = 0644,
>
> Reinette
>
--
Thanks
Babu Moger
Powered by blists - more mailing lists