[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a08d4234-651c-4ac5-9cf8-96a4461c71ce@amd.com>
Date: Wed, 19 Jun 2024 10:37:54 -0500
From: "Moger, Babu" <babu.moger@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, corbet@....net,
fenghua.yu@...el.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: x86@...nel.org, hpa@...or.com, paulmck@...nel.org, rdunlap@...radead.org,
tj@...nel.org, peterz@...radead.org, yanjiewtw@...il.com,
kim.phillips@....com, lukas.bulwahn@...il.com, seanjc@...gle.com,
jmattson@...gle.com, leitao@...ian.org, jpoimboe@...nel.org,
rick.p.edgecombe@...el.com, kirill.shutemov@...ux.intel.com,
jithu.joseph@...el.com, kai.huang@...el.com, kan.liang@...ux.intel.com,
daniel.sneddon@...ux.intel.com, pbonzini@...hat.com, sandipan.das@....com,
ilpo.jarvinen@...ux.intel.com, peternewman@...gle.com,
maciej.wieczor-retman@...el.com, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, eranian@...gle.com, james.morse@....com
Subject: Re: [PATCH v4 07/19] x86/resctrl: Add support to enable/disable ABMC
feature
Hi Reinette,
On 6/13/24 19:59, Reinette Chatre wrote:
> Hi Babu,
>
> On 5/24/24 5:23 AM, Babu Moger wrote:
>> Add the functionality to enable/disable ABMC feature.
>>
>> ABMC feature is enabled by setting enabled bit(0) in MSR L3_QOS_EXT_CFG.
>> When the state of ABMC is changed, the MSR needs to be updated on all
>> the logical processors in the QOS Domain.
>>
>> Hardware counters will reset when ABMC state is changed. Kernel internal
>> counters need to be reset to avoid overflow condition in the next update.
>
> Please note that there are two "ABMC" features introduced in this series.
> First, there is the "abmc" resctrl fs feature that just happens to have
> the same name as AMD's "ABMC" (which may be a good motivation to change
> this name). Second, there is the architecture (AMD) specific "ABMC" feature
> that is enabled in response to user's request to enable the
> resctrl "abmc" feature.
>
> Other architectures need to support resctrl fs "abmc" feature with something
> entirely different.
>
> Please consider this distinction during this series because it is often
> blurred.
I was explainaing the enablement of AMD's ABMC feature here. I can add
text about it.
>
>>
>> The ABMC feature details are documented in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC).
>>
>> Signed-off-by: Babu Moger <babu.moger@....com>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>> ---
>> v4: Removed resctrl_arch_set_abmc_enabled and
>> resctrl_arch_get_abmc_enabled.
>> Directly calling resctrl_abmc_enable and resctrl_abmc_disable.
>> Renamed couple of functions.
>> resctrl_abmc_msrwrite() -> resctrl_abmc_set_one()
>> resctrl_abmc_setup() -> resctrl_abmc_set_all()
>> Added rdtgroup_mutex lockdep asserts.
>> Updated commit log and code comments.
>>
>> v3: No changes.
>>
>> v2: Few text changes in commit message.
>> ---
>> arch/x86/include/asm/msr-index.h | 1 +
>> arch/x86/kernel/cpu/resctrl/internal.h | 8 ++++
>> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 62 ++++++++++++++++++++++++++
>> 3 files changed, 71 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/msr-index.h
>> b/arch/x86/include/asm/msr-index.h
>> index e022e6eb766c..5f9a0139e98c 100644
>> --- a/arch/x86/include/asm/msr-index.h
>> +++ b/arch/x86/include/asm/msr-index.h
>> @@ -1171,6 +1171,7 @@
>> #define MSR_IA32_MBA_BW_BASE 0xc0000200
>> #define MSR_IA32_SMBA_BW_BASE 0xc0000280
>> #define MSR_IA32_EVT_CFG_BASE 0xc0000400
>> +#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
>> /* MSR_IA32_VMX_MISC bits */
>> #define MSR_IA32_VMX_MISC_INTEL_PT (1ULL << 14)
>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
>> b/arch/x86/kernel/cpu/resctrl/internal.h
>> index d566251094b2..fabe40304798 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -97,6 +97,9 @@ cpumask_any_housekeeping(const struct cpumask *mask,
>> int exclude_cpu)
>> return cpu;
>> }
>> +/* Setting bit 0 in L3_QOS_EXT_CFG enables the ABMC feature */
>> +#define ABMC_ENABLE BIT(0)
>> +
>> struct rdt_fs_context {
>> struct kernfs_fs_context kfc;
>> bool enable_cdpl2;
>> @@ -436,6 +439,7 @@ struct rdt_parse_data {
>> * @mbm_cfg_mask: Bandwidth sources that can be tracked when Bandwidth
>> * Monitoring Event Configuration (BMEC) is supported.
>> * @cdp_enabled: CDP state of this resource
>> + * @abmc_enabled: ABMC feature is enabled
>> *
>> * Members of this structure are either private to the architecture
>> * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
>> @@ -450,6 +454,7 @@ struct rdt_hw_resource {
>> unsigned int mbm_width;
>> unsigned int mbm_cfg_mask;
>> bool cdp_enabled;
>> + bool abmc_enabled;
>> };
>>
>
> ok, so here by making "abmc_enabled" a member of struct rdt_hw_resource
> this is
> an architecture specific property. This is reasonable since every
> architecture
> will look different. What is _not_ ok is that this causes the rest of the
> series to change resctrl fs to reach into the architecture code. For example,
> this work causes mbm_config_show() to now need to peek into struct
> rdt_hw_resource
> to see this value. That is not ok. All of the interactions between this
> field and resctrl fs needs to be via arch helpers:
> resctrl_arch_abmc_enable()/
> resctrl_arch_abmc_disable() and
> resctrl_arch_get_abmc_enabled()/resctrl_arch_set_abmc_enabled().
Sure. Will change it.
>
>> static inline struct rdt_hw_resource *resctrl_to_arch_res(struct
>> rdt_resource *r)
>> @@ -493,6 +498,9 @@ static inline bool resctrl_arch_get_cdp_enabled(enum
>> resctrl_res_level l)
>> int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool
>> enable);
>> +int resctrl_abmc_enable(enum resctrl_res_level l);
>> +void resctrl_abmc_disable(enum resctrl_res_level l);
>
> Why do these need enum resctrl_res_level parameter?
It is really not required. It is always "RDT_RESOURCE_L3".
I can remove it.
>
>> +
>> /*
>> * To return the common struct rdt_resource, which is contained in struct
>> * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index ca692712b393..9148d1234ede 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2416,6 +2416,68 @@ int resctrl_arch_set_cdp_enabled(enum
>> resctrl_res_level l, bool enable)
>> return 0;
>> }
>> +static void resctrl_abmc_set_one(void *arg)
>> +{
>> + bool *enable = arg;
>> + u64 msrval;
>> +
>> + rdmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
>> +
>> + if (*enable)
>> + msrval |= ABMC_ENABLE;
>> + else
>> + msrval &= ~ABMC_ENABLE;
>> +
>> + wrmsrl(MSR_IA32_L3_QOS_EXT_CFG, msrval);
>> +}
>> +
>> +static int resctrl_abmc_set_all(enum resctrl_res_level l, bool enable)
>
> Should this function and resctrl_abmc_set_one() perhaps have "amd" in the
> name just to enforce that this is not filesystem code at all and specific
> and unique to AMD.
Sure.
>
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[l].r_resctrl;
>> + struct rdt_domain *d;
>> +
>> + /*
>> + * Update QOS_CFG MSR on all the CPUs associated with the resource
>
> end of sentence needs "."
Sure.
>
> This comment about the specific register seems more appropriate to
> resctrl_abmc_set_one() though. This function is a higher level
> enable/disable of the hardware feature.
Ok.
>
>> + * Hardware counters will reset after switching the monotor mode.
>
> monotor -> monitor
Sure.
>
>> + * Reset the internal counters so that it is not considered as
>> + * an overflow in next update.
>
> For the first time the term "internal counters" is introduced. What does it
> mean?
How about "mbm counters"?
>
>> + */
>> + list_for_each_entry(d, &r->domains, list) {
>> + on_each_cpu_mask(&d->cpu_mask, resctrl_abmc_set_one, &enable, 1);
>> + resctrl_arch_reset_rmid_all(r, d);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +int resctrl_abmc_enable(enum resctrl_res_level l)
>> +{
>> + struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
>> + int ret = 0;
>> +
>> + lockdep_assert_held(&rdtgroup_mutex);
>> +
>> + if (!hw_res->abmc_enabled) {
>> + ret = resctrl_abmc_set_all(l, true);
>> + if (!ret)
>> + hw_res->abmc_enabled = true;
>
> This error handling seems useless since resctrl_abmc_set_all() always returns
> 0 ... perhaps it should return void instead and this error handling dropped?
> With that this function can never fail either and it can just return void,
> but this is probably not what we want as the architecture call since other
> architectures may fail.
Understood.
>
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +void resctrl_abmc_disable(enum resctrl_res_level l)
>> +{
>> + struct rdt_hw_resource *hw_res = &rdt_resources_all[l];
>> +
>> + lockdep_assert_held(&rdtgroup_mutex);
>> +
>> + if (hw_res->abmc_enabled) {
>> + resctrl_abmc_set_all(l, false);
>> + hw_res->abmc_enabled = false;
>> + }
>> +}
>> +
>> /*
>> * We don't allow rdtgroup directories to be created anywhere
>> * except the root directory. Thus when looking for the rdtgroup
>
> Reinette
>
--
Thanks
Babu Moger
Powered by blists - more mailing lists