[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fcad500b-3d8c-4003-b25c-6f54d2b5fbe6@arm.com>
Date: Fri, 28 Feb 2025 19:55:26 +0000
From: James Morse <james.morse@....com>
To: Reinette Chatre <reinette.chatre@...el.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, H Peter Anvin <hpa@...or.com>,
Babu Moger <Babu.Moger@....com>, shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...hat.com>, Rex Nie <rex.nie@...uarmicro.com>,
Dave Martin <dave.martin@....com>, Koba Ko <kobak@...dia.com>,
Shanker Donthineni <sdonthineni@...dia.com>,
Shaopeng Tan <tan.shaopeng@...fujitsu.com>, Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v6 08/42] x86/resctrl: Generate default_ctrl instead of
sharing it
Hi Reinette,
On 19/02/2025 22:54, Reinette Chatre wrote:
> On 2/7/25 10:17 AM, James Morse wrote:
>> The struct rdt_resource default_ctrl is used by both the architecture
>> code for resetting the hardware controls, and sometimes by the
>> filesystem code as the default value for the schema, unless the
>> bandwidth software controller is in use.
>>
>> Having the default exposed by the architecture code causes unnecessary
>> duplication for each architecture as the default value must be specified,
>> but can be derived from other schema properties. Now that the
>> maximum bandwidth is explicitly described, resctrl can derive the default
>> value from the schema format and the other resource properties.
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 4504a12efc97..6fd195b600b1 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -281,8 +280,7 @@ static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
>> cpuid_count(0x00000010, idx, &eax.full, &ebx, &ecx.full, &edx.full);
>> hw_res->num_closid = edx.split.cos_max + 1;
>> r->cache.cbm_len = eax.split.cbm_len + 1;
>> - r->default_ctrl = BIT_MASK(eax.split.cbm_len + 1) - 1;
>> - r->cache.shareable_bits = ebx & r->default_ctrl;
>> + r->cache.shareable_bits = ebx & resctrl_get_default_ctrl(r);
>> if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
>> r->cache.arch_has_sparse_bitmasks = ecx.split.noncont;
>> r->alloc_capable = true;
> Using resctrl_get_default_ctrl() in the architecture code like this seems awkward in
> the way that the caller depends on resctrl_get_default_ctrl() returning a bitmask, thus
> requiring caller to be familiar with internals of function called.
resctrl and the arch code that provides the interface are closely coupled, so I don't
think its a problem to have to know what the call does...
Using the helper here was just because the memory for the default value has gone away,
I agree that as this is being used to manipulating stuff from cpuid, it should probably be
open coded here. I'll drop this hunk, (adding default_ctrl as a local variable here).
>> @@ -329,7 +327,7 @@ static u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
>> return MAX_MBA_BW - bw;
>>
>> pr_warn_once("Non Linear delay-bw map not supported but queried\n");
>> - return r->default_ctrl;
>> + return resctrl_get_default_ctrl(r);
> I wonder if returning MAX_MBA_BW directly would not be more appropriate here ...
> or returning r->membw.max_bw and doing so in previous patch.
My thinking was this value is at least in the correct format for the hardware if an AMD
platform manages to get in here - but its only called from the Intel helpers, so that
can't happen.
Using MAX_MBA_BW makes it clearer what the 'Non Linear' warning is about.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 23a01eaebd58..5d87f279085f 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -113,8 +113,9 @@ static int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
>> */
>> static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
>> {
>> - unsigned long first_bit, zero_bit, val;
>> + u32 supported_bits = BIT_MASK(r->cache.cbm_len) - 1;
> What is criteria for caller to decide between using resctrl_get_default_ctrl() or
> computing the bitmask self? Most callers already seem to be using
> resctrl_get_default_ctrl() with clear expectation that it will return
> a bitmask or not so it is not obvious why some callers needing bitmask
> use resctrl_get_default_ctrl() while this caller compute bitmask self.
This one case is odd because it also needs to know the number of bits in the bitmap. I
felt computing the bitmap directly here made it clearer this was checking against a single
set of properties. (if you diagree - lets change it!)
resctrl_get_default_ctrl() is largely intended to fit callers like reset_all_ctrls() which
don't know or care what the format is, as long as it matches what the hardware expects.
Most other cases are intending to parse or format the value, so they already know what
kind of thing its going to be.
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1e0bae1a9d95..cd8f65c12124 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -978,7 +978,7 @@ static int rdt_default_ctrl_show(struct kernfs_open_file *of,
>> struct resctrl_schema *s = of->kn->parent->priv;
>> struct rdt_resource *r = s->res;
>>
>> - seq_printf(seq, "%x\n", r->default_ctrl);
>> + seq_printf(seq, "%x\n", resctrl_get_default_ctrl(r));
>> return 0;
>> }
> While the function is "rdt_default_ctrl_show()" the file is "cbm_mask"
> and so here also resctrl_get_default_ctrl() is implicitly assumed to
> return only a bitmask.
Because the file is hidden behind RFTYPE_RES_CACHE, which can only be set on the L2 or L3
if they have bitmask controls. If we ever support other kinds of controls, we'd need to do
somthing about this.
I agree the existing name is misleading, but I don't think its worth the extra patch to
rename it. (Curious that it isn't called rdt_cbm_mask_show())
For the distant future, I have patches that propose adding a file to expose the schema
type to resctrl, then a bunch of files prefixed with "bitmap_" or "percent_" that describe
the control properties. What we have today is a mix of control type and resource all in
one - what does the "cbm_mask" mean if the resource is not a cache?
(it goes without saying that the existing files must stay with their same values)
>> @@ -3417,7 +3417,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
>> }
>>
>> cfg = &d->staged_config[CDP_NONE];
>> - cfg->new_ctrl = r->default_ctrl;
>> + cfg->new_ctrl = resctrl_get_default_ctrl(r);
>> cfg->have_new_ctrl = true;
>> }
>> }
> Using resctrl_get_default_ctrl() only seems appropriate when setting or staging
> the register values where the value returned is not further manipulated with
> assumptions regarding its format.
Provided the schema format has already been checked, I agree. This is what led to
cbm_validate() generating the bitmap values itself. (I take your point about the cpuid
interactions above)
In rdtgroup_init_mba() we could reach in to retrieve r->membw.max_bw, but the value isn't
being parsed or formatted, so I don't think it matters. Once we have a helper, we may as
well use it everywhere we can.
Thanks,
James
Powered by blists - more mailing lists