[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dd0342ae-7474-44c4-ac80-d12b69f093c7@intel.com>
Date: Thu, 22 Aug 2024 09:33:46 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Peter Newman <peternewman@...gle.com>, Fenghua Yu <fenghua.yu@...el.com>
CC: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
<x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>, Shaopeng Tan
<tan.shaopeng@...itsu.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Tony Luck <tony.luck@...el.com>, Maciej Wieczor-Retman
<maciej.wieczor-retman@...el.com>, <linux-kernel@...r.kernel.org>,
<eranian@...gle.com>, <irogers@...gle.com>, <namhyung@...gle.com>
Subject: Re: [PATCH] x86/resctrl: Fix arch_mbm_* array overrun on SNC
Hi Peter,
On 8/1/24 11:16 AM, Reinette Chatre wrote:
> On 7/22/24 1:46 PM, Peter Newman wrote:
>> When using resctrl on systems with Sub-NUMA Clustering enabled,
>> monitoring groups may be allocated RMID values which would overrun the
>> arch_mbm_{local,total} arrays.
>>
>> This is due to inconsistencies in whether the SNC-adjusted num_rmid
>> value or the unadjusted value in resctrl_arch_system_num_rmid_idx() is
>> used. The num_rmid value for the L3 resource is currently:
>>
>> resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache
>>
>> As a simple fix, make resctrl_arch_system_num_rmid_idx() return the
>> SNC-adjusted, L3 num_rmid value on x86.
>>
>
> Thank you very much for finding, root-causing, and providing a fix for
> the issue.
>
>> Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
>> Signed-off-by: Peter Newman <peternewman@...gle.com>
>> ---
>> arch/x86/include/asm/resctrl.h | 6 ------
>> arch/x86/kernel/cpu/resctrl/core.c | 8 ++++++++
>> include/linux/resctrl.h | 3 +++
>> 3 files changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
>> index 12dbd2588ca7..8b1b6ce1e51b 100644
>> --- a/arch/x86/include/asm/resctrl.h
>> +++ b/arch/x86/include/asm/resctrl.h
>> @@ -156,12 +156,6 @@ static inline void resctrl_sched_in(struct task_struct *tsk)
>> __resctrl_sched_in(tsk);
>> }
>> -static inline u32 resctrl_arch_system_num_rmid_idx(void)
>> -{
>> - /* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
>> - return boot_cpu_data.x86_cache_max_rmid + 1;
>> -}
>> -
>> static inline void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
>> {
>> *rmid = idx;
>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 1930fce9dfe9..8591d53c144b 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -119,6 +119,14 @@ struct rdt_hw_resource rdt_resources_all[] = {
>> },
>> };
>> +u32 resctrl_arch_system_num_rmid_idx(void)
>> +{
>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>> +
>> + /* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
>> + return r->num_rmid;
>> +}
>> +
>> /*
>> * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
>> * as they do not have CPUID enumeration support for Cache allocation.
>> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index b0875b99e811..43ac241471b3 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -248,6 +248,9 @@ struct resctrl_schema {
>> /* The number of closid supported by this resource regardless of CDP */
>> u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>> +
>> +u32 resctrl_arch_system_num_rmid_idx(void);
>> +
>
> nit: the additional empty lines are unnecessary.
>
>> int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
>> /*
>
> | Reviewed-by: Reinette Chatre <reinette.chatre@...el.com>
>
It would be great if this fix can be included in the kernel release.
Could you please send V2 with nit and tag applied to be ready for
inclusion?
Thank you
Reinette
Powered by blists - more mailing lists