Message-ID: <dc5355c6-e7a6-466e-9183-9753281961a8@arm.com>
Date: Tue, 9 Dec 2025 16:36:13 +0000
From: Ben Horgan <ben.horgan@....com>
To: Fenghua Yu <fenghuay@...dia.com>, James Morse <james.morse@....com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Cc: D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...nel.org>, Dave Martin <dave.martin@....com>,
Koba Ko <kobak@...dia.com>, Shanker Donthineni <sdonthineni@...dia.com>,
baisheng.gao@...soc.com, Jonathan Cameron <jonathan.cameron@...wei.com>,
Gavin Shan <gshan@...hat.com>, rohit.mathew@....com,
reinette.chatre@...el.com, Punit Agrawal <punit.agrawal@....qualcomm.com>
Subject: Re: [RFC PATCH 31/38] arm_mpam: resctrl: Update the rmid reallocation limit
Hi Fenghua,
On 12/6/25 00:06, Fenghua Yu wrote:
> Hi, James,
>
> On 12/5/25 13:58, James Morse wrote:
>> resctrl's limbo code needs to be told when the data left in a cache is
>> small enough for the partid+pmg value to be re-allocated.
>>
>> x86 uses the cache size divided by the number of rmid users the cache
>> may have. Do the same, but for the smallest cache, and with the
>> number of partid-and-pmg users.
>>
>> Querying the cache size can't happen until after cacheinfo_sysfs_init()
>> has run, so mpam_resctrl_setup() must wait until then.
>>
>> Signed-off-by: James Morse <james.morse@....com>
>> ---
>> drivers/resctrl/mpam_resctrl.c | 54 ++++++++++++++++++++++++++++++++++
>> 1 file changed, 54 insertions(+)
>>
>> diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
>> index 506063bd3348..ccdf8db742c9 100644
>> --- a/drivers/resctrl/mpam_resctrl.c
>> +++ b/drivers/resctrl/mpam_resctrl.c
>> @@ -16,6 +16,7 @@
>> #include <linux/resctrl.h>
>> #include <linux/slab.h>
>> #include <linux/types.h>
>> +#include <linux/wait.h>
>> #include <asm/mpam.h>
>> @@ -58,6 +59,13 @@ static bool cdp_enabled;
>> */
>> static bool resctrl_enabled;
>> +/*
>> + * mpam_resctrl_pick_caches() needs to know the size of the caches. cacheinfo
>> + * populates this from a device_initcall(). mpam_resctrl_setup() must wait.
>> + */
>> +static bool cacheinfo_ready;
>> +static DECLARE_WAIT_QUEUE_HEAD(wait_cacheinfo_ready);
>> +
>> /*
>> * L3 local/total may come from different classes - what is the number of MBWU
>> * 'on L3'?
>> @@ -584,6 +592,38 @@ void resctrl_arch_reset_cntr(struct rdt_resource *r, struct rdt_mon_domain *d,
>> reset_mon_cdp_safe(mon, mon_comp, USE_PRE_ALLOCATED, closid, rmid);
>> }
>> +/*
>> + * The rmid realloc threshold should be for the smallest cache exposed to
>> + * resctrl.
>> + */
>> +static int update_rmid_limits(struct mpam_class *class)
>> +{
>> + u32 num_unique_pmg = resctrl_arch_system_num_rmid_idx();
>> + struct mpam_props *cprops = &class->props;
>> + struct cacheinfo *ci;
>> +
>> + lockdep_assert_cpus_held();
>> +
>> + /* Assume cache levels are the same size for all CPUs... */
>> + ci = get_cpu_cacheinfo_level(smp_processor_id(), class->level);
>> + if (!ci || ci->size == 0) {
>> + pr_debug("Could not read cache size for class %u\n",
>> + class->level);
>> + return -EINVAL;
>> + }
>> +
>> + if (!mpam_has_feature(mpam_feat_msmon_csu, cprops))
>> + return 0;
>> +
>> + if (!resctrl_rmid_realloc_limit ||
>> + ci->size < resctrl_rmid_realloc_limit) {
>> + resctrl_rmid_realloc_limit = ci->size;
>> + resctrl_rmid_realloc_threshold = ci->size / num_unique_pmg;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static bool cache_has_usable_cpor(struct mpam_class *class)
>> {
>> struct mpam_props *cprops = &class->props;
>> @@ -1025,6 +1065,9 @@ static void mpam_resctrl_pick_counters(void)
>> /* CSU counters only make sense on a cache. */
>> switch (class->type) {
>> case MPAM_CLASS_CACHE:
>> + if (update_rmid_limits(class))
>> + continue;
>> +
>> counter_update_class(QOS_L3_OCCUP_EVENT_ID, class);
>> return;
>> default:
>> @@ -1731,6 +1774,8 @@ int mpam_resctrl_setup(void)
>> struct mpam_resctrl_res *res;
>> struct mpam_resctrl_mon *mon;
>> + wait_event(wait_cacheinfo_ready, cacheinfo_ready);
>
> This may cause a system hang if a hw/fw issue makes cacheinfo fail.
> Instead of hanging, would it be better to use a timed wait here? Like
> err = wait_event_timeout(wait_cacheinfo_ready, cacheinfo_ready, 5 * HZ);
> and report a failure when cacheinfo is not ready.
This is just waiting on everything in device_initcall(). I think we've
got bigger problems if that doesn't finish.
>
> Thanks.
>
> -Fenghua
Thanks,
Ben