[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9ad4ae7a-2151-4d60-8303-87f38d7695d8@arm.com>
Date: Fri, 26 Sep 2025 18:52:27 +0100
From: James Morse <james.morse@....com>
To: Jonathan Cameron <jonathan.cameron@...wei.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-acpi@...r.kernel.org,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...hat.com>, Dave Martin <dave.martin@....com>,
Koba Ko <kobak@...dia.com>, Shanker Donthineni <sdonthineni@...dia.com>,
fenghuay@...dia.com, baisheng.gao@...soc.com, Rob Herring <robh@...nel.org>,
Rohit Mathew <rohit.mathew@....com>, Rafael Wysocki <rafael@...nel.org>,
Len Brown <lenb@...nel.org>, Lorenzo Pieralisi <lpieralisi@...nel.org>,
Hanjun Guo <guohanjun@...wei.com>, Sudeep Holla <sudeep.holla@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>, Ben Horgan <ben.horgan@....com>
Subject: Re: [PATCH v2 08/29] arm_mpam: Add the class and component structures
for firmware described ris
Hi Jonathan,
On 11/09/2025 15:22, Jonathan Cameron wrote:
> On Wed, 10 Sep 2025 20:42:48 +0000
> James Morse <james.morse@....com> wrote:
>
>> An MSC is a container of resources, each identified by their RIS index.
>> Some RIS are described by firmware to provide their position in the system.
>> Others are discovered when the driver probes the hardware.
>>
>> To configure a resource it needs to be found by its class, e.g. 'L2'.
>> There are two kinds of grouping, a class is a set of components, which
>> are visible to user-space as there are likely to be multiple instances
>> of the L2 cache. (e.g. one per cluster or package)
>>
>> Add support for creating and destroying structures to allow a hierarchy
>> of resources to be created.
> Various minor things inline.
>
> Biggest is I think maybe just moving to explicit reference counts
> rather than using the empty list for that might end up easier to
> read. Mostly because everyone knows what a kref_put() is about.
>
> Obviously a bit pointless in practice, but I prefer not to think too
> hard.
I've spent a while playing with this - its the wrong shape for what is going on here.
This builds that tree structure during driver probing. (and adds 'unknown' entries to it
when poking the hardware). But by the time mpam_enable() is called - its basically
read-only. After that point the only 'write' that will happen is the error interrupt which
just free's everything. The deferred free from SRCU makes that safe.
(some of this will be clearer when I add the block comment about the 'phases' the driver
goes through that Dave asked for)
Making it 'reference counted' instead is pointless because callers would be get/put-ing
references - but the structure is basically read-only, and not going to go away while the
SRCU reference is held.
One of the things reference counting breaks is the devm_kzalloc() usage - as an error from
the driver probe function will free all of those regions, regardless of what the reference
count says.
I'll look to rename the existing 'get' functions so folk don't immediatly think of get/put!
('find' doesn't really cut it as it does the allocation if it doesn't 'find' anything)
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index efc4738e3b4d..c7f4981b3545 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -18,7 +18,6 @@
>> #include <linux/printk.h>
>> #include <linux/slab.h>
>> #include <linux/spinlock.h>
>> -#include <linux/srcu.h>
>
> Why does this include no longer make sense?
It gets moved to mpam_internal.h because of the srcu_struct declaration that is needed for
callers in the resctrl code to walk the classes list.
I can add it to the mpam_internal.h header from the beginning.
>> @@ -31,7 +30,7 @@
>> static DEFINE_MUTEX(mpam_list_lock);
>> static LIST_HEAD(mpam_all_msc);
>>
>> -static struct srcu_struct mpam_srcu;
>> +struct srcu_struct mpam_srcu;
>
> ...
>
>> +/* List of all objects that can be free()d after synchronise_srcu() */
>> +static LLIST_HEAD(mpam_garbage);
>> +
>> +#define init_garbage(x) init_llist_node(&(x)->garbage.llist)
>
> Whilst this obviously works, I'd rather pass garbage to init_garbage
> instead of the containing structure (where type varies)
Sure,
>> +static struct mpam_component *
>> +mpam_component_get(struct mpam_class *class, int id)
>> +{
>> + struct mpam_component *comp;
>> +
>> + lockdep_assert_held(&mpam_list_lock);
>> +
>> + list_for_each_entry(comp, &class->components, class_list) {
>> + if (comp->comp_id == id)
>> + return comp;
>> + }
>> +
>> + return mpam_component_alloc(class, id);
>> +}
>
>> +static struct mpam_class *
>> +mpam_class_get(u8 level_idx, enum mpam_class_types type)
>> +{
>> + bool found = false;
>> + struct mpam_class *class;
>> +
>> + lockdep_assert_held(&mpam_list_lock);
>> +
>> + list_for_each_entry(class, &mpam_classes, classes_list) {
>> + if (class->type == type && class->level == level_idx) {
>> + found = true;
>> + break;
>> + }
>> + }
>> +
>> + if (found)
>> + return class;
>
> Maybe this gets more complex later, but if it doesn't, just return class where you set
> found above. Matches with pattern used in mpam_component_get() above.
Yup, this is a relic of more complex locking.
>> +
>> + return mpam_class_alloc(level_idx, type);
>> +}
>
>
>> +static void mpam_ris_destroy(struct mpam_msc_ris *ris)
>> +{
>> + struct mpam_vmsc *vmsc = ris->vmsc;
>> + struct mpam_msc *msc = vmsc->msc;
>> + struct platform_device *pdev = msc->pdev;
>> + struct mpam_component *comp = vmsc->comp;
>> + struct mpam_class *class = comp->class;
>> +
>> + lockdep_assert_held(&mpam_list_lock);
>> +
>> + /*
>> + * It is assumed affinities don't overlap. If they do the class becomes
>> + * unusable immediately.
>> + */
>> + cpumask_andnot(&comp->affinity, &comp->affinity, &ris->affinity);
>> + cpumask_andnot(&class->affinity, &class->affinity, &ris->affinity);
>> + clear_bit(ris->ris_idx, &msc->ris_idxs);
>> + list_del_rcu(&ris->vmsc_list);
>> + list_del_rcu(&ris->msc_list);
>> + add_to_garbage(ris);
>> + ris->garbage.pdev = pdev;
>> +
>> + if (list_empty(&vmsc->ris))
> See below. I think it is worth using an explicit reference count even
> though that will only reach zero when the list is empty.
By the time eveything is probed, its an almost entirely read-only structure. The only
writer will free absolutely everything. Changing this to structure-by-structure reference
counting will cause busy-work for readers, who are already protected by SRCU.
>> + mpam_vmsc_destroy(vmsc);
>> +}
>
>
>> +static int mpam_ris_create_locked(struct mpam_msc *msc, u8 ris_idx,
>> + enum mpam_class_types type, u8 class_id,
>> + int component_id)
>> +{
>> + int err;
>> + struct mpam_vmsc *vmsc;
>> + struct mpam_msc_ris *ris;
>> + struct mpam_class *class;
>> + struct mpam_component *comp;
>> +
>> + lockdep_assert_held(&mpam_list_lock);
>> +
>> + if (ris_idx > MPAM_MSC_MAX_NUM_RIS)
>> + return -EINVAL;
>> +
>> + if (test_and_set_bit(ris_idx, &msc->ris_idxs))
>> + return -EBUSY;
>> +
>> + ris = devm_kzalloc(&msc->pdev->dev, sizeof(*ris), GFP_KERNEL);
>> + if (!ris)
>> + return -ENOMEM;
>> + init_garbage(ris);
>> +
>> + class = mpam_class_get(class_id, type);
>> + if (IS_ERR(class))
>> + return PTR_ERR(class);
>> +
>> + comp = mpam_component_get(class, component_id);
>> + if (IS_ERR(comp)) {
>> + if (list_empty(&class->components))
>> + mpam_class_destroy(class);
>
> Maybe just reference count the classes with a kref and do a put here?
>
>> + return PTR_ERR(comp);
>> + }
>> +
>> + vmsc = mpam_vmsc_get(comp, msc);
>> + if (IS_ERR(vmsc)) {
>> + if (list_empty(&comp->vmsc))
>> + mpam_comp_destroy(comp);
> Similar to classes I wonder if simple reference counting is cleaner.
(I spent a few days on it - its prettier, but more work for things like the resctrl code
to get/put references when SRCU already has the write side covered)
>> + return PTR_ERR(vmsc);
>> + }
>> +
>> + err = mpam_ris_get_affinity(msc, &ris->affinity, type, class, comp);
>> + if (err) {
>> + if (list_empty(&vmsc->ris))
>
> and here as well.
>
>> + mpam_vmsc_destroy(vmsc);
>> + return err;
>> + }
>> +
>> + ris->ris_idx = ris_idx;
>> + INIT_LIST_HEAD_RCU(&ris->vmsc_list);
>> + ris->vmsc = vmsc;
>> +
>> + cpumask_or(&comp->affinity, &comp->affinity, &ris->affinity);
>> + cpumask_or(&class->affinity, &class->affinity, &ris->affinity);
>> + list_add_rcu(&ris->vmsc_list, &vmsc->ris);
>> +
>> + return 0;
>> +}
>
>> /*
>> * An MSC can control traffic from a set of CPUs, but may only be accessible
>> * from a (hopefully wider) set of CPUs. The common reason for this is power
>> @@ -74,10 +469,10 @@ static void mpam_msc_drv_remove(struct platform_device *pdev)
>> return;
>>
>> mutex_lock(&mpam_list_lock);
>> - platform_set_drvdata(pdev, NULL);
>> - list_del_rcu(&msc->all_msc_list);
>> - synchronize_srcu(&mpam_srcu);
>> + mpam_msc_destroy(msc);
>
> I'd suggest introducing mpam_msc_destroy() in the earlier patch. Doesn't make a huge
> difference but 2 out of 3 things removed here would be untouched if you do that.
Sure. This is remnant of a patch from Carl that moved all this around.
Thanks,
James
Powered by blists - more mailing lists