[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aLsSZWUlbrEzbx6O@e133380.arm.com>
Date: Fri, 5 Sep 2025 17:40:05 +0100
From: Dave Martin <Dave.Martin@....com>
To: James Morse <james.morse@....com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-acpi@...r.kernel.org, devicetree@...r.kernel.org,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...hat.com>,
Rex Nie <rex.nie@...uarmicro.com>, Koba Ko <kobak@...dia.com>,
Shanker Donthineni <sdonthineni@...dia.com>, fenghuay@...dia.com,
baisheng.gao@...soc.com,
Jonathan Cameron <jonathan.cameron@...wei.com>,
Rob Herring <robh@...nel.org>, Rohit Mathew <rohit.mathew@....com>,
Rafael Wysocki <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
Hanjun Guo <guohanjun@...wei.com>,
Sudeep Holla <sudeep.holla@....com>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>,
Lecopzer Chen <lecopzerc@...dia.com>
Subject: Re: [PATCH 14/33] arm_mpam: Add cpuhp callbacks to probe MSC hardware
Hi James,
On Fri, Aug 22, 2025 at 03:29:55PM +0000, James Morse wrote:
> Because an MSC can only by accessed from the CPUs in its cpu-affinity
> set we need to be running on one of those CPUs to probe the MSC
> hardware.
>
> Do this work in the cpuhp callback. Probing the hardware will only
> happen before MPAM is enabled, walk all the MSCs and probe those we can
> reach that haven't already been probed.
It may be worth mentioning that the low-level MSC register accessors
are added by this patch.
> Later once MPAM is enabled, this cpuhp callback will be replaced by
> one that avoids the global list.
I misread this is as meaning "later in the patch series" and got
confused.
Perhaps, something like the following? (though this got a bit verbose)
--8<--
Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced. The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.
-->8--
> Enabling a static key will also take the cpuhp lock, so can't be done
What static key?
None in this patch that I can see.
> from the cpuhp callback. Whenever a new MSC has been probed schedule
> work to test if all the MSCs have now been probed.
>
> CC: Lecopzer Chen <lecopzerc@...dia.com>
> Signed-off-by: James Morse <james.morse@....com>
> ---
> drivers/resctrl/mpam_devices.c | 144 +++++++++++++++++++++++++++++++-
> drivers/resctrl/mpam_internal.h | 8 +-
> 2 files changed, 147 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
[...]
> @@ -511,9 +539,84 @@ int mpam_ris_create(struct mpam_msc *msc, u8 ris_idx,
[...]
> +static int mpam_cpu_online(unsigned int cpu)
> {
> - pr_err("Discovered all MSC\n");
I guess this disappears later?
If we print anything, it feels like it should be in the
mpam_enable_once() path, otherwise it looks like dmesg is going to get
spammed on every hotplug. I might have missed something, here.
> + return 0;
> +}
> +
> +/* Before mpam is enabled, try to probe new MSC */
> +static int mpam_discovery_cpu_online(unsigned int cpu)
> +{
> + int err = 0;
> + struct mpam_msc *msc;
> + bool new_device_probed = false;
> +
> + mutex_lock(&mpam_list_lock);
I take it nothing breaks if we sleep here?
Pending cpuhp callbacks for this CPU look to be blocked while we sleep,
at the very least.
Since this only happens during the probing phase, maybe that's not such
a big deal.
Is it likely that some late CPUs might be left offline indefinitely?
If so, we might end up doing futile work here forever.
> + list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> + if (!cpumask_test_cpu(cpu, &msc->accessibility))
> + continue;
> +
> + mutex_lock(&msc->probe_lock);
> + if (!msc->probed)
> + err = mpam_msc_hw_probe(msc);
> + mutex_unlock(&msc->probe_lock);
> +
> + if (!err)
> + new_device_probed = true;
> + else
> + break; // mpam_broken
What's the effect of returning a non-zero value to the CPU hotplug
callback dispatcher here?
Do we want to tear anything down if MPAM is unusable?
> + }
> + mutex_unlock(&mpam_list_lock);
> +
> + if (new_device_probed && !err)
> + schedule_work(&mpam_enable_work);
> +
> + return err;
> +}
> +
> +static int mpam_cpu_offline(unsigned int cpu)
> +{
> + return 0;
> +}
> +
> +static void mpam_register_cpuhp_callbacks(int (*online)(unsigned int online),
> + int (*offline)(unsigned int offline))
> +{
> + mutex_lock(&mpam_cpuhp_state_lock);
> + if (mpam_cpuhp_state) {
> + cpuhp_remove_state(mpam_cpuhp_state);
> + mpam_cpuhp_state = 0;
> + }
> +
> + mpam_cpuhp_state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "mpam:online",
> + online, offline);
> + if (mpam_cpuhp_state <= 0) {
> + pr_err("Failed to register cpuhp callbacks");
Should an error code be returned to the caller if this fails?
> + mpam_cpuhp_state = 0;
> + }
> + mutex_unlock(&mpam_cpuhp_state_lock);
> }
>
> static int mpam_dt_count_msc(void)
> @@ -772,7 +875,7 @@ static int mpam_msc_drv_probe(struct platform_device *pdev)
> }
>
> if (!err && fw_num_msc == mpam_num_msc)
> - mpam_discovery_complete();
> + mpam_register_cpuhp_callbacks(&mpam_discovery_cpu_online, NULL);
Abandon probing the MSC if this fails?
(However, the next phase of probing hangs off CPU hotplug, so it just
won't happen if the callbacks can't be registered -- but it looks like
MPAM may be left in a half-probed state. I'm not entirely convinced
that this matters if the MPAM driver is not unloadable anyway...)
Nit: redundant &
(You don't have it in the similar call in mpam_enable_once().)
>
> if (err && msc)
> mpam_msc_drv_remove(pdev);
> @@ -795,6 +898,41 @@ static struct platform_driver mpam_msc_driver = {
> .remove = mpam_msc_drv_remove,
> };
>
> +static void mpam_enable_once(void)
> +{
> + mpam_register_cpuhp_callbacks(mpam_cpu_online, mpam_cpu_offline);
Should it be fatal if this fails?
> +
> + pr_info("MPAM enabled\n");
> +}
> +
> +/*
> + * Enable mpam once all devices have been probed.
> + * Scheduled by mpam_discovery_cpu_online() once all devices have been created.
> + * Also scheduled when new devices are probed when new CPUs come online.
> + */
> +void mpam_enable(struct work_struct *work)
> +{
> + static atomic_t once;
Nit: possibly unnecessary atomic_t? This is slow-path code, and we
already have to take mpam_list_lock. Harmless, though.
> + struct mpam_msc *msc;
> + bool all_devices_probed = true;
> +
> + /* Have we probed all the hw devices? */
> + mutex_lock(&mpam_list_lock);
> + list_for_each_entry(msc, &mpam_all_msc, glbl_list) {
> + mutex_lock(&msc->probe_lock);
> + if (!msc->probed)
> + all_devices_probed = false;
> + mutex_unlock(&msc->probe_lock);
> +
> + if (!all_devices_probed)
> + break;
WARN()?
We counted the MSCs in via the mpam_discovery_cpu_online(), so I think
we shouldn't get in here if some failed to probe?
> + }
> + mutex_unlock(&mpam_list_lock);
> +
> + if (all_devices_probed && !atomic_fetch_inc(&once))
> + mpam_enable_once();
> +}
[...]
Cheers
---Dave
Powered by blists - more mailing lists