Message-ID: <8c75d1a5-42f8-4adf-a1b1-74aa668b1a30@arm.com>
Date: Mon, 13 Oct 2025 17:29:13 +0100
From: James Morse <james.morse@....com>
To: Ben Horgan <ben.horgan@....com>, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-acpi@...r.kernel.org
Cc: D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com,
David Hildenbrand <david@...hat.com>, Dave Martin <dave.martin@....com>,
Koba Ko <kobak@...dia.com>, Shanker Donthineni <sdonthineni@...dia.com>,
fenghuay@...dia.com, baisheng.gao@...soc.com,
Jonathan Cameron <jonathan.cameron@...wei.com>, Rob Herring
<robh@...nel.org>, Rohit Mathew <rohit.mathew@....com>,
Rafael Wysocki <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
Lorenzo Pieralisi <lpieralisi@...nel.org>, Hanjun Guo
<guohanjun@...wei.com>, Sudeep Holla <sudeep.holla@....com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Danilo Krummrich <dakr@...nel.org>
Subject: Re: [PATCH v2 24/29] arm_mpam: Track bandwidth counter state for
overflow and power management
Hi Ben,
On 12/09/2025 16:55, Ben Horgan wrote:
> On 9/10/25 21:43, James Morse wrote:
>> Bandwidth counters need to run continuously to correctly reflect the
>> bandwidth.
>>
>> The value read may be lower than the previous value read in the case
>> of overflow and when the hardware is reset due to CPU hotplug.
>>
>> Add struct mbwu_state to track the bandwidth counter to allow overflow
>> and power management to be handled.
>> diff --git a/drivers/resctrl/mpam_devices.c b/drivers/resctrl/mpam_devices.c
>> index 1543c33c5d6a..eeb62ed94520 100644
>> --- a/drivers/resctrl/mpam_devices.c
>> +++ b/drivers/resctrl/mpam_devices.c
>> @@ -990,20 +992,32 @@ static void write_msmon_ctl_flt_vals(struct mon_read *m, u32 ctl_val,
>> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val);
>> mpam_write_monsel_reg(msc, MBWU, 0);
>> mpam_write_monsel_reg(msc, CFG_MBWU_CTL, ctl_val | MSMON_CFG_x_CTL_EN);
>> +
>> + mbwu_state = &m->ris->mbwu_state[m->ctx->mon];
>> + if (mbwu_state)
>> + mbwu_state->prev_val = 0;
> What's the if condition doing here?
Yes, that looks like cruft....
It took the address of an array element - how could it be null?!
> The below could make more sense but I don't think you can get here if
> the allocation fails.
Heh ... only because __allocate_component_cfg() has lost the error value.
Without the outer/inner locking stuff, it's feasible for __allocate_component_cfg() to
return the error value directly.
With that fixed, and ignoring a bogus ctx->mon value - I agree you can't get a case where
this needs checking.
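
For illustration, here is a rough userspace sketch of what "return the error value directly" could look like for an allocation done inside nested loops. It is not the driver code: every name ending in _sketch, the fixed array sizes and the max_mon parameter (which only exists to simulate an allocation failure) are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_VMSC_SKETCH	2	/* hypothetical: vMSCs per component */
#define NR_RIS_SKETCH	2	/* hypothetical: RIS per vMSC */

struct comp_sketch {
	unsigned int num_mon[NR_VMSC_SKETCH][NR_RIS_SKETCH];
	unsigned long *state[NR_VMSC_SKETCH][NR_RIS_SKETCH];
};

/* Free everything allocated so far; safe to call on a partial setup. */
static void destroy_cfg_sketch(struct comp_sketch *comp)
{
	assert(comp);
	for (int v = 0; v < NR_VMSC_SKETCH; v++) {
		for (int r = 0; r < NR_RIS_SKETCH; r++) {
			free(comp->state[v][r]);
			comp->state[v][r] = NULL;
		}
	}
}

/*
 * Allocate per-monitor state. On failure, undo the partial work and
 * return the error from inside the loops, so a 'break' out of the inner
 * loop can never drop it. Requests above max_mon are treated as failed
 * allocations (made up, only to exercise the error path).
 */
static int allocate_cfg_sketch(struct comp_sketch *comp, unsigned int max_mon)
{
	for (int v = 0; v < NR_VMSC_SKETCH; v++) {
		for (int r = 0; r < NR_RIS_SKETCH; r++) {
			unsigned int n = comp->num_mon[v][r];
			unsigned long *s = NULL;

			if (!n)
				continue;

			if (n <= max_mon)
				s = calloc(n, sizeof(*s));
			if (!s) {
				destroy_cfg_sketch(comp);
				return -ENOMEM;
			}
			comp->state[v][r] = s;
		}
	}

	return 0;
}
```

With the error returned from the point of failure, the caller either sees 0 with every requested array allocated, or -ENOMEM with nothing left allocated.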
I think this was originally testing if the array had been allocated, and it's been folded
wrongly at some point in the past. I assume I kept those bogus tests around as I saw it
blow up with nonsense num_mbwu_mon - which is something I'll retest.
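
A minimal userspace sketch of the check that was probably intended: test the array pointer (and bound the index) rather than the address of an element. The struct and function names below are hypothetical stand-ins, not the driver's definitions, and the choice of error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's structures; not the real layout. */
struct mbwu_state_sketch {
	uint64_t prev_val;
};

struct ris_sketch {
	unsigned int num_mbwu_mon;
	struct mbwu_state_sketch *mbwu_state;	/* NULL until allocated */
};

/*
 * Reset the saved counter value for one monitor. The checks are on the
 * array pointer and on the index: comparing &array[mon] against NULL
 * says nothing useful about whether the array exists.
 */
static int reset_prev_val_sketch(struct ris_sketch *ris, unsigned int mon)
{
	assert(ris);

	if (!ris->mbwu_state)
		return -ENOMEM;		/* array was never allocated */
	if (mon >= ris->num_mbwu_mon)
		return -EINVAL;		/* bogus monitor index */

	ris->mbwu_state[mon].prev_val = 0;
	return 0;
}
```

If the allocation path can no longer fail silently, both checks become redundant and the plain assignment is enough.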
>> +
>> break;
>> default:
>> return;
>> }
>> }
>> @@ -2106,6 +2227,35 @@ static int __allocate_component_cfg(struct mpam_component *comp)
>> return -ENOMEM;
>> init_garbage(comp->cfg);
>>
>> + list_for_each_entry(vmsc, &comp->vmsc, comp_list) {
>> + if (!vmsc->props.num_mbwu_mon)
>> + continue;
>> +
>> + msc = vmsc->msc;
>> + list_for_each_entry(ris, &vmsc->ris, vmsc_list) {
>> + if (!ris->props.num_mbwu_mon)
>> + continue;
>> +
>> + mbwu_state = kcalloc(ris->props.num_mbwu_mon,
>> + sizeof(*ris->mbwu_state),
>> + GFP_KERNEL);
>> + if (!mbwu_state) {
>> + __destroy_component_cfg(comp);
>> + err = -ENOMEM;
>> + break;
>> + }
>> +
>> + if (mpam_mon_sel_lock(msc)) {
>> + init_garbage(mbwu_state);
>> + ris->mbwu_state = mbwu_state;
>> + mpam_mon_sel_unlock(msc);
>> + }
>
> The if statement is confusing now that mpam_mon_sel_lock()
> unconditionally returns true.
Sure, but this and the __must_check means all the paths that use this must be able to
return an error.
This is a churn-or-not trade-off for the inclusion of the firmware-backed support.
I'd prefer it to be hard to add code-paths that are going to create a lot of work when
that comes - especially as folk are promising platforms that need this in the coming months.
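
A small userspace sketch of the pattern being defended here: a lock helper that returns bool and is annotated so the compiler warns when the result is ignored, which forces every caller to have an error path. Everything in it is an assumption for illustration: the _sketch names, the lock_can_fail flag standing in for whatever would make the lock fail, and the -EIO return value. GCC and Clang's warn_unused_result attribute plays the role of the kernel's __must_check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's __must_check annotation. */
#define must_check_sketch __attribute__((warn_unused_result))

/* Hypothetical MSC; a flag stands in for the real lock. */
struct msc_sketch {
	bool lock_can_fail;	/* assumed: some future MSC type cannot take the lock */
	bool locked;
	unsigned int value;
};

/* Returns true with the lock held, or false without taking it. */
static must_check_sketch bool mon_sel_lock_sketch(struct msc_sketch *msc)
{
	if (msc->lock_can_fail)
		return false;

	msc->locked = true;
	return true;
}

static void mon_sel_unlock_sketch(struct msc_sketch *msc)
{
	assert(msc->locked);	/* unlocking without holding the lock is a bug */
	msc->locked = false;
}

/* Callers are pushed into this shape: check the lock, else return an error. */
static int write_value_sketch(struct msc_sketch *msc, unsigned int val)
{
	if (!mon_sel_lock_sketch(msc))
		return -EIO;

	msc->value = val;
	mon_sel_unlock_sketch(msc);
	return 0;
}
```

While the lock always succeeds the if looks redundant, but a caller written this way needs no restructuring once a failing case exists.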
Thanks,
James