[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c9b0656-fb09-1adf-664b-e2cbcc04b1db@arm.com>
Date: Thu, 14 Dec 2023 11:37:27 +0000
From: James Morse <james.morse@....com>
To: babu.moger@....com, x86@...nel.org, linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com
Subject: Re: [PATCH v7 13/24] x86/resctrl: Queue mon_event_read() instead of
sending an IPI
Hi Babu,
On 09/11/2023 20:40, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> Intel is blessed with an abundance of monitors, one per RMID, that can be
>> read from any CPU in the domain. MPAMs monitors reside in the MMIO MSC,
>> the number implemented is up to the manufacturer. This means when there are
>> fewer monitors than needed, they need to be allocated and freed.
>>
>> MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
>> CSU counter is allowed to return 'not ready' for a small number of
>> micro-seconds after programming. To allow one CSU hardware monitor to be
>> used for multiple control or monitor groups, the CPU accessing the
>> monitor needs to be able to block when configuring and reading the
>> counter.
>>
>> Worse, the domain may be broken up into slices, and the MMIO accesses
>> for each slice may need performing from different CPUs.
>>
>> These two details mean MPAMs monitor code needs to be able to sleep, and
>> IPI another CPU in the domain to read from a resource that has been sliced.
>>
>> mon_event_read() already invokes mon_event_count() via IPI, which means
>> this isn't possible. On systems using nohz-full, some CPUs need to be
>> interrupted to run kernel work as they otherwise stay in user-space
>> running realtime workloads. Interrupting these CPUs should be avoided,
>> and scheduling work on them may never complete.
>>
>> Change mon_event_read() to pick a housekeeping CPU, (one that is not using
>> nohz_full) and schedule mon_event_count() and wait. If all the CPUs
>> in a domain are using nohz-full, then an IPI is used as the fallback.
>>
>> This function is only used in response to a user-space filesystem request
>> (not the timing sensitive overflow code).
>>
>> This allows MPAM to hide the slice behaviour from resctrl, and to keep
>> the monitor-allocation in monitor.c. When the IPI fallback is used on
>> machines where MPAM needs to make an access on multiple CPUs, the counter
>> read will always fail.
>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index beccb0e87ba7..d07f99245851 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -522,12 +524,21 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
>> return ret;
>> }
>>
>> +static int smp_mon_event_count(void *arg)
>> +{
>> + mon_event_count(arg);
>> +
>> + return 0;
>> +}
>
> Shouldn't this function defined as "void" similar to mon_event_count?
> Return code is not used anywhere.
smp_call_on_cpu() requires it to return an int, even if the value is not used.
This wrapper only exists because smp_call_on_cpu() takes a different prototype to
smp_call_function_any().
Thanks,
James
>> @@ -536,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>> rr->val = 0;
>> rr->first = first;
>>
>> - smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> + cpu = cpumask_any_housekeeping(&d->cpu_mask);
>> +
>> + /*
>> + * cpumask_any_housekeeping() prefers housekeeping CPUs, but
>> + * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
>> + * MPAM's resctrl_arch_rmid_read() is unable to read the
>> + * counters on some platforms if its called in irq context.
>> + */
>> + if (tick_nohz_full_cpu(cpu))
>> + smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
>> + else
>> + smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
>> }
>>
Powered by blists - more mailing lists