Message-ID: <cf4fa9e8-b810-35c8-d383-4ed345e0ce76@arm.com>
Date: Thu, 14 Dec 2023 11:37:52 +0000
From: James Morse <james.morse@....com>
To: babu.moger@....com, x86@...nel.org, linux-kernel@...r.kernel.org
Cc: Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
H Peter Anvin <hpa@...or.com>,
shameerali.kolothum.thodi@...wei.com,
D Scott Phillips OS <scott@...amperecomputing.com>,
carl@...amperecomputing.com, lcherian@...vell.com,
bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
dfustini@...libre.com, amitsinght@...vell.com
Subject: Re: [PATCH v7 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to
sleep
Hi Babu,
On 09/11/2023 20:42, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> MPAM's cache occupancy counters can take a little while to settle once
>> the monitor has been configured. The maximum settling time is described
>> to the driver via a firmware table. The value could be large enough
>> that it makes sense to sleep. To avoid exposing this to resctrl, it
>> should be hidden behind MPAM's resctrl_arch_rmid_read().
>>
>> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
>> to sleep. In this case resctrl_arch_rmid_read() should return an error
>> if it needs to sleep. This will only affect MPAM platforms where
>> the cache occupancy counter isn't available immediately, nohz_full is
>> in use, and there are no housekeeping CPUs in the necessary domain.
>>
>> There are three callers of resctrl_arch_rmid_read():
>> __mon_event_count() and __check_limbo() are both called from a
>> non-migratable context. mon_event_read() invokes __mon_event_count()
>> using smp_call_on_cpu(), which adds work to the target CPU's workqueue.
>> rdtgroup_mutex is held, meaning this cannot race with the resctrl
>> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(),
>> which also adds work to a per-cpu workqueue.
>>
>> The remaining call is add_rmid_to_limbo() which is called in response
>> to a user-space syscall that frees an RMID. This opportunistically
>> reads the LLC occupancy counter on the current domain to see if the
>> RMID is over the dirty threshold. This has to disable preemption to
>> avoid reading the wrong domain's value. Disabling pre-emption here
>> prevents resctrl_arch_rmid_read() from sleeping.
> I don't know what you meant by "This has to disable preemption to
> avoid reading the wrong domain's value."
Pre-emption lets this thread be scheduled out, and potentially scheduled back in on a
different CPU, possibly in a different domain. Any code with the concept of 'this domain'
has to ensure it can't be migrated. Disabling pre-emption is the most common way of
doing that.
Disabling pre-emption also prevents the thread from sleeping, because it can't be
scheduled out.
> Who is disabling the preemption here? Is that specific to ARM?
> Can you please make that clear? Or am I missing something?
add_rmid_to_limbo() is calling get_cpu(), which raises the pre-empt counter.
If it only wanted the CPU number it could have just called smp_processor_id(), but that
wouldn't be safe because the thread can be migrated, meaning the CPU number can change.
All this is to ensure that cpumask_test_cpu() and resctrl_arch_rmid_read() run on the same
CPU.
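To make the two effects of get_cpu() concrete, here is a minimal user-space sketch. It is
only a model: every model_* name below is hypothetical and none of this is the kernel's
implementation. It assumes a single counter standing in for the pre-empt count, and shows
that while the counter is raised the thread can neither be migrated nor allowed to sleep.

```c
#include <stdbool.h>

/* User-space model only: these are NOT the kernel's implementations. */
static int model_preempt_count;  /* stands in for the pre-empt counter */
static int model_current_cpu;    /* CPU the modelled thread is running on */

/* Models get_cpu(): raise the pre-empt counter, then return the CPU number. */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return model_current_cpu;
}

/* Models put_cpu(): drop the pre-empt counter again. */
static void model_put_cpu(void)
{
	model_preempt_count--;
}

/* Models the scheduler moving the thread: refused while the counter is raised. */
static bool model_try_migrate(int new_cpu)
{
	if (model_preempt_count > 0)
		return false;
	model_current_cpu = new_cpu;
	return true;
}

/* Models the check behind might_sleep(): sleeping is only legal when pre-emptible. */
static bool model_may_sleep(void)
{
	return model_preempt_count == 0;
}
```

Between model_get_cpu() and model_put_cpu() the CPU number is stable, which is what the
cpumask_test_cpu() check relies on, but model_may_sleep() is false for the same reason.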
Thanks,
James
>> add_rmid_to_limbo() walks each domain, but only reads the counter
>> on one domain. If the system has more than one domain, the RMID will
>> always be added to the limbo list. If the RMID's usage was not over the
>> threshold, it will be removed from the list when __check_limbo() runs.
>> Make this the default behaviour. Free RMIDs are always added to the
>> limbo list for each domain.
>>
>> The user visible effect of this is that a clean RMID is not available
>> for re-allocation immediately after 'rmdir()' completes. This behaviour
>> was never portable as it never happened on a machine with multiple
>> domains.
>>
>> Removing this path allows resctrl_arch_rmid_read() to sleep if it's called
>> with interrupts unmasked. Document that this is the expected behaviour, and
>> add a might_sleep() annotation to catch changes that won't work on arm64.
>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index fa3319021881..409817b0ae2c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>> idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
>>
>> entry->busy = 0;
>> - cpu = get_cpu();
>> list_for_each_entry(d, &r->domains, list) {
>> - if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
>> - err = resctrl_arch_rmid_read(r, d, entry->closid,
>> - entry->rmid,
>> - QOS_L3_OCCUP_EVENT_ID,
>> - &val);
>> - if (err || val <= resctrl_rmid_realloc_threshold)
>> - continue;
>> - }
>> -
>> /*
>> * For the first limbo RMID in the domain,
>> * setup up the limbo worker.
>> @@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>> set_bit(idx, d->rmid_busy_llc);
>> entry->busy++;
>> }
>> - put_cpu();
>>
>> - if (entry->busy) {
>> - rmid_limbo_count++;
>> - if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> - closid_num_dirty_rmid[entry->closid]++;
>> - } else {
>> - list_add_tail(&entry->list, &rmid_free_lru);
>> - }
>> + rmid_limbo_count++;
>> + if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> + closid_num_dirty_rmid[entry->closid]++;
>> }
>>