Message-ID: <cf4fa9e8-b810-35c8-d383-4ed345e0ce76@arm.com>
Date:   Thu, 14 Dec 2023 11:37:52 +0000
From:   James Morse <james.morse@....com>
To:     babu.moger@....com, x86@...nel.org, linux-kernel@...r.kernel.org
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        baolin.wang@...ux.alibaba.com, Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com,
        dfustini@...libre.com, amitsinght@...vell.com
Subject: Re: [PATCH v7 14/24] x86/resctrl: Allow resctrl_arch_rmid_read() to
 sleep

Hi Babu,

On 09/11/2023 20:42, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> MPAM's cache occupancy counters can take a little while to settle once
>> the monitor has been configured. The maximum settling time is described
>> to the driver via a firmware table. The value could be large enough
>> that it makes sense to sleep. To avoid exposing this to resctrl, it
>> should be hidden behind MPAM's resctrl_arch_rmid_read().
>>
>> resctrl_arch_rmid_read() may be called via IPI meaning it is unable
>> to sleep. In this case resctrl_arch_rmid_read() should return an error
>> if it needs to sleep. This will only affect MPAM platforms where
>> the cache occupancy counter isn't available immediately, nohz_full is
>> in use, and there are no housekeeping CPUs in the necessary domain.
>>
>> There are three callers of resctrl_arch_rmid_read():
>> __mon_event_count() and __check_limbo() are both called from a
>> non-migratable context. mon_event_read() invokes __mon_event_count()
>> using smp_call_on_cpu(), which adds work to the target CPU's workqueue.
>> rdtgroup_mutex() is held, meaning this cannot race with the resctrl
>> cpuhp callback. __check_limbo() is invoked via schedule_delayed_work_on(),
>> which also adds work to a per-cpu workqueue.
>>
>> The remaining call is add_rmid_to_limbo() which is called in response
>> to a user-space syscall that frees an RMID. This opportunistically
>> reads the LLC occupancy counter on the current domain to see if the
>> RMID is over the dirty threshold. This has to disable preemption to
>> avoid reading the wrong domain's value. Disabling pre-emption here
>> prevents resctrl_arch_rmid_read() from sleeping.

> I don't know what you meant by "This has to disable preemption to
> avoid reading the wrong domain's value."

Pre-emption lets this thread be scheduled out, and potentially scheduled back in on a
different CPU, possibly in a different domain. Any code with the concept of 'this domain'
has to ensure it can't be migrated. Disabling pre-emption is the most common way of
doing that.

Disabling pre-emption also prevents the thread from sleeping, because it can't be
scheduled out.


> Who is disabling the preemption here? Is that specific to ARM?
> Can you please make that clear? Or am I missing something?

add_rmid_to_limbo() is calling get_cpu(), which raises the pre-empt counter.
If it only wanted the CPU number it could have just called smp_processor_id() - but that
wouldn't be safe because the thread can be migrated, meaning the CPU number can change.

All this is to ensure that cpumask_test_cpu() and resctrl_arch_rmid_read() run on the same
CPU.
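
As a rough illustration of the two effects described above (no migration and
no sleeping while the pre-empt counter is raised), here is a toy user-space
model. It is not kernel code: every model_* name is hypothetical and only
mimics the behaviour of get_cpu()/put_cpu() as described in this mail.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state; NOT the real implementation. */
static int model_preempt_count;	/* > 0 means pre-emption is disabled */
static int model_current_cpu;	/* CPU the thread is currently running on */

/* Like get_cpu(): raise the pre-empt counter, then report the CPU. */
static int model_get_cpu(void)
{
	model_preempt_count++;
	return model_current_cpu;
}

/* Like put_cpu(): drop the pre-empt counter again. */
static void model_put_cpu(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* The scheduler may only move the thread while pre-emption is enabled. */
static bool model_try_migrate(int new_cpu)
{
	if (model_preempt_count > 0)
		return false;
	model_current_cpu = new_cpu;
	return true;
}

/* Sleeping means being scheduled out, so it needs a zero counter. */
static bool model_can_sleep(void)
{
	return model_preempt_count == 0;
}
```

Between model_get_cpu() and model_put_cpu() the CPU number stays stable, but
model_can_sleep() is false. That is the trade-off add_rmid_to_limbo() was
making.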


Thanks,

James

>> add_rmid_to_limbo() walks each domain, but only reads the counter
>> on one domain. If the system has more than one domain, the RMID will
>> always be added to the limbo list. If the RMID's usage was not over the
>> threshold, it will be removed from the list when __check_limbo() runs.
>> Make this the default behaviour. Free RMIDs are always added to the
>> limbo list for each domain.
>>
>> The user-visible effect of this is that a clean RMID is not available
>> for re-allocation immediately after 'rmdir()' completes. This behaviour
>> was never portable as it never happened on a machine with multiple
>> domains.
>>
>> Removing this path allows resctrl_arch_rmid_read() to sleep if it's called
>> with interrupts unmasked. Document that this is the expected behaviour, and
>> add a might_sleep() annotation to catch changes that won't work on arm64.
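
The "always add to limbo, let the worker release it" behaviour described in
the commit message can be sketched as a small user-space model. This is an
assumption-laden illustration, not the resctrl code: the model_* names, the
fixed two-domain layout and the "at or below the threshold counts as clean"
rule (taken from the removed 'val <= resctrl_rmid_realloc_threshold' test)
are all simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_DOMAINS 2	/* hypothetical system with two domains */

struct model_rmid {
	int busy;				/* domains still holding the RMID in limbo */
	bool busy_in[MODEL_NUM_DOMAINS];	/* per-domain limbo bit */
	bool on_free_list;			/* available for re-allocation */
};

/* New behaviour: mark the RMID busy in every domain, read no counter here. */
static void model_add_rmid_to_limbo(struct model_rmid *e)
{
	e->busy = 0;
	e->on_free_list = false;
	for (int d = 0; d < MODEL_NUM_DOMAINS; d++) {
		e->busy_in[d] = true;
		e->busy++;
	}
}

/*
 * Limbo worker for one domain: release the RMID from this domain once its
 * occupancy is at or below the threshold. The RMID becomes free only when
 * the last domain has released it.
 */
static void model_check_limbo(struct model_rmid *e, int d,
			      uint64_t occupancy, uint64_t threshold)
{
	assert(d >= 0 && d < MODEL_NUM_DOMAINS);

	if (!e->busy_in[d] || occupancy > threshold)
		return;

	e->busy_in[d] = false;
	if (--e->busy == 0)
		e->on_free_list = true;
}
```

In this model a freed RMID is never on the free list straight after
model_add_rmid_to_limbo(). It only gets there after the worker has run for
every domain with the occupancy at or below the threshold.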


>> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
>> index fa3319021881..409817b0ae2c 100644
>> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
>> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
>> @@ -464,17 +464,7 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>>  	idx = resctrl_arch_rmid_idx_encode(entry->closid, entry->rmid);
>>  
>>  	entry->busy = 0;
>> -	cpu = get_cpu();
>>  	list_for_each_entry(d, &r->domains, list) {
>> -		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
>> -			err = resctrl_arch_rmid_read(r, d, entry->closid,
>> -						     entry->rmid,
>> -						     QOS_L3_OCCUP_EVENT_ID,
>> -						     &val);
>> -			if (err || val <= resctrl_rmid_realloc_threshold)
>> -				continue;
>> -		}
>> -
>>  		/*
>>  		 * For the first limbo RMID in the domain,
>>  		 * setup up the limbo worker.
>> @@ -484,15 +474,10 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
>>  		set_bit(idx, d->rmid_busy_llc);
>>  		entry->busy++;
>>  	}
>> -	put_cpu();
>>  
>> -	if (entry->busy) {
>> -		rmid_limbo_count++;
>> -		if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> -			closid_num_dirty_rmid[entry->closid]++;
>> -	} else {
>> -		list_add_tail(&entry->list, &rmid_free_lru);
>> -	}
>> +	rmid_limbo_count++;
>> +	if (IS_ENABLED(CONFIG_RESCTRL_RMID_DEPENDS_ON_CLOSID))
>> +		closid_num_dirty_rmid[entry->closid]++;
>>  }
>>