linux-kernel - Re: [Patch v4 1/3] lib: Restrict cpumask_local

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87b81c08-878f-8394-be94-f5b99ee0be5b@arm.com>
Date:   Wed, 27 Jan 2021 13:49:48 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Marcelo Tosatti <mtosatti@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>
Cc:     Nitesh Narayan Lal <nitesh@...hat.com>,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        frederic@...nel.org, juri.lelli@...hat.com, abelits@...vell.com,
        bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        rostedt@...dmis.org, mingo@...nel.org, peterz@...radead.org,
        davem@...emloft.net, akpm@...ux-foundation.org,
        sfr@...b.auug.org.au, stephen@...workplumber.org,
        rppt@...ux.vnet.ibm.com, jinyuqi@...wei.com,
        zhangshaokun@...ilicon.com
Subject: Re: [Patch v4 1/3] lib: Restrict cpumask_local_spread to houskeeping
 CPUs

On 2021-01-27 13:09, Marcelo Tosatti wrote:
> On Wed, Jan 27, 2021 at 12:36:30PM +0000, Robin Murphy wrote:
>> On 2021-01-27 12:19, Marcelo Tosatti wrote:
>>> On Wed, Jan 27, 2021 at 11:57:16AM +0000, Robin Murphy wrote:
>>>> Hi,
>>>>
>>>> On 2020-06-25 23:34, Nitesh Narayan Lal wrote:
>>>>> From: Alex Belits <abelits@...vell.com>
>>>>>
>>>>> The current implementation of cpumask_local_spread() does not respect the
>>>>> isolated CPUs, i.e., even if a CPU has been isolated for Real-Time task,
>>>>> it will return it to the caller for pinning of its IRQ threads. Having
>>>>> these unwanted IRQ threads on an isolated CPU adds up to a latency
>>>>> overhead.
>>>>>
>>>>> Restrict the CPUs that are returned for spreading IRQs only to the
>>>>> available housekeeping CPUs.
>>>>>
>>>>> Signed-off-by: Alex Belits <abelits@...vell.com>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@...hat.com>
>>>>> ---
>>>>>     lib/cpumask.c | 16 +++++++++++-----
>>>>>     1 file changed, 11 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/lib/cpumask.c b/lib/cpumask.c
>>>>> index fb22fb266f93..85da6ab4fbb5 100644
>>>>> --- a/lib/cpumask.c
>>>>> +++ b/lib/cpumask.c
>>>>> @@ -6,6 +6,7 @@
>>>>>     #include <linux/export.h>
>>>>>     #include <linux/memblock.h>
>>>>>     #include <linux/numa.h>
>>>>> +#include <linux/sched/isolation.h>
>>>>>     /**
>>>>>      * cpumask_next - get the next cpu in a cpumask
>>>>> @@ -205,22 +206,27 @@ void __init free_bootmem_cpumask_var(cpumask_var_t mask)
>>>>>      */
>>>>>     unsigned int cpumask_local_spread(unsigned int i, int node)
>>>>>     {
>>>>> -	int cpu;
>>>>> +	int cpu, hk_flags;
>>>>> +	const struct cpumask *mask;
>>>>> +	hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
>>>>> +	mask = housekeeping_cpumask(hk_flags);
>>>>
>>>> AFAICS, this generally resolves to something based on cpu_possible_mask
>>>> rather than cpu_online_mask as before, so could now potentially return an
>>>> offline CPU. Was that an intentional change?
>>>
>>> Robin,
>>>
>>> AFAICS online CPUs should be filtered.
>>
>> Apologies if I'm being thick, but can you explain how? In the case of
>> isolation being disabled or compiled out, housekeeping_cpumask() is
>> literally just "return cpu_possible_mask;". If we then iterate over that
>> with for_each_cpu() and just return the i'th possible CPU (e.g. in the
>> NUMA_NO_NODE case), what guarantees that CPU is actually online?
>>
>> Robin.
> 
> Nothing, but that was the situation before 1abdfe706a579a702799fce465bceb9fb01d407c
> as well.

True, if someone calls from a racy context then there's not much we can 
do to ensure that any CPU *remains* online after we initially observed 
it to be, but when it's called from somewhere safe like a cpuhp offline 
handler, then picking from cpu_online_mask *did* always do the right 
thing (by my interpretation), whereas picking from 
housekeeping_cpumask() might not.

This is why I decided to ask rather than just send a patch to fix what I 
think might be a bug - I have no objection if this *is* intended 
behaviour, other than suggesting we amend the "...selects an online 
CPU..." comment if that aspect was never meant to be relied upon.

Thanks,
Robin.

> 
> cpumask_local_spread() should probably be disabling CPU hotplug.
> 
> Thomas?
> 
>>
>>>> I was just looking at the current code since I had the rare presence of mind
>>>> to check if something suitable already existed before I start open-coding
>>>> "any online CPU, but local node preferred" logic for handling IRQ affinity
>>>> in a driver - cpumask_local_spread() appears to be almost what I want (if a
>>>> bit more heavyweight), if only it would actually guarantee an online CPU as
>>>> the kerneldoc claims :(
>>>>
>>>> Robin.
>>>>
>>>>>     	/* Wrap: we always want a cpu. */
>>>>> -	i %= num_online_cpus();
>>>>> +	i %= cpumask_weight(mask);
>>>>>     	if (node == NUMA_NO_NODE) {
>>>>> -		for_each_cpu(cpu, cpu_online_mask)
>>>>> +		for_each_cpu(cpu, mask) {
>>>>>     			if (i-- == 0)
>>>>>     				return cpu;
>>>>> +		}
>>>>>     	} else {
>>>>>     		/* NUMA first. */
>>>>> -		for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
>>>>> +		for_each_cpu_and(cpu, cpumask_of_node(node), mask) {
>>>>>     			if (i-- == 0)
>>>>>     				return cpu;
>>>>> +		}
>>>>> -		for_each_cpu(cpu, cpu_online_mask) {
>>>>> +		for_each_cpu(cpu, mask) {
>>>>>     			/* Skip NUMA nodes, done above. */
>>>>>     			if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
>>>>>     				continue;
>>>>>
>>>
>