Message-ID: <348c5e26-0ee0-36fd-893b-4ff9fcae67c1@arm.com>
Date:   Thu, 27 Apr 2023 15:20:07 +0100
From:   James Morse <james.morse@....com>
To:     Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc:     x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        H Peter Anvin <hpa@...or.com>,
        Babu Moger <Babu.Moger@....com>,
        shameerali.kolothum.thodi@...wei.com,
        D Scott Phillips OS <scott@...amperecomputing.com>,
        carl@...amperecomputing.com, lcherian@...vell.com,
        bobo.shaobowang@...wei.com, tan.shaopeng@...itsu.com,
        xingxin.hx@...nanolis.org, baolin.wang@...ux.alibaba.com,
        Jamie Iles <quic_jiles@...cinc.com>,
        Xin Hao <xhao@...ux.alibaba.com>, peternewman@...gle.com
Subject: Re: [PATCH v3 17/19] x86/resctrl: Allow overflow/limbo handlers to be
 scheduled on any-but cpu

Hi Ilpo,

On 21/03/2023 15:25, Ilpo Järvinen wrote:
> On Tue, 21 Mar 2023, Ilpo Järvinen wrote:
>> On Mon, 20 Mar 2023, James Morse wrote:
>>
>>> When a CPU is taken offline resctrl may need to move the overflow or
>>> limbo handlers to run on a different CPU.
>>>
>>> Once the offline callbacks have been split, cqm_setup_limbo_handler()
>>> will be called while the CPU that is going offline is still present
>>> in the cpu_mask.
>>>
>>> Pass the CPU to exclude to cqm_setup_limbo_handler() and
>>> mbm_setup_overflow_handler(). These functions can use a variant of
>>> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
>>> need excluding.

>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>> index 3eb5b307b809..47838ba6876e 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>> @@ -78,6 +78,37 @@ static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>>>  	return cpu;
>>>  }
>>>  
>>> +/**
>>> + * cpumask_any_housekeeping_but() - Chose any cpu in @mask, preferring those
>>> + *			            that aren't marked nohz_full, excluding
>>> + *				    the provided CPU
>>> + * @mask:	The mask to pick a CPU from.
>>> + * @exclude_cpu:The CPU to avoid picking.
>>> + *
>>> + * Returns a CPU from @mask, but not @but. If there are houskeeping CPUs that
>>> + * don't use nohz_full, these are preferred.
>>> + * Returns >= nr_cpu_ids if no CPUs are available.
>>> + */
>>> +static inline unsigned int
>>> +cpumask_any_housekeeping_but(const struct cpumask *mask, int exclude_cpu)
>>> +{
>>> +	int cpu, hk_cpu;
>>> +
>>> +	cpu = cpumask_any_but(mask, exclude_cpu);
>>> +	if (tick_nohz_full_cpu(cpu)) {
>>> +		hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
>>> +		if  (hk_cpu == exclude_cpu) {
>>> +			hk_cpu = cpumask_nth_andnot(1, mask,
>>> +						    tick_nohz_full_mask);

>> I'm left to wonder if it's okay to alter tick_nohz_full_mask in resctrl 
>> code??

Why do you think cpumask_nth_andnot() modifies its arguments?

The cpumask arguments to cpumask_nth_andnot() are const.


> I suppose it should do instead:
> 		hk_cpu = cpumask_nth_and(0, mask, tick_nohz_full_mask);
> 		if (hk_cpu == exclude_cpu)
> 			hk_cpu = cpumask_next_and(hk_cpu, mask, tick_nohz_full_mask);
> 

Removing the 'not' changes the behaviour. hk_cpu is now guaranteed to be a nohz_full CPU.
This needs to prefer CPUs that are not in that mask.

Passing 'hk_cpu' the second time doesn't look right: hk_cpu is a CPU number, not the
index of the 'nth CPU to find', which is what the argument expects.
For example: if the mask only has CPUs 10-12, and CPU 10 should be excluded, it's possible
the first attempt for the 0th CPU returns 10, in which case I want to pass '1' now that I
know the 0th is the excluded CPU. If I pass 10 I expect an error, as there aren't 10
bits set in the mask.
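
The CPU 10-12 example can be sketched in userspace roughly as below. This is
not the kernel implementation: toy_nth_andnot() and toy_pick_but() are
hypothetical stand-ins that model a cpumask as one uint64_t, and NR_CPU_IDS is
an assumed toy limit of 64. It only illustrates that the first argument is a
0-based index into the set bits, not a CPU number.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 64u /* toy limit: one bit per CPU in a uint64_t */

/*
 * Hypothetical stand-in for cpumask_nth_andnot(): return the CPU number of
 * the n-th (0-based) bit set in (mask & ~notmask), or NR_CPU_IDS if there
 * are not enough bits set.
 */
static unsigned int toy_nth_andnot(unsigned int n, uint64_t mask,
				   uint64_t notmask)
{
	uint64_t bits = mask & ~notmask;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++) {
		if (!(bits & (UINT64_C(1) << cpu)))
			continue;
		if (n-- == 0)
			return cpu;
	}
	return NR_CPU_IDS;
}

/*
 * Mirrors only the part of the hunk visible in the quote: try the 0th
 * candidate, and if that is the excluded CPU, ask for the 1st instead.
 */
static unsigned int toy_pick_but(uint64_t mask, uint64_t nohz_full,
				 unsigned int exclude_cpu)
{
	unsigned int cpu = toy_nth_andnot(0, mask, nohz_full);

	if (cpu == exclude_cpu)
		cpu = toy_nth_andnot(1, mask, nohz_full);
	return cpu;
}

/* The example from the text: the mask holds CPUs 10-12, CPU 10 is excluded. */
static void toy_check_example(void)
{
	const uint64_t mask = UINT64_C(0x1C00); /* bits 10, 11, 12 */

	assert(toy_nth_andnot(0, mask, 0) == 10);
	assert(toy_pick_but(mask, 0, 10) == 11);
	/* Passing the CPU number 10 as 'n' finds nothing. */
	assert(toy_nth_andnot(10, mask, 0) == NR_CPU_IDS);
}
```

With only three bits set, asking for index 10 runs off the end of the mask,
which is the error case described above.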


Thanks,

James
