lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 28 Mar 2024 14:11:01 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
CC: "Yu, Fenghua" <fenghua.yu@...el.com>, "Wieczor-Retman, Maciej"
	<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
 Fustini" <dfustini@...libre.com>, "x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"patches@...ts.linux.dev" <patches@...ts.linux.dev>
Subject: Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last
 CPU goes offline

Hi Tony,

On 3/27/2024 4:01 PM, Luck, Tony wrote:
>> There seems to be two issues here (although I am not familiar with these flows). First,
>> it seems that tick_nohz_full_mask is not actually allocated unless the user boots
>> with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask
>> will cause this OOPS. If that is allocated then the second issue seems that the  
>> buried __ffs() call requires that it not be called with 0 and this checking is not done.
>>
>> To me it seems most appropriate to fix this at the central place to ensure all scenarios
>> are handled instead of scattering checks.
> 
> Good analysis.
> 
>> To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check
>> to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to
>> avoid __ffs() on 0.

I studied the flows some more and I no longer believe that there is a risk of __ffs() being
called on 0. Looking at the flows starting with cpumask_nth_andnot() the value provided to
__ffs() is ensured to be non-zero via either an explicit check or hweight_long().

To confirm this I tested a CONFIG_NO_HZ_FULL kernel by booting with "nohz_full=" set to the
highest numbered CPU in the domain. When all the CPUs are offlined the behavior for the
"second to last" and "last" CPU are most interesting and worked as expected when using 
cpumask_nth_andnot(). 

> 
> Looks good.
> 
> Tested-by: Tony Luck <tony.luck@...el.com>
> Reviewed-by: Tont Luck <tony.luck@...el.com>

Considering the motivation above I will submit a change that only adds the 
tick_nohz_full_enabled() check. Since it is such a big change from the snippet you reviewed
and tested here I dropped your tags from it and hope you can reconsider the fix after considering
the information above.

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ