linux-kernel - Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <e1fe68a5-d088-4e4d-bac4-bbb7793dab6d@intel.com>
Date: Thu, 28 Mar 2024 14:11:01 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
CC: "Yu, Fenghua" <fenghua.yu@...el.com>, "Wieczor-Retman, Maciej"
	<maciej.wieczor-retman@...el.com>, Peter Newman <peternewman@...gle.com>,
	James Morse <james.morse@....com>, Babu Moger <babu.moger@....com>, "Drew
 Fustini" <dfustini@...libre.com>, "x86@...nel.org" <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"patches@...ts.linux.dev" <patches@...ts.linux.dev>
Subject: Re: [PATCH] x86/resctrl: Fix mbm_setup_overflow_handler() when last
 CPU goes offline

Hi Tony,

On 3/27/2024 4:01 PM, Luck, Tony wrote:
>> There seems to be two issues here (although I am not familiar with these flows). First,
>> it seems that tick_nohz_full_mask is not actually allocated unless the user boots
>> with a "nohz_full=". This means that any attempt to access bits within tick_nohz_full_mask
>> will cause this OOPS. If that is allocated then the second issue seems that the  
>> buried __ffs() call requires that it not be called with 0 and this checking is not done.
>>
>> To me it seems most appropriate to fix this at the central place to ensure all scenarios
>> are handled instead of scattering checks.
> 
> Good analysis.
> 
>> To that end, what do you think of something like below? It uses tick_nohz_full_enabled() check
>> to ensure that tick_nohz_full_mask is actually allocated while the other changes aim to
>> avoid __ffs() on 0.

I studied the flows some more and I no longer believe that there is a risk of __ffs() being
called on 0. Looking at the flows starting with cpumask_nth_andnot() the value provided to
__ffs() is ensured to be non-zero via either an explicit check or hweight_long().

To confirm this I tested a CONFIG_NO_HZ_FULL kernel by booting with "nohz_full=" set to the
highest numbered CPU in the domain. When all the CPUs are offlined the behavior for the
"second to last" and "last" CPU are most interesting and worked as expected when using 
cpumask_nth_andnot(). 

> 
> Looks good.
> 
> Tested-by: Tony Luck <tony.luck@...el.com>
> Reviewed-by: Tont Luck <tony.luck@...el.com>

Considering the motivation above I will submit a change that only adds the 
tick_nohz_full_enabled() check. Since it is such a big change from the snippet you reviewed
and tested here I dropped your tags from it and hope you can reconsider the fix after considering
the information above.

Reinette