Message-ID: <7D571DAA-E399-4580-98B3-8A6E7085CB54@alien8.de>
Date: Thu, 29 Aug 2024 10:39:41 +0200
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <yazen.ghannam@....com>
CC: Thomas Gleixner <tglx@...utronix.de>, linux-edac@...r.kernel.org,
 linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
 avadhut.naik@....com, john.allen@....com, boris.ostrovsky@...cle.com
Subject: Re: [PATCH] x86/MCE: Prevent CPU offline for SMCA CPUs with non-core banks

On August 27, 2024 3:47:06 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@....com> wrote:
>On Tue, Aug 27, 2024 at 02:50:40PM +0200, Borislav Petkov wrote:
>> On August 26, 2024 3:20:57 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@....com> wrote:
>> >On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
>> >> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
>> >> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
>> >> > banks. Each of these banks represents unique and separate hardware
>> >> > located within the system. Each bank is managed by a single logical CPU;
>> >> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
>> >> > cannot be modified at run time.
>> >> >
>> >> > The MCE subsystem supports run time CPU hotplug. Many vendors have
>> >> > non-core MCA banks, so MCA settings are not cleared when a CPU is
>> >> > offlined for these vendors.
>> >> >
>> >> > Even though the non-core MCA banks remain enabled, MCA errors will not
>> >> > be handled (reported, cleared, etc.) on SMCA systems when the managing
>> >> > CPU is offline.
>> >> >
>> >> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
>> >> > being taken offline.
>> >> 
>> >> Which in turn breaks hibernation and kexec...
>> >>
>> >
>> >Right, good point.
>> >
>> >Maybe this change can apply only to a user-initiated (sysfs) case?
>> >
>> >Thanks,
>> >Yazen
>> >
>> 
>> Or, you can simply say that the MCE cannot be processed because the user took the managing CPU offline. 
>>
>
>I found that we can skip populating the "cpuN/online" file. This would
>prevent a user from offlining a CPU, but it shouldn't prevent the system
>from doing what it needs.
>
>This is already done for CPU0 and, I think, some other cases.
>
>> What is this actually really fixing anyway?
>
>There are times where a user wants to take CPUs offline due to software
>licensing. And this would prevent the user from unintentionally
>offlining CPUs that would affect MCA handling.
>
>Thanks,
>Yazen

If the user offlines CPUs and some MCEs cannot be handled as a result, then that's her/his problem, no?

- Why does it hurt when I do this? 
- Well, don't do that then.
-- 
Sent from a small device: formatting sucks and brevity is inevitable. 
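
Illustrative note, not part of the original message: the thread mentions that a CPU whose "cpuN/online" file is not populated cannot be offlined by a user through sysfs. A minimal sketch of how one might list such CPUs from userspace follows. The function name cpus_without_online_file is hypothetical, and the default path assumes the usual Linux sysfs layout under /sys/devices/system/cpu; this is not code from the patch under discussion.

```python
import re
from pathlib import Path

# Matches per-CPU directories such as "cpu0" or "cpu17", but not
# sibling directories like "cpufreq" or "cpuidle".
CPU_DIR = re.compile(r"^cpu(\d+)$")


def cpus_without_online_file(root="/sys/devices/system/cpu"):
    """Return the sorted CPU numbers whose directory has no 'online' file.

    Hypothetical helper: 'root' is assumed to follow the usual sysfs
    layout, with one cpuN directory per logical CPU.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # No such tree (for example, not running on Linux).
        return []

    missing = []
    for entry in root_path.iterdir():
        match = CPU_DIR.match(entry.name)
        if match and entry.is_dir() and not (entry / "online").exists():
            missing.append(int(match.group(1)))
    return sorted(missing)
```

Calling cpus_without_online_file() with no arguments inspects the running system. Passing a different root makes it possible to try the function against a copied or synthetic directory tree.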
