linux-kernel - Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown

Message-ID: <55b5441a-6323-4ec7-aafc-f00af7e85707@redhat.com>
Date: Thu, 5 Dec 2024 23:16:07 -0500
From: Waiman Long <llong@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>, Waiman Long <llong@...hat.com>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 Dave Hansen <dave.hansen@...ux.intel.com>,
 Peter Zijlstra <peterz@...radead.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
 "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH v2] x86/nmi: Add an emergency handler in nmi_desc & use it
 in nmi_shootdown_cpus()


On 12/5/24 1:17 PM, Thomas Gleixner wrote:
> On Thu, Dec 05 2024 at 08:22, Waiman Long wrote:
>> On 12/5/24 8:12 AM, Thomas Gleixner wrote:
>>>> Actually, crash_nmi_callback() can return in the case of the crashing
>>>> CPU, though all the other CPUs will not return once called. So I
>>>> believe the current form is correct. I will update the comment to
>>>> reflect that.
>>> Why would you continue servicing the NMI on a CPU which just crashed?
>> According to crash_nmi_callback(),
>>
>>           /*
>>            * Don't do anything if this handler is invoked on crashing cpu.
>>            * Otherwise, system will completely hang. Crashing cpu can get
>>            * an NMI if system was initially booted with nmi_watchdog parameter.
>>            */
>>           if (cpu == crashing_cpu)
>>                   return NMI_HANDLED;
>>
>> The crashing CPU still has work to do after shutting down other CPUs. It
>> can't wait there forever without completing the other crash actions. The
>> only thing I can see we can do is to return immediately without
>> servicing the other, less important NMI handlers in the list.
> I understand that, but if the crashed CPU receives an NMI and
> sees that the emergency handler is set, shouldn't it stop the NMI
> processing instead of trying to go through perf and what not when the
> system is already in a fragile state? I.e.:
>
>         if (emergency_handler) {
>            emergency_handler();
>            return;
>         }
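As an illustration only, the dispatch change discussed above can be modeled in plain userspace C. The struct, handler type, and function names below (`nmi_desc_model`, `nmi_dispatch_model`, `perf_like_handler`, `crash_like_handler`) are hypothetical simplifications, not the kernel's actual `struct nmi_desc` or `nmi_handle()`. The sketch assumes that when an emergency handler is installed, only that handler runs and its return value is passed straight back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, modeled on the kernel's NMI_DONE/NMI_HANDLED. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

typedef int (*nmi_handler_fn)(unsigned int type, void *regs);

#define MAX_NMI_HANDLERS 4

/* Hypothetical, simplified stand-in for the kernel's struct nmi_desc. */
struct nmi_desc_model {
	nmi_handler_fn emergency_handler;          /* NULL unless a shootdown is in progress */
	nmi_handler_fn handlers[MAX_NMI_HANDLERS]; /* ordinary handlers, e.g. perf */
	size_t nr_handlers;
};

int regular_calls; /* counts how many ordinary handlers actually ran */

/* Stand-in for an ordinary handler such as perf's. */
int perf_like_handler(unsigned int type, void *regs)
{
	(void)type;
	(void)regs;
	regular_calls++;
	return NMI_HANDLED;
}

/* Stand-in for crash_nmi_callback() on the crashing CPU: returns at once. */
int crash_like_handler(unsigned int type, void *regs)
{
	(void)type;
	(void)regs;
	return NMI_HANDLED;
}

/*
 * If an emergency handler is installed, call only that handler and return
 * its result; otherwise walk the ordinary handler list and count how many
 * of them handled the NMI.
 */
int nmi_dispatch_model(struct nmi_desc_model *desc, unsigned int type, void *regs)
{
	int handled = 0;
	size_t i;

	if (desc->emergency_handler)
		return desc->emergency_handler(type, regs);

	for (i = 0; i < desc->nr_handlers; i++)
		handled += desc->handlers[i](type, regs);

	return handled;
}
```

Placing the check ahead of the list walk is the point of the suggestion: once the emergency handler is set, nothing else on the list (perf or otherwise) gets a chance to run on a CPU that is already in a fragile state.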

That is what I suggested in my last sentence. I will update the patch
accordingly.
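A second hypothetical model, again plain userspace C rather than kernel code, shows the per-CPU behavior of crash_nmi_callback() described in the quoted comment: the crashing CPU returns immediately so it can finish the crash path, while every other CPU would save its state and never return. Because a test cannot halt a CPU, halting is modeled by recording the CPU as stopped. The names `crashing_cpu_model`, `cpu_stopped`, `NR_CPUS_MODEL`, and `crash_callback_model()` are illustrative assumptions, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return code, modeled on the kernel's NMI_HANDLED. */
enum { NMI_HANDLED = 1 };

#define NR_CPUS_MODEL 8

int crashing_cpu_model = -1;     /* CPU that initiated the crash, or -1 */
bool cpu_stopped[NR_CPUS_MODEL]; /* CPUs that would have halted for good */

/*
 * Models crash_nmi_callback(): the crashing CPU does nothing and returns so
 * it can continue with the remaining crash actions; any other CPU would save
 * its register state and halt without returning, which is recorded here
 * instead of actually halting.
 */
int crash_callback_model(int cpu)
{
	if (cpu == crashing_cpu_model)
		return NMI_HANDLED;

	/* Real code would save registers here and then halt forever. */
	cpu_stopped[cpu] = true;
	return NMI_HANDLED;
}
```

Combined with the early return Thomas proposed, the crashing CPU would leave NMI processing as soon as this callback returns, without servicing the less important handlers behind it.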

Cheers,
Longman