lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ab20014e-ed45-0bdc-8ed3-2598244c6614@huawei.com>
Date:   Tue, 10 Jan 2023 22:06:57 +0800
From:   Zeng Heng <zengheng4@...wei.com>
To:     Ingo Molnar <mingo@...nel.org>
CC:     <michael.roth@....com>, <bp@...en8.de>, <hpa@...or.com>,
        <tglx@...utronix.de>, <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        <kirill.shutemov@...ux.intel.com>, <jroedel@...e.de>,
        <keescook@...omium.org>, <mingo@...hat.com>,
        <dave.hansen@...ux.intel.com>, <brijesh.singh@....com>,
        <linux-kernel@...r.kernel.org>, <x86@...nel.org>,
        <liwei391@...wei.com>
Subject: Re: [PATCH -v2] x86/boot/compressed: Register dummy NMI handler in
 EFI boot loader, to avoid kdump crashes


On 2023/1/10 20:01, Ingo Molnar wrote:
> * Ingo Molnar <mingo@...nel.org> wrote:
>
>> * Zeng Heng <zengheng4@...wei.com> wrote:
>>
>>> +void do_boot_nmi_fault(struct pt_regs *regs, unsigned long error_code)
>>> +{
>>> +	/* ignore */
>>> +}
>>> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
>>> index 6debb816e83d..b169c9728d52 100644
>>> --- a/arch/x86/boot/compressed/idt_64.c
>>> +++ b/arch/x86/boot/compressed/idt_64.c
>>> @@ -60,6 +60,7 @@ void load_stage2_idt(void)
>>>   {
>>>   	boot_idt_desc.address = (unsigned long)boot_idt;
>>>   
>>> +	set_idt_entry(X86_TRAP_NMI, boot_nmi_fault);
>>>   	set_idt_entry(X86_TRAP_PF, boot_page_fault);
>> So it's a bit sad to install a dummy handler that does nothing, while
>> something clearly sent an NMI and expects an intelligent reaction - OTOH
>> the unexpected NMIs from from watchdog during a kdump clearly make things
>> worse by crashing the bootup.
>>
>> Anyway, I cannot think of a better response here that the boot loading code
>> could do either, so I've applied your fix to tip:x86/boot.
> BTW., the changelog had very poor quality, and the patch added no comments
> to explain the presence of the dummy NMI.
>
> The -v2 version below should address most of those problems.
>
> Thanks,
>
> 	Ingo

Thanks for your work.

B.R.,

Zeng Heng

> =============>
> From: Zeng Heng <zengheng4@...wei.com>
> Date: Tue, 10 Jan 2023 18:27:45 +0800
> Subject: [PATCH] x86/boot/compressed: Register dummy NMI handler in EFI boot loader, to avoid kdump crashes
>
> If kdump is enabled, when using mce_inject to inject errors, EFI
> boot loader would decompress & load second kernel for saving the
> vmcore file.
>
> For normal errors that is fine. However, in the MCE case, the panic
> CPU that firstly enters into mce_panic() is running within NMI
> interrupt context, and the processor blocks delivery of subsequent
> NMIs until the next execution of the IRET instruction.
>
> When the panic CPU takes long time in the panic processing route,
> and causes the watchdog timeout, at this moment, the processor
> already receives NMI interrupt in the background.
>
> In the reproducer sequence below, panic CPU would run into EFI loader
> and raise page fault exception (like visiting `vidmem` variable
> when attempting to call debug_putstr()), the CPU would execute IRET
> instruction when it exits from the page fault handler.
>
> But the loader never registers handler for NMI vector in IDT,
> lack of vector handler would cause reboot, which interrupts
> kdump procedure and fails to save the vmcore file.
>
> Here is steps to reproduce the above issue (it's sporadic):
>
> 	1. # cat uncorrected
> 	CPU 1 BANK 4
> 	STATUS uncorrected 0xc0
> 	MCGSTATUS  EIPV MCIP
> 	ADDR 0x1234
> 	RIP 0xdeadbabe
> 	RAISINGCPU 0
> 	MCGCAP SER CMCI TES 0x6
> 	2. # modprobe mce_inject
> 	3. # mce-inject uncorrected
>
> For increasing the probability of reproduction of this issue, there are
> two ways to increase the probability of the bug:
>
>    1. modify the threshold value of watchdog (increase NMI frequency);
>    2. and/or add delays before panic() in mce_panic() and modify PANIC_TIMEOUT macro;
>
> Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
> Signed-off-by: Zeng Heng <zengheng4@...wei.com>
> [ Tidy up changelog, add comments. ]
> Signed-off-by: Ingo Molnar <mingo@...nel.org>
> Link: https://lore.kernel.org/r/20230110102745.2514694-1-zengheng4@huawei.com
> ---
>   arch/x86/boot/compressed/ident_map_64.c    | 12 ++++++++++++
>   arch/x86/boot/compressed/idt_64.c          |  1 +
>   arch/x86/boot/compressed/idt_handlers_64.S |  1 +
>   arch/x86/boot/compressed/misc.h            |  1 +
>   4 files changed, 15 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
> index d4a314cc50d6..cbfdefcf9657 100644
> --- a/arch/x86/boot/compressed/ident_map_64.c
> +++ b/arch/x86/boot/compressed/ident_map_64.c
> @@ -379,3 +379,15 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
>   	 */
>   	kernel_add_identity_map(address, end);
>   }
> +
> +void do_boot_nmi_fault(struct pt_regs *regs, unsigned long error_code)
> +{
> +	/*
> +	 * Default boot loader placeholder fault handler - there's no real
> +	 * kernel running yet, so there's not much we can do - but NMIs
> +	 * can arrive in a kdump scenario, for example by the NMI watchdog.
> +	 *
> +	 * Not having any handler would cause the CPU to silently reboot,
> +	 * so we do the second-worst thing here and ignore the NMI.
> +	 */
> +}
> diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
> index 6debb816e83d..b169c9728d52 100644
> --- a/arch/x86/boot/compressed/idt_64.c
> +++ b/arch/x86/boot/compressed/idt_64.c
> @@ -60,6 +60,7 @@ void load_stage2_idt(void)
>   {
>   	boot_idt_desc.address = (unsigned long)boot_idt;
>   
> +	set_idt_entry(X86_TRAP_NMI, boot_nmi_fault);
>   	set_idt_entry(X86_TRAP_PF, boot_page_fault);
>   
>   #ifdef CONFIG_AMD_MEM_ENCRYPT
> diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
> index 22890e199f5b..2aef8e1b515b 100644
> --- a/arch/x86/boot/compressed/idt_handlers_64.S
> +++ b/arch/x86/boot/compressed/idt_handlers_64.S
> @@ -69,6 +69,7 @@ SYM_FUNC_END(\name)
>   	.text
>   	.code64
>   
> +EXCEPTION_HANDLER	boot_nmi_fault do_boot_nmi_fault error_code=0
>   EXCEPTION_HANDLER	boot_page_fault do_boot_page_fault error_code=1
>   
>   #ifdef CONFIG_AMD_MEM_ENCRYPT
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 62208ec04ca4..d89d3f8417f6 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -187,6 +187,7 @@ static inline void cleanup_exception_handling(void) { }
>   #endif
>   
>   /* IDT Entry Points */
> +void boot_nmi_fault(void);
>   void boot_page_fault(void);
>   void boot_stage1_vc(void);
>   void boot_stage2_vc(void);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ