Message-Id: <D213F8BC-5C8E-44A2-810F-918B915F0804@amacapital.net>
Date: Tue, 11 Sep 2018 04:52:02 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Guenter Roeck <linux@...ck-us.net>
Cc: linux-kernel@...r.kernel.org,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Joerg Roedel <jroedel@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Michal Hocko <mhocko@...e.com>,
Andi Kleen <ak@...ux.intel.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Pavel Machek <pavel@....cz>, linux-efi@...r.kernel.org,
x86@...nel.org
Subject: Re: Random crashes with i386 and efi boots
> On Sep 10, 2018, at 2:56 PM, Guenter Roeck <linux@...ck-us.net> wrote:
>
> Hi folks,
>
> even after commit eeb89e2bb1ac ("x86/efi: Load fixmap GDT in
> efi_call_phys_epilog()"), my i386/efi qemu boot tests still crash randomly
> (roughly 5-10% of the time). As before, I don't see much useful output in
> the qemu log (this time it doesn't even complain about a triple fault).
>
> Debugging shows that the crash happens in efi_call_phys_epilog().
> A sample log from a crashed test run is attached below. It appears that
> the crash happens if there is an interrupt at a critical section of the
> code.
>
> While playing with the code, I found a possible fix.
>
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index 05ca14222463..9959657127f4 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -85,10 +85,9 @@ pgd_t * __init efi_call_phys_prolog(void)
>
> void __init efi_call_phys_epilog(pgd_t *save_pgd)
> {
> + load_fixmap_gdt(0);
> load_cr3(save_pgd);
> __flush_tlb_all();
> -
> - load_fixmap_gdt(0);
> }
Do we have IRQs on here? It seems plausible that we’re in a window where the EFI pgd doesn’t have cpu_entry_area mapped. Also, the hard-coded CPU 0 passed to load_fixmap_gdt() is suspicious.
Maybe try instrumenting the code to check whether the clone_pgd_range() calls in setup_percpu.c have already happened by the time this code runs?
Your patch may well be correct, but, if we have IRQs on, we should really have cpu_entry_area mapped in both pgds.
Or we could turn off IRQs. Why on Earth are IRQs on in a context where the fixmap gdt is unusable?
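
To make the ordering question concrete, here is a toy user-space model, not kernel code. It assumes the GDT has two aliases (a low identity/physical one used during the EFI call and the fixmap one in cpu_entry_area) and treats each pgd as a pair of flags saying which alias it maps. The struct, the enum, and unsafe_windows() are hypothetical names invented for this sketch, and which mappings each pgd really has is exactly what would need checking in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the two addresses the GDT register can point at. */
enum gdt_alias { GDT_IDENTITY, GDT_FIXMAP };

/* Hypothetical model of a pgd: which GDT alias it has mapped. */
struct pgd_model {
	bool maps_identity_gdt;	/* low/physical alias used around the EFI call */
	bool maps_fixmap_gdt;	/* cpu_entry_area (fixmap) alias */
};

static bool gdt_reachable(const struct pgd_model *pgd, enum gdt_alias gdt)
{
	return gdt == GDT_IDENTITY ? pgd->maps_identity_gdt
				   : pgd->maps_fixmap_gdt;
}

/*
 * Walk the two epilog steps (switch CR3, switch GDT) in the given order and
 * count the states, checked after each step, in which the loaded GDT is not
 * mapped by the active pgd. An interrupt taken in such a state would fault
 * on its segment loads.
 */
static int unsafe_windows(const struct pgd_model *efi_pgd,
			  const struct pgd_model *kernel_pgd,
			  bool gdt_first)
{
	const struct pgd_model *cr3 = efi_pgd;	/* state on entry to the epilog */
	enum gdt_alias gdt = GDT_IDENTITY;
	int bad = 0;

	assert(efi_pgd && kernel_pgd);

	for (int step = 0; step < 2; step++) {
		bool switch_gdt = (step == 0) == gdt_first;

		if (switch_gdt)
			gdt = GDT_FIXMAP;	/* models load_fixmap_gdt(0) */
		else
			cr3 = kernel_pgd;	/* models load_cr3(save_pgd) */

		if (!gdt_reachable(cr3, gdt))
			bad++;
	}
	return bad;
}
```

Under these assumptions, calling unsafe_windows() with gdt_first = false (the current order) leaves one bad state whenever the kernel pgd lacks the identity alias. With gdt_first = true (the proposed order) the bad state only goes away if the EFI pgd also maps the fixmap alias, which is the "mapped in both pgds" condition above. Turning IRQs off across both steps would make the intermediate state unobservable in either order.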