[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180911182216.GA21160@roeck-us.net>
Date: Tue, 11 Sep 2018 11:22:16 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Andy Lutomirski <luto@...capital.net>
Cc: Joerg Roedel <jroedel@...e.de>, linux-kernel@...r.kernel.org,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Michal Hocko <mhocko@...e.com>,
Andi Kleen <ak@...ux.intel.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Pavel Machek <pavel@....cz>, linux-efi@...r.kernel.org,
x86@...nel.org
Subject: Re: Random crashes with i386 and efi boots
On Tue, Sep 11, 2018 at 11:05:25AM -0700, Andy Lutomirski wrote:
>
>
> > On Sep 11, 2018, at 10:41 AM, Joerg Roedel <jroedel@...e.de> wrote:
> >
> > On Tue, Sep 11, 2018 at 09:36:51AM -0700, Andy Lutomirski wrote:
> >>> save_pgd = efi_call_phys_prolog();
> >>> local_irq_save(flags);
> >>> status = efi_call_phys(...);
> >>> local_irq_restore(flags);
> >>>
> >>> efi_call_phys_epilog(save_pgd);
> >>>
> >>> So, yes, interrupts are very much enabled.
> >>
> >> Does fixing that solve the problem? It seems more robust.
> >
> > The problem is still that in efi_call_phys_prolog() we load the gdt with
> > its physical address, and when we reload the %cr3 in _epilog from
> > initial_page_table to swapper_pg_dir again the gdt is no longer mapped.
> > Blocking interrupts is more robust, but we can't block NMIs that way
> > that would also trigger the issue, no?
> >
> > So I am in favor of changing the order in efi_call_phys_epilog() too.
> >
>
> I’m rather confused here. We’re loading CR3 with page tables that don’t have the fixmap mapped? With interrupts on? And we expect it to work? This is *nuts*.
>
> There are IMO only three sane fixes here:
>
> 1. Load the fixmap, cpu_entry_area, etc into the EFI page table. Drop the GDT reload entirely.
>
> 2. Do this whole virtual map dance earlier so we don’t have IRQs and NMIs and such. Maybe while we’re still using the initial page table?
>
> 3. Just identity map all the EFI regions. Make EFI page tables that literally map them at their physical addresses *and* map the entire kernel, just like we do for normal user mms.
>
> Sure, as a stopgap, turning off IRQs and applying Guenter’s patch seems okay, but this code is not okay.
I submitted a patch with the diff I suggested above; it seems to be the
least invasive solution and addresses the immediate problem.
I am way out of league regarding the other suggested changes. I'll be happy
to test the code if someone is willing to rearrange the code accordingly,
but I don't think it would make sense to even try doing it myself.
Thanks,
Guenter
Powered by blists - more mailing lists