linux-kernel - Re: Random crashes with i386 and efi boots

Message-ID: <20180911182216.GA21160@roeck-us.net>
Date:   Tue, 11 Sep 2018 11:22:16 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Joerg Roedel <jroedel@...e.de>, linux-kernel@...r.kernel.org,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Michal Hocko <mhocko@...e.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Pavel Machek <pavel@....cz>, linux-efi@...r.kernel.org,
        x86@...nel.org
Subject: Re: Random crashes with i386 and efi boots

On Tue, Sep 11, 2018 at 11:05:25AM -0700, Andy Lutomirski wrote:
> 
> 
> > On Sep 11, 2018, at 10:41 AM, Joerg Roedel <jroedel@...e.de> wrote:
> > 
> > On Tue, Sep 11, 2018 at 09:36:51AM -0700, Andy Lutomirski wrote:
> >>>   save_pgd = efi_call_phys_prolog();
> >>>   local_irq_save(flags);
> >>>   status = efi_call_phys(...);
> >>>   local_irq_restore(flags);
> >>> 
> >>>   efi_call_phys_epilog(save_pgd);
> >>> 
> >>> So, yes, interrupts are very much enabled.
> >> 
> >> Does fixing that solve the problem?  It seems more robust.
> > 
> > The problem is still that in efi_call_phys_prolog() we load the gdt with
> > its physical address, and when we reload the %cr3 in _epilog from
> > initial_page_table to swapper_pg_dir again the gdt is no longer mapped.
> > Blocking interrupts is more robust, but we can't block NMIs that way,
> > and they would also trigger the issue, no?
> > 
> > So I am in favor of changing the order in efi_call_phys_epilog() too.
> > 
> 
> I’m rather confused here.  We’re loading CR3 with page tables that don’t have the fixmap mapped?  With interrupts on?  And we expect it to work?  This is *nuts*.
> 
> There are IMO only three sane fixes here:
> 
> 1. Load the fixmap, cpu_entry_area, etc into the EFI page table.  Drop the GDT reload entirely.
> 
> 2. Do this whole virtual map dance earlier so we don’t have IRQs and NMIs and such. Maybe while we’re still using the initial page table?
> 
> 3. Just identity map all the EFI regions. Make EFI page tables that literally map them at their physical addresses *and* map the entire kernel, just like we do for normal user mms.
> 
> Sure, as a stopgap, turning off IRQs and applying Guenter’s patch seems okay, but this code is not okay.

I submitted a patch with the diff I suggested above; it seems to be the
least invasive solution and addresses the immediate problem.

I am way out of my league regarding the other suggested changes. I'll be
happy to test if someone is willing to rearrange the code accordingly,
but I don't think it would make sense for me to even try doing it myself.

Thanks,
Guenter