linux-kernel - Re: Random crashes with i386 and efi boots

Date:   Tue, 11 Sep 2018 11:05:25 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Joerg Roedel <jroedel@...e.de>
Cc:     Guenter Roeck <linux@...ck-us.net>, linux-kernel@...r.kernel.org,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Michal Hocko <mhocko@...e.com>,
        Andi Kleen <ak@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Pavel Machek <pavel@....cz>, linux-efi@...r.kernel.org,
        x86@...nel.org
Subject: Re: Random crashes with i386 and efi boots

> On Sep 11, 2018, at 10:41 AM, Joerg Roedel <jroedel@...e.de> wrote:
> 
> On Tue, Sep 11, 2018 at 09:36:51AM -0700, Andy Lutomirski wrote:
>>>   save_pgd = efi_call_phys_prolog();
>>>   local_irq_save(flags);
>>>   status = efi_call_phys(...);
>>>   local_irq_restore(flags);
>>> 
>>>   efi_call_phys_epilog(save_pgd);
>>> 
>>> So, yes, interrupts are very much enabled.
>> 
>> Does fixing that solve the problem?  It seems more robust.
> 
> The problem is still that in efi_call_phys_prolog() we load the gdt with
> its physical address, and when we reload the %cr3 in _epilog from
> initial_page_table to swapper_pg_dir again the gdt is no longer mapped.
> Blocking interrupts is more robust, but we can't block NMIs that way,
> and an NMI would also trigger the issue, no?
> 
> So I am in favor of changing the order in efi_call_phys_epilog() too.
> 

I’m rather confused here.  We’re loading CR3 with page tables that don’t have the fixmap mapped?  With interrupts on?  And we expect it to work?  This is *nuts*.

There are IMO only three sane fixes here:

1. Load the fixmap, cpu_entry_area, etc. into the EFI page table.  Drop the GDT reload entirely.

2. Do this whole virtual map dance earlier so we don’t have IRQs and NMIs and such. Maybe while we’re still using the initial page table?

3. Just identity map all the EFI regions. Make EFI page tables that literally map them at their physical addresses *and* map the entire kernel, just like we do for normal user mms.

Sure, as a stopgap, turning off IRQs and applying Guenter’s patch seems okay, but this code is not okay.
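
The ordering hazard discussed in the quoted text can be illustrated with a small user-space model. This is only a sketch, not kernel code: `cpu_model`, `gdt_reachable()` and `unsafe_windows()` are hypothetical names invented for the illustration. The model assumes that the physical GDT address is reachable only while initial_page_table is loaded, and that the fixmap GDT address is reachable under both page tables, which is what reordering the epilog would rely on. It counts the points between the two epilog steps at which an NMI would find no usable GDT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; none of these names are kernel APIs. */
enum page_table { INITIAL_PAGE_TABLE, SWAPPER_PG_DIR };
enum gdt_address { GDT_PHYSICAL, GDT_FIXMAP };

struct cpu_model {
	enum page_table cr3;   /* which page table is live */
	enum gdt_address gdtr; /* which address the GDT register holds */
};

/*
 * Assumption of this model: the physical GDT address is only reachable
 * through initial_page_table, while the fixmap address is reachable
 * through both page tables.
 */
bool gdt_reachable(const struct cpu_model *cpu)
{
	if (cpu->gdtr == GDT_FIXMAP)
		return true;
	return cpu->cr3 == INITIAL_PAGE_TABLE;
}

/* Count the points after each epilog step where an NMI would find no GDT. */
int unsafe_windows(bool reload_gdt_first)
{
	/* State left behind by the prolog and the EFI call. */
	struct cpu_model cpu = { INITIAL_PAGE_TABLE, GDT_PHYSICAL };
	int unsafe = 0;

	/* The starting state itself must be consistent. */
	assert(gdt_reachable(&cpu));

	for (int step = 0; step < 2; step++) {
		bool do_gdt = (step == 0) == reload_gdt_first;

		if (do_gdt)
			cpu.gdtr = GDT_FIXMAP;    /* models reloading the GDT */
		else
			cpu.cr3 = SWAPPER_PG_DIR; /* models reloading %cr3 */

		/* An NMI could arrive right here, after either step. */
		if (!gdt_reachable(&cpu))
			unsafe++;
	}
	return unsafe;
}
```

In this model, `unsafe_windows(false)` (switch %cr3 first) reports one unsafe window and `unsafe_windows(true)` (reload the GDT first) reports none. Disabling IRQs does not change the count, because the model treats the window as one an NMI can hit.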