lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 3 May 2023 12:58:05 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     Ard Biesheuvel <ardb@...nel.org>
Cc:     linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org,
        Evgeniy Baskov <baskov@...ras.ru>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Alexey Khoroshilov <khoroshilov@...ras.ru>,
        Peter Jones <pjones@...hat.com>,
        Gerd Hoffmann <kraxel@...hat.com>,
        Dave Young <dyoung@...hat.com>,
        Mario Limonciello <mario.limonciello@....com>,
        Kees Cook <keescook@...omium.org>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH 0/6] efi/x86: Avoid legacy decompressor during EFI boot

On 5/2/23 11:08, Tom Lendacky wrote:
> On 5/2/23 08:39, Ard Biesheuvel wrote:
>> On Tue, 2 May 2023 at 15:37, Tom Lendacky <thomas.lendacky@....com> wrote:
>>>
>>> On 4/24/23 11:57, Ard Biesheuvel wrote:
>>>> This series is conceptually a combination of Evgeny's series [0] and
>>>> mine [1], both of which attempt to make the early decompressor code more
>>>> amenable to executing in the EFI environment with stricter handling of
>>>> memory permissions.
>>>>
>>>> My series [1] implemented zboot for x86, by getting rid of the entire
>>>> x86 decompressor, and replacing it with existing EFI code that does the
>>>> same but in a generic way. The downside of this is that only EFI boot is
>>>> supported, making it unviable for distros, which need to support BIOS
>>>> boot and hybrid EFI boot modes that omit the EFI stub.
>>>>
>>>> Evgeny's series [0] adapted the entire decompressor code flow to allow
>>>> it to execute in the EFI context as well as the bare metal context, and
>>>> this involves changes to the 1:1 mapping code and the page fault
>>>> handlers etc, none of which are really needed when doing EFI boot in the
>>>> first place.
>>>>
>>>> So this series attempts to occupy the middle ground here: it makes
>>>> minimal changes to the existing decompressor so some of it can be called
>>>> from the EFI stub. Then, it reimplements the EFI boot flow to decompress
>>>> the kernel and boot it directly, without relying on the trampoline code,
>>>> page table code or page fault handling code. This allows us to get rid
>>>> of quite a bit of unsavory EFI stub code, and replace it with two clear
>>>> invocations of the EFI firmware APIs to clear NX restrictions from
>>>> allocations that have been populated with executable code.
>>>>
>>>> The only code that is being reused is the decompression library itself,
>>>> along with the minimal ELF parsing that is required to copy the ELF
>>>> segments in place, and the relocation processing that fixes up absolute
>>>> symbol references to refer to the correct virtual addresses.
>>>>
>>>> Note that some of Evgeny's changes to clean up the PE/COFF header
>>>> generation will still be needed, but I've omitted those here for
>>>> brevity.
>>>
>>> I tried booting an SEV and an SEV-ES guest using this and both failed 
>>> to boot:
>>>
>>> EFI stub: WARNING: Decompression failed: Out of memory while allocating
>>> z_stream
>>>
>>> I'll have to take a closer look as to why, but it might be a couple of
>>> days before I can get to it.
>>>
>>
>> Thanks Tom.
>>
>> The internal malloc() seems to be failing, which is often caused by
>> BSS clearing problems. Could you elaborate a little bit on the boot
>> environment you are using here?
> 
> I'm using Qemu v7.2.1 as my VMM, Linux 6.3 with your series applied for my
> host/hypervisor and guest kernel and the current OVMF tree built using
> OvmfPkgX64.dsc.
> 
> I was originally using the current merge window Linux, but moved to the
> release version just to . With the release version SEV and SEV-ES still 
> fail to
> boot, but SEV actually #GPs now. And some of the register contents look
> like encrypted data:
> 
> ConvertPages: range 1000000 - 4FA1FFF covers multiple entries
> !!!! X64 Exception Type - 0D(#GP - General Protection)  CPU Apic ID - 
> 00000000 !!!!
> ExceptionData - 0000000000000000
> RIP  - 00000000597E71C1, CS  - 0000000000000038, RFLAGS - 0000000000210206
> RAX  - 1FBA02A45943B920, RCX - 0000000000AF7009, RDX - A9DAE761B64A1F1B
> RBX  - 1FBA02A45943B8C0, RSP - 000000007FD97320, RBP - 0000000002000000
> RSI  - 0000000001000000, RDI - 1FBA02A45943DE68
> R8   - 0000000003EF3C94, R9  - 0000000000000000, R10 - 000000007D7C6018
> R11  - 0000000000000000, R12 - 0000000001000000, R13 - 00000000597EDD98
> R14  - 0000000001000000, R15 - 000000007E0A5198
> DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
> GS   - 0000000000000030, SS  - 0000000000000030
> CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FA01000
> CR4  - 0000000000000668, CR8 - 0000000000000000
> DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
> DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
> GDTR - 000000007F7DC000 0000000000000047, LDTR - 0000000000000000
> IDTR - 000000007F34C018 0000000000000FFF,   TR - 0000000000000000
> FXSAVE_STATE - 000000007FD96F80
> !!!! Find image based on IP(0x597E71C1) 
> /root/kernels/ovmf-build-X64/Build/OvmfX64/DEBUG_GCC5/X64/MdeModulePkg/Universal/Variable/RuntimeDxe/VariableRuntimeDxe/DEBUG/Variable
> RuntimeDxe.dll (ImageBase=0000000000D4792C, EntryPoint=0000000000D50CC3) !!!!
> 
> So, yes, probably an area of memory that was zeroes when mapped
> unencrypted, but wasn't cleared after changing the mapping to
> encrypted.

Yes, looks like a bss clearing issue. If I add __section(".data") to 
free_mem_ptr and free_mem_end_ptr in arch/x86/boot/compressed/misc.c and 
to malloc_ptr and malloc_cnt in include/linux/decompress/mm.h, then I can 
boot an SEV guest.

However, an SEV-ES guest is triple faulting. This looks to be because 
we're still on the EFI CS of 0x38 after switching GDTs in 
arch/x86/kernel/head_64.S by calling startup_64_setup_env(). Before 
switching to the kernel CS, we take a #VC (from CPUID calls in sme_enable) 
and things blow up on the iretq. Moving the block headed by the comment 
"Now switch to __KERNEL_CS so IRET works reliably" to just after calling 
startup_64_setup_env() fixes it and an SEV-ES guest can boot.

This worked before because I believe we switched off the EFI CS as part of 
the kernel decompressor support and so this bug wasn't exposed. But this 
needs to be fixed regardless of this series.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ