lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 17 May 2023 13:50:00 -0500
From:   Tom Lendacky <thomas.lendacky@....com>
To:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     "Kirill A. Shutemov" <kirill@...temov.name>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Joerg Roedel <jroedel@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Ingo Molnar <mingo@...hat.com>,
        Dario Faggioli <dfaggioli@...e.com>,
        Mike Rapoport <rppt@...nel.org>,
        David Hildenbrand <david@...hat.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        marcelo.cerri@...onical.com, tim.gardner@...onical.com,
        khalid.elmously@...onical.com, philip.cox@...onical.com,
        aarcange@...hat.com, peterx@...hat.com, x86@...nel.org,
        linux-mm@...ck.org, linux-coco@...ts.linux.dev,
        linux-efi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCHv11 0/9] mm, x86/cc, efi: Implement support for unaccepted
 memory

On 5/17/23 13:36, Kirill A. Shutemov wrote:
> On Wed, May 17, 2023 at 09:32:27AM -0500, Tom Lendacky wrote:
>> On 5/16/23 18:22, Kirill A. Shutemov wrote:
>>> On Tue, May 16, 2023 at 05:41:55PM -0500, Tom Lendacky wrote:
>>>> On 5/13/23 17:04, Kirill A. Shutemov wrote:
>>>>> UEFI Specification version 2.9 introduces the concept of memory
>>>>> acceptance: some Virtual Machine platforms, such as Intel TDX or AMD
>>>>> SEV-SNP, requiring memory to be accepted before it can be used by the
>>>>> guest. Accepting happens via a protocol specific for the Virtual
>>>>> Machine platform.
>>>>>
>>>>> Accepting memory is costly and it makes VMM allocate memory for the
>>>>> accepted guest physical address range. It's better to postpone memory
>>>>> acceptance until memory is needed. It lowers boot time and reduces
>>>>> memory overhead.
>>>>>
>>>>> The kernel needs to know what memory has been accepted. Firmware
>>>>> communicates this information via memory map: a new memory type --
>>>>> EFI_UNACCEPTED_MEMORY -- indicates such memory.
>>>>>
>>>>> Range-based tracking works fine for firmware, but it gets bulky for
>>>>> the kernel: e820 has to be modified on every page acceptance. It leads
>>>>> to table fragmentation, but there's a limited number of entries in the
>>>>> e820 table
>>>>>
>>>>> Another option is to mark such memory as usable in e820 and track if the
>>>>> range has been accepted in a bitmap. One bit in the bitmap represents
>>>>> 2MiB in the address space: one 4k page is enough to track 64GiB or
>>>>> physical address space.
>>>>>
>>>>> In the worst-case scenario -- a huge hole in the middle of the
>>>>> address space -- It needs 256MiB to handle 4PiB of the address
>>>>> space.
>>>>>
>>>>> Any unaccepted memory that is not aligned to 2M gets accepted upfront.
>>>>>
>>>>> The approach lowers boot time substantially. Boot to shell is ~2.5x
>>>>> faster for 4G TDX VM and ~4x faster for 64G.
>>>>>
>>>>> TDX-specific code isolated from the core of unaccepted memory support. It
>>>>> supposed to help to plug-in different implementation of unaccepted memory
>>>>> such as SEV-SNP.
>>>>>
>>>>> -- Fragmentation study --
>>>>>
>>>>> Vlastimil and Mel were concern about effect of unaccepted memory on
>>>>> fragmentation prevention measures in page allocator. I tried to evaluate
>>>>> it, but it is tricky. As suggested I tried to run multiple parallel kernel
>>>>> builds and follow how often kmem:mm_page_alloc_extfrag gets hit.
>>>>>
>>>>> See results in the v9 of the patchset[1][2]
>>>>>
>>>>> [1] https://lore.kernel.org/all/20230330114956.20342-1-kirill.shutemov@linux.intel.com
>>>>> [2] https://lore.kernel.org/all/20230416191940.ex7ao43pmrjhru2p@box.shutemov.name
>>>>>
>>>>> --
>>>>>
>>>>> The tree can be found here:
>>>>>
>>>>> https://github.com/intel/tdx.git guest-unaccepted-memory
>>>>
>>>> I get some failures when building without TDX support selected in my
>>>> kernel config after adding unaccepted memory support for SNP:
>>>>
>>>>     In file included from arch/x86/boot/compressed/../../coco/tdx/tdx-shared.c:1,
>>>>                      from arch/x86/boot/compressed/tdx-shared.c:2:
>>>>     ./arch/x86/include/asm/tdx.h: In function ?tdx_kvm_hypercall?:
>>>>     ./arch/x86/include/asm/tdx.h:72:17: error: ?ENODEV? undeclared (first use in this function)
>>>>        72 |         return -ENODEV;
>>>>           |                 ^~~~~~
>>>>     ./arch/x86/include/asm/tdx.h:72:17: note: each undeclared identifier is reported only once for each function it appears in
>>>>
>>>> Adding an include for linux/errno.h gets past that error, but then
>>>> I get the following:
>>>>
>>>>     ld: arch/x86/boot/compressed/tdx-shared.o: in function `tdx_enc_status_changed_phys':
>>>>     tdx-shared.c:(.text+0x42): undefined reference to `__tdx_hypercall'
>>>>     ld: tdx-shared.c:(.text+0x7f): undefined reference to `__tdx_module_call'
>>>>     ld: tdx-shared.c:(.text+0xce): undefined reference to `__tdx_module_call'
>>>>     ld: tdx-shared.c:(.text+0x13b): undefined reference to `__tdx_module_call'
>>>>     ld: tdx-shared.c:(.text+0x153): undefined reference to `cc_mkdec'
>>>>     ld: tdx-shared.c:(.text+0x15d): undefined reference to `cc_mkdec'
>>>>     ld: tdx-shared.c:(.text+0x18e): undefined reference to `__tdx_hypercall'
>>>>     ld: arch/x86/boot/compressed/vmlinux: hidden symbol `__tdx_hypercall' isn't defined
>>>>     ld: final link failed: bad value
>>>>
>>>> So it looks like arch/x86/boot/compressed/tdx-shared.c is being
>>>> built, while arch/x86/boot/compressed/tdx.c isn't.
>>>
>>> Right. I think this should help:
>>>
>>> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
>>> index 78f67e0a2666..b13a58021086 100644
>>> --- a/arch/x86/boot/compressed/Makefile
>>> +++ b/arch/x86/boot/compressed/Makefile
>>> @@ -106,8 +106,8 @@ ifdef CONFIG_X86_64
>>>    endif
>>>
>>>    vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
>>> -vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o
>>> -vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o $(obj)/tdx-shared.o
>>> +vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o $(obj)/tdx-shared.o
>>> +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/mem.o
>>>
>>>    vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o
>>>    vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o
>>>
>>>> After setting TDX in the kernel config, I can build successfully, but
>>>> I'm running into an error when trying to accept memory during
>>>> decompression.
>>>>
>>>> In drivers/firmware/efi/libstub/unaccepted_memory.c, I can see that the
>>>> unaccepted_table is allocated, but when accept_memory() is invoked the
>>>> table address is now zero. I thought maybe it had to do with bss, but even
>>>> putting it in the .data section didn't help. I'll keep digging, but if you
>>>> have any ideas, that would be great.
>>>
>>> Not right away. But maybe seeing your side of enabling would help.
>>
>> Let me get something pushed up where you can access it and I'll also send
>> you my kernel config.
>>
>> In the mean time I added the following and everything worked. But I'm not
>> sure how acceptable it is to always be checking for the table when the
>> value is zero is.
>>
>>
>> diff --git a/drivers/firmware/efi/libstub/unaccepted_memory.c b/drivers/firmware/efi/libstub/unaccepted_memory.c
>> index f4642c4f25dd..8c5632ab1208 100644
>> --- a/drivers/firmware/efi/libstub/unaccepted_memory.c
>> +++ b/drivers/firmware/efi/libstub/unaccepted_memory.c
>> @@ -183,8 +183,13 @@ void accept_memory(phys_addr_t start, phys_addr_t end)
>>   	unsigned long bitmap_size;
>>   	u64 unit_size;
>> -	if (!unaccepted_table)
>> -		return;
>> +	if (!unaccepted_table) {
>> +		efi_guid_t unaccepted_table_guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
>> +
>> +		unaccepted_table = get_efi_config_table(unaccepted_table_guid);
>> +		if (!unaccepted_table)
>> +			return;
>> +	}
>>   	unit_size = unaccepted_table->unit_size;
>>
> 
> Kudos to Ard: if efi_relocate_kernel() triggered, it copies the kernel
> image to the new place before the variable gets initialized, so it has to
> be initialized explicitly by decompressor.
> 
> It also covers the cases when bootloader doesn't use EFI stub, including
> kexec cases.
> 
> I think this fixup should work.

Yes, this fixup takes care of the problem I was seeing.

Thanks!
Tom

> 
> diff --git a/arch/x86/boot/compressed/efi.h b/arch/x86/boot/compressed/efi.h
> index cf475243b6d5..866c0af8b5b9 100644
> --- a/arch/x86/boot/compressed/efi.h
> +++ b/arch/x86/boot/compressed/efi.h
> @@ -16,6 +16,7 @@ typedef guid_t efi_guid_t __aligned(__alignof__(u32));
>   #define ACPI_TABLE_GUID				EFI_GUID(0xeb9d2d30, 0x2d88, 0x11d3,  0x9a, 0x16, 0x00, 0x90, 0x27, 0x3f, 0xc1, 0x4d)
>   #define ACPI_20_TABLE_GUID			EFI_GUID(0x8868e871, 0xe4f1, 0x11d3,  0xbc, 0x22, 0x00, 0x80, 0xc7, 0x3c, 0x88, 0x81)
>   #define EFI_CC_BLOB_GUID			EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42)
> +#define LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID	EFI_GUID(0xd5d1de3c, 0x105c, 0x44f9,  0x9e, 0xa9, 0xbc, 0xef, 0x98, 0x12, 0x00, 0x31)
>   
>   #define EFI32_LOADER_SIGNATURE	"EL32"
>   #define EFI64_LOADER_SIGNATURE	"EL64"
> @@ -105,6 +106,14 @@ struct efi_setup_data {
>   	u64 reserved[8];
>   };
>   
> +struct efi_unaccepted_memory {
> +	u32 version;
> +	u32 unit_size;
> +	u64 phys_base;
> +	u64 size;
> +	unsigned long bitmap[];
> +};
> +
>   static inline int efi_guidcmp (efi_guid_t left, efi_guid_t right)
>   {
>   	return memcmp(&left, &right, sizeof (efi_guid_t));
> diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
> index a4308d077885..0108c97399a5 100644
> --- a/arch/x86/boot/compressed/mem.c
> +++ b/arch/x86/boot/compressed/mem.c
> @@ -1,8 +1,7 @@
>   // SPDX-License-Identifier: GPL-2.0-only
>   
> -#include "../cpuflags.h"
> -#include "../string.h"
>   #include "error.h"
> +#include "misc.h"
>   #include "tdx.h"
>   #include <asm/shared/tdx.h>
>   
> @@ -40,3 +39,25 @@ void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>   	else
>   		error("Cannot accept memory: unknown platform\n");
>   }
> +
> +void init_unaccepted_memory(void)
> +{
> +	guid_t guid =  LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
> +	struct efi_unaccepted_memory *unaccepted_table;
> +	unsigned long cfg_table_pa;
> +	unsigned int cfg_table_len;
> +	int ret;
> +
> +	ret = efi_get_conf_table(boot_params, &cfg_table_pa, &cfg_table_len);
> +	if (ret)
> +		error("EFI config table not found.");
> +
> +	unaccepted_table = (void *)efi_find_vendor_table(boot_params,
> +							 cfg_table_pa,
> +							 cfg_table_len,
> +							 guid);
> +	if (unaccepted_table->version != 1)
> +		error("Unknown version of unaccepted memory table\n");
> +
> +	set_unaccepted_table(unaccepted_table);
> +}
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index eb8df0d4ad51..36535a3753f5 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -458,6 +458,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
>   
>   	if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) {
>   		debug_putstr("Accepting memory... ");
> +		init_unaccepted_memory();
>   		accept_memory(__pa(output), __pa(output) + needed_size);
>   	}
>   
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 9663d1839f54..e1a0b49e0ed2 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -247,10 +247,10 @@ static inline unsigned long efi_find_vendor_table(struct boot_params *bp,
>   }
>   #endif /* CONFIG_EFI */
>   
> -#ifdef CONFIG_UNACCEPTED_MEMORY
> +void init_unaccepted_memory(void);
> +
> +/* Implemented in EFI stub */
> +void set_unaccepted_table(struct efi_unaccepted_memory *table);
>   void accept_memory(phys_addr_t start, phys_addr_t end);
> -#else
> -static inline void accept_memory(phys_addr_t start, phys_addr_t end) {}
> -#endif
>   
>   #endif /* BOOT_COMPRESSED_MISC_H */
> diff --git a/drivers/firmware/efi/libstub/unaccepted_memory.c b/drivers/firmware/efi/libstub/unaccepted_memory.c
> index f4642c4f25dd..fd6a3195c68f 100644
> --- a/drivers/firmware/efi/libstub/unaccepted_memory.c
> +++ b/drivers/firmware/efi/libstub/unaccepted_memory.c
> @@ -6,6 +6,18 @@
>   
>   static struct efi_unaccepted_memory *unaccepted_table;
>   
> +/*
> + * Decompressor needs to initialize the variable to cover cases when the table
> + * is not allocated by EFI stub or EFI stub copied the kernel image with
> + * efi_relocate_kernel() before the variable is set.
> + *
> + * It must be call before the first usage of accept_memory() by decompressor.
> + */
> +void set_unaccepted_table(struct efi_unaccepted_memory *table)
> +{
> +	unaccepted_table = table;
> +}
> +
>   efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc,
>   					struct efi_boot_memmap *map)
>   {

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ