lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250814005321.31705-1-epetron@amazon.de>
Date: Thu, 14 Aug 2025 00:53:15 +0000
From: Evangelos Petrongonas <epetron@...zon.de>
To: <rppt@...nel.org>
CC: <ardb@...nel.org>, <changyuanl@...gle.com>, <epetron@...zon.de>,
	<graf@...zon.com>, <kexec@...ts.infradead.org>, <linux-efi@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <nh-open-source@...zon.com>
Subject: Re: [PATCH] efi: Support booting with kexec handover (KHO)

Hey Mike, thanks for your review,

On Mon, 11 Aug 2025 09:39:50 +0300, Mike Rapoport <rppt@...nel.org> wrote:
> On Fri, Aug 08, 2025 at 04:36:51PM +0000, Evangelos Petrongonas wrote:
> > When KHO (Kexec HandOver) is enabled, it sets up scratch memory regions
> > early during device tree scanning. After kexec, the new kernel
> > exclusively uses this region for memory allocations during boot up to
> > the initialization of the page allocator
> >
> > However, when booting with EFI, EFI's reserve_regions() uses
> > memblock_remove(0, PHYS_ADDR_MAX) to clear all memory regions before
> > rebuilding them from EFI data. This destroys KHO scratch regions and
> > their flags, thus causing a kernel panic, as there are no scratch
> > memory regions.
> >
> > Instead of wholesale removal, iterate through memory regions and only
> > remove non-KHO ones. This preserves KHO scratch regions while still
> > allowing EFI to rebuild its memory map.
>
> It's worth mentioning that scratch areas are "good known memory" :)
>

I Will do so on Rev2.

> > Signed-off-by: Evangelos Petrongonas <epetron@...zon.de>
> > ---
> >
> > Reproduction/Verification Steps
> > The issue and the fix can be reproduced/verified by booting a VM with
> > EFI and attempting to perform a KHO enabled kexec. The fix
> > was developed/tested on arm64.
> >
> >  drivers/firmware/efi/efi-init.c | 31 +++++++++++++++++++++++++++----
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi-init.c b/drivers/firmware/efi/efi-init.c
> > index a00e07b853f22..2f08b1ab764f6 100644
> > --- a/drivers/firmware/efi/efi-init.c
> > +++ b/drivers/firmware/efi/efi-init.c
> > @@ -164,12 +164,35 @@ static __init void reserve_regions(void)
> >  		pr_info("Processing EFI memory map:\n");
> >
> >  	/*
> > -	 * Discard memblocks discovered so far: if there are any at this
> > -	 * point, they originate from memory nodes in the DT, and UEFI
> > -	 * uses its own memory map instead.
> > +	 * Discard memblocks discovered so far except for KHO scratch regions.
> > +	 * Most memblocks at this point originate from memory nodes in the DT,
> > +	 * and UEFI uses its own memory map instead. However, if KHO is enabled,
> > +	 * scratch regions must be preserved.
> >  	 */
> >  	memblock_dump_all();
> > -	memblock_remove(0, PHYS_ADDR_MAX);
> > +
> > +	if (IS_ENABLED(CONFIG_MEMBLOCK_KHO_SCRATCH)) {
>
> It's better to condition this on kho_get_fdt() that means that we are
> actually doing a handover.
>

Hmm, I see that `kho_get_fdt()` is static. My first instinct was to use
kho_enable() instead. Diving a bit more into the initialisation flow,
during the `setup_arch()`->`efi_init()`, `kho_enable()` will return
true if kho is enabled in the cmdline, but not if we are actually doing
a KHO enabled kexec. However, in this case, the parsing of memory
regions is going to be a noop in terms of functionality, but will
contribute, negatively —though the overhead would likely be
unmeasurable to the (cold) boot time. If we  want to avoid that, we
might consider adding another function to the KHO API, like
`is_booting_with_kho()`, that practically wraps the `kho_get_fdt()`.
IMO, it feels a bit cleaner this way, as other components  don't
necessarily (need to) know the internal FDT based implementation of
KHO. That being said, I am definitely not the most experienced person
when it comes to API design, so there is a high chance that I am way
off :)

So to sum it up, I see three paths forward:
1. Condition with `kho_is_enabled()` instead of the CONFIG (accepting
   the minor cold boot overhead)
2. Post another patch that extends the KHO API, adding a wrapper for
   the `kho_get_fdt()`, like `is_booting_with_kho()` indicating that we
   are booting with KHO enabled
3. Post another patch that exports the `kho_get_fdt()` directly.

I am happy to implement any of the three, or any other suggestion you
might have.

> > +		struct memblock_region *reg;
> > +		phys_addr_t start, size;
> > +		int i;
> > +
> > +		/* Remove all non-KHO regions */
> > +		for (i = memblock.memory.cnt - 1; i >= 0; i--) {
>
> Please use for_each_mem_region()
>

Todo in Rev2.

> > +			reg = &memblock.memory.regions[i];
> > +			if (!memblock_is_kho_scratch(reg)) {
> > +				start = reg->base;
> > +				size = reg->size;
> > +				memblock_remove(start, size);
> > +			}
> > +		}
> > +	} else {
> > +	/*
> > +	 * KHO is disabled. Discard memblocks discovered so far: if there
> > +	 * are any at this point, they originate from memory nodes in the
> > +	 * DT, and UEFI uses its own memory map instead.
> > +	 */
> > +		memblock_remove(0, PHYS_ADDR_MAX);
> > +	}
> >
> >  	for_each_efi_memory_desc(md) {
> >  		paddr = md->phys_addr;
> > --
> > 2.43.0
>
> --
> Sincerely yours,
> Mike.
>
>

--
Kind Regards,
Evangelos.



Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ