linux-kernel - Re: [PATCH v4] kho: validate preserved memory map during population

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2vxzo6mt7cl3.fsf@kernel.org>
Date: Fri, 16 Jan 2026 16:21:28 +0000
From: Pratyush Yadav <pratyush@...nel.org>
To: Breno Leitao <leitao@...ian.org>
Cc: Pasha Tatashin <pasha.tatashin@...een.com>,  akpm@...ux-foundation.org,
  rppt@...nel.org,  graf@...zon.com,  linux-kernel@...r.kernel.org,
  kexec@...ts.infradead.org,  linux-mm@...ck.org,  pratyush@...nel.org,
  ricardo.neri-calderon@...ux.intel.com,  kernel-team@...a.com
Subject: Re: [PATCH v4] kho: validate preserved memory map during population

Hi Breno,

On Thu, Jan 08 2026, Breno Leitao wrote:

> Hello Pasha,
>
> On Tue, Dec 23, 2025 at 09:01:40AM -0500, Pasha Tatashin wrote:
>> If the previous kernel enabled KHO but did not call kho_finalize()
>> (e.g., CONFIG_LIVEUPDATE=n or userspace skipped the finalization step),
>> the 'preserved-memory-map' property in the FDT remains empty/zero.
>> 
>> Previously, kho_populate() would succeed regardless of the memory map's
>> state, reserving the incoming scratch regions in memblock. However,
>> kho_memory_init() would later fail to deserialize the empty map. By that
>> time, the scratch regions were already registered, leading to partial
>> initialization and subsequent list corruption (freeing scratch area
>> twice) during kho_init().
>
> While trying my new patchset [0] on top of this patch, I got the
> following issue:
>
> 	[    0.000000] KHO: disabling KHO revival: -2
>
> Trying to solve it, I come up with a change in kho_get_mem_map_phys() to
> distinguish no memory and error, see the patch attached later.
>
> This is what I used to test [0] on top of linux-next. Is this useful?
>
> Link: https://lore.kernel.org/all/20260108-kho-v3-1-b1d6b7a89342@debian.org/ [0]
>
> thanks
> --breno
>
> commit 5d7855fede8110d74942e1b67056ba589a1cb54a
> Author: Breno Leitao <leitao@...ian.org>
> Date:   Thu Jan 8 07:44:08 2026 -0800
>
>     kho: allow KHO to work when no memory is preserved
>     
>     Fix KHO initialization failing when no memory pages were preserved by
>     the previous kernel.
>     
>     Commit eda79a683a0a ("kho: validate preserved memory map during
>     population") introduced kho_get_mem_map_phys() which returns the physical
>     address of the preserved memory map directly as its return value. The
>     caller then validates it with:
>     
>         mem_map_phys = kho_get_mem_map_phys(fdt);
>         if (!mem_map_phys) {
>             err = -ENOENT;
>             goto out;
>         }
>     
>     This creates an ambiguity: physical address 0 is used both as an error
>     indicator (property missing/malformed) and as a valid value (property
>     exists with value 0, meaning no memory was preserved).
>     
>     "No memory preserved" is a legitimate state. KHO provides features beyond
>     memory page preservation, such as previous kernel version tracking and
>     kexec count tracking. When the previous kernel enables KHO but doesn't
>     preserve any memory pages, it sets 'preserved-memory-map' to 0. This is
>     semantically different from "KHO not initialized" - it means "KHO is
>     active, there's just nothing in the memory map."

This isn't true. If you hand over _any_ state, you will at least need
the KHO FDT. And the KHO FDT is preserved memory (see the
kho_alloc_preserve() call in kho_init()). So I don't see how you can
ever have valid KHO with no memory.

mem_map_phys _can_ be 0, but only when KHO was enabled but not used. And
that is of course also a valid use case.

We want to treat mem_map_phys == 0 the same as the error path, just
without the error print. This lets us discard all previous scratch areas
since they don't have anything useful anyway, and have a fresh start.

So while you are seeing this error message, I don't think it should
break anything and KHO should still be working fine. You can
double-check this by inspecting /sys/kernel/debug/kho/out.

So I think the patch is certainly a useful fix, it just needs some
re-wording and fixups.

Some comments on the code below.

>     
>     Before eda79a683a0a, the code handled this gracefully in
>     kho_mem_deserialize():
>     
>         chunk = mem ? phys_to_virt(mem) : NULL;
>         if (!chunk)
>             return false;  // No pages, but KHO could still work
>     
>     After eda79a683a0a, the early validation conflated "no property" with
>     "property value is 0", causing KHO to be completely disabled in both
>     cases.
>     
>     Fix this by changing kho_get_mem_map_phys() to return an error code and
>     pass the physical address via pointer. This allows distinguishing between:
>      - Property missing/malformed: return -ENOENT (KHO fails)
>      - Property exists with value 0: return 0 (KHO succeeds, no memory to
>        restore)
>     
>     Fixes: eda79a683a0a ("kho: validate preserved memory map during population")
>     Signed-off-by: Breno Leitao <leitao@...ian.org>
>
> diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
> index 271d90198a08..3cf2dc6840c9 100644
> --- a/kernel/liveupdate/kexec_handover.c
> +++ b/kernel/liveupdate/kexec_handover.c
> @@ -471,8 +471,8 @@ static void __init deserialize_bitmap(unsigned int order,
>  	}
>  }
>  
> -/* Returns physical address of the preserved memory map from FDT */
> -static phys_addr_t __init kho_get_mem_map_phys(const void *fdt)
> +/* Returns 0 on success and stores physical address in *phys_out */
> +static int __init kho_get_mem_map_phys(const void *fdt, phys_addr_t *phys_out)
>  {
>  	const void *mem_ptr;
>  	int len;
> @@ -480,10 +480,11 @@ static phys_addr_t __init kho_get_mem_map_phys(const void *fdt)
>  	mem_ptr = fdt_getprop(fdt, 0, KHO_FDT_MEMORY_MAP_PROP_NAME, &len);
>  	if (!mem_ptr || len != sizeof(u64)) {
>  		pr_err("failed to get preserved memory bitmaps\n");
> -		return 0;
> +		return -ENOENT;
>  	}
>  
> -	return get_unaligned((const u64 *)mem_ptr);
> +	*phys_out = get_unaligned((const u64 *)mem_ptr);
> +	return 0;
>  }
>  
>  static void __init kho_mem_deserialize(struct khoser_mem_chunk *chunk)
> @@ -1439,7 +1440,7 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
>  			 phys_addr_t scratch_phys, u64 scratch_len)
>  {
>  	struct kho_scratch *scratch = NULL;
> -	phys_addr_t mem_map_phys;
> +	phys_addr_t mem_map_phys = 0;
>  	void *fdt = NULL;
>  	int err = 0;
>  	unsigned int scratch_cnt = scratch_len / sizeof(*kho_scratch);
> @@ -1466,11 +1467,9 @@ void __init kho_populate(phys_addr_t fdt_phys, u64 fdt_len,
>  		goto out;
>  	}
>  
> -	mem_map_phys = kho_get_mem_map_phys(fdt);
> -	if (!mem_map_phys) {
> -		err = -ENOENT;
> +	err = kho_get_mem_map_phys(fdt, &mem_map_phys);
> +	if (err)

This will break when mem_map_phys == 0. As I explained earlier, when
that happens we want to discard all previous scratch info and start with
a clean slate.

Making this if (err || !mem_map_phys) should do the trick. The if (err)
check before the print should make sure the error message is not printed
when we have a valid property but its value is 0.

>  		goto out;
> -	}
>  
>  	scratch = early_memremap(scratch_phys, scratch_len);
>  	if (!scratch) {

-- 
Regards,
Pratyush Yadav