Message-ID: <20251015053121.3978358-3-pasha.tatashin@soleen.com>
Date: Wed, 15 Oct 2025 01:31:21 -0400
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: akpm@...ux-foundation.org,
brauner@...nel.org,
corbet@....net,
graf@...zon.com,
jgg@...pe.ca,
linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org,
linux-mm@...ck.org,
masahiroy@...nel.org,
ojeda@...nel.org,
pasha.tatashin@...een.com,
pratyush@...nel.org,
rdunlap@...radead.org,
rppt@...nel.org,
tj@...nel.org,
jasonmiu@...gle.com,
dmatlack@...gle.com,
skhawaja@...gle.com
Subject: [PATCH 2/2] liveupdate: kho: allocate metadata directly from the buddy allocator
KHO allocates metadata for its preserved memory map using the SLUB
allocator via kzalloc(). This metadata is temporary and is used by the
next kernel during early boot to find preserved memory.
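The tracker being allocated here is, roughly, a two-level radix structure
of per-order bitmaps (shown in part in the diff below). A simplified sketch
of its shape, not verbatim kernel code:

    /* One bit per folio of a given order. */
    struct kho_mem_phys_bits {
            DECLARE_BITMAP(preserve, PRESERVE_BITS);
    };

    struct kho_mem_phys {
            /* xarray of struct kho_mem_phys_bits, indexed by chunks
             * of physical address space */
            struct xarray phys_bits;
    };

    struct kho_mem_track {
            /* xarray of struct kho_mem_phys, indexed by folio order */
            struct xarray orders;
    };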
A problem arises when KFENCE is enabled. kzalloc() calls can be
randomly intercepted by kfence_alloc(), which services the allocation
from a dedicated KFENCE memory pool. This pool is allocated early in
boot via memblock.
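The interception happens in the slab hot path, roughly like this (a
simplified sketch of the SLUB/KFENCE hook; regular_slub_fastpath() is a
hypothetical stand-in for the rest of the allocation path):

    static __always_inline void *slab_alloc(struct kmem_cache *s,
                                            gfp_t gfpflags, size_t size)
    {
            /* KFENCE randomly samples slab allocations. */
            void *obj = kfence_alloc(s, size, gfpflags);

            /* A sampled allocation is served from the KFENCE pool. */
            if (unlikely(obj))
                    return obj;

            return regular_slub_fastpath(s, gfpflags, size); /* hypothetical */
    }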
When booting via KHO, the memblock allocator is restricted to a "scratch
area", forcing the KFENCE pool to be allocated within it. This creates a
conflict, as the scratch area is expected to be ephemeral and
overwritable by a subsequent kexec. If KHO metadata is placed in this
KFENCE pool, it leads to memory corruption when the next kernel is
loaded.
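kho_scratch_overlap(), used in the diff below, catches exactly this case.
A hedged sketch of what such a check does, assuming kho_scratch[] and
kho_scratch_cnt describe the scratch ranges (the real implementation in
kexec_handover.c may differ):

    static bool kho_scratch_overlap(phys_addr_t phys, size_t size)
    {
            phys_addr_t end = phys + size;
            unsigned int i;

            for (i = 0; i < kho_scratch_cnt; i++) {
                    if (phys < kho_scratch[i].addr + kho_scratch[i].size &&
                        end > kho_scratch[i].addr)
                            return true;    /* allocation sits in scratch */
            }
            return false;
    }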
To fix this, modify KHO to allocate its metadata directly from the buddy
allocator instead of SLUB.
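This sidesteps KFENCE entirely: KFENCE hooks only the slab entry points,
so a page obtained with __get_free_pages() can never be serviced from the
KFENCE pool. The shape of the change (see the diff below):

    order = get_order(sz);
    elm = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);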
As part of this change, the metadata bitmap size is increased from 512
bytes to PAGE_SIZE to align with the page-based allocations from the
buddy system.
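The page-sized bitmap costs nothing extra, since get_order() rounds
sub-page sizes up to a whole page anyway; an illustrative check, assuming
4K pages (hypothetical, not part of the patch):

    BUILD_BUG_ON(get_order(512) != 0);       /* 512 B -> one order-0 page */
    BUILD_BUG_ON(get_order(PAGE_SIZE) != 0); /* 4 KiB -> one order-0 page */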
Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@...een.com>
---
kernel/liveupdate/kexec_handover.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index ef1e6f7a234b..519de6d68b27 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -66,10 +66,10 @@ early_param("kho", kho_parse_enable);
* Keep track of memory that is to be preserved across KHO.
*
* The serializing side uses two levels of xarrays to manage chunks of per-order
- * 512 byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order of a
- * 1TB system would fit inside a single 512 byte bitmap. For order 0 allocations
- * each bitmap will cover 16M of address space. Thus, for 16G of memory at most
- * 512K of bitmap memory will be needed for order 0.
+ * PAGE_SIZE byte bitmaps. For instance if PAGE_SIZE = 4096, the entire 1G order
+ * of an 8TB system would fit inside a single 4096 byte bitmap. For order 0
+ * allocations each bitmap will cover 128M of address space. Thus, for 16G of
+ * memory at most 512K of bitmap memory will be needed for order 0.
*
* This approach is fully incremental: as the serialization progresses, folios
* can continue to be aggregated to the tracker. The final step, immediately prior
@@ -77,7 +77,7 @@ early_param("kho", kho_parse_enable);
* successor kernel to parse.
*/
-#define PRESERVE_BITS (512 * 8)
+#define PRESERVE_BITS (PAGE_SIZE * 8)
struct kho_mem_phys_bits {
DECLARE_BITMAP(preserve, PRESERVE_BITS);
@@ -131,18 +131,21 @@ static struct kho_out kho_out = {
static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
{
+ unsigned int order;
void *elm, *res;
elm = xa_load(xa, index);
if (elm)
return elm;
- elm = kzalloc(sz, GFP_KERNEL);
+ order = get_order(sz);
+ elm = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
if (!elm)
return ERR_PTR(-ENOMEM);
- if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), sz))) {
- kfree(elm);
+ if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm),
+ PAGE_SIZE << order))) {
+ free_pages((unsigned long)elm, order);
return ERR_PTR(-EINVAL);
}
@@ -151,7 +154,7 @@ static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
res = ERR_PTR(xa_err(res));
if (res) {
- kfree(elm);
+ free_pages((unsigned long)elm, order);
return res;
}
@@ -357,7 +360,7 @@ static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk,
{
struct khoser_mem_chunk *chunk;
- chunk = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ chunk = (void *)get_zeroed_page(GFP_KERNEL);
if (!chunk)
return ERR_PTR(-ENOMEM);
--
2.51.0.788.g6d19910ace-goog