lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20241213095449.881-1-yan.y.zhao@intel.com>
Date: Fri, 13 Dec 2024 17:54:49 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: ebiederm@...ssion.com
Cc: kexec@...ts.infradead.org,
	linux-kernel@...r.kernel.org,
	linux-coco@...ts.linux.dev,
	x86@...nel.org,
	rick.p.edgecombe@...el.com,
	kirill.shutemov@...ux.intel.com,
	bhe@...hat.com,
	Yan Zhao <yan.y.zhao@...el.com>
Subject: [PATCH v2 1/1] kexec_core: Accept unaccepted kexec segments' destination addresses

In TDX, to run a linux guest, TDs (hardware-isolated VMs) must accept
before accessing private memory. Accessing private memory before acceptance
is considered a fatal error and may result in the termination of the TD.

The "accepting memory" operation in guest includes the following steps:
- trigger a VM-exit
- the host OS allocates a physical page and requests hardware to map the
  physical page to the GPA.
- initialize memory content to 0.
- encrypt the memory

For a Linux guest, eagerly accepting all memory during kernel boot can slow
down the boot process and cause unnecessary memory occupation on the host
for pages that may never be accessed. Therefore, Linux guests usually opt
for a lazy mode to delay page acceptance operations by not moving the pages
to the buddy allocator's freelists. Instead, the kernel tracks memory
in 4M units and places them in a zone->unaccepted_pages list if any page in
the entire 4M range is in an unaccepted state (even if part of the memory
range may have been accepted by firmware or the kernel). When the kernel
does not have enough free pages, it will move memory from the
zone->unaccepted_pages list and accept it, ensuring that the memory is
accepted before moving it to the freelists and being available to the buddy
allocator.

The kexec segments' destination addresses are not allocated by the buddy
allocator. Instead, they are searched from normal system RAM (top-down or
bottom-up) and exclude driver-managed memory, ACPI, persistent, and
reserved memory... Although these addresses may fall within the memory
range managed by the buddy allocator (which must be in an accepted state),
they could also be outside that range and in an unaccepted state.

Since the kexec code will access the segments' destination addresses during
the kexec process by swapping their content with the segments' source
pages, it is necessary to accept the memory before performing the swap
operations.

Accept the destination addresses during the kexec load, immediately after
they pass sanity checks. This ensures the code is located in a common place
shared by both the kexec_load and kexec_file_load system calls.

This will not conflict with the accounting in try_to_accept_memory_one()
since the accounting is set during kernel boot and decremented when pages
are moved to the freelists. There is no harm in invoking accept_memory() on
a page before making it available to the buddy allocator.

No need to worry about re-accepting memory since accept_memory() checks the
unaccepted bitmap before accepting a memory page.

Although a user may perform kexec loading without ever triggering the jump,
it doesn't impact much since kexec loading is not in a performance-critical
path. Additionally, the destination addresses are always searched and found
in the same location on a given system.

Changes to the destination address searching logic to locate only memory in
either unaccepted or accepted status are unnecessary and complicated.

Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
Cc: Baoquan He <bhe@...hat.com>
---
 kernel/kexec_core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c0caa14880c3..f8eee0516bd9 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -210,6 +210,16 @@ int sanity_check_segment_list(struct kimage *image)
 	}
 #endif
 
+	/*
+	 * The destination addresses are searched from system RAM rather than
+	 * being allocated from the buddy allocator, so they are not guaranteed
+	 * to be accepted by the current kernel.  Accept the destination
+	 * addresses before kexec swaps their content with the segments' source
+	 * pages to avoid accessing memory before it is accepted.
+	 */
+	for (i = 0; i < nr_segments; i++)
+		accept_memory(image->segment[i].mem, image->segment[i].memsz);
+
 	return 0;
 }
 
-- 
2.43.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ