lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250421185210.3372306-13-mingo@kernel.org>
Date: Mon, 21 Apr 2025 20:51:52 +0200
From: Ingo Molnar <mingo@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: Andy Shevchenko <andy@...nel.org>,
	Arnd Bergmann <arnd@...nel.org>,
	Borislav Petkov <bp@...en8.de>,
	Juergen Gross <jgross@...e.com>,
	"H . Peter Anvin" <hpa@...or.com>,
	Kees Cook <keescook@...omium.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Rapoport <rppt@...nel.org>,
	Paul Menzel <pmenzel@...gen.mpg.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	David Woodhouse <dwmw@...zon.co.uk>
Subject: [PATCH 12/29] x86/boot/e820: Clean up confusing and self-contradictory verbiage around E820 related resource allocations

So the E820 code has a rather confusing area of code at around
e820__reserve_resources(), which is, by its plain reading,
rather self-contradictory. For example, the comment explaining
e820__reserve_resources() claims:

 - '* Mark E820 reserved areas as busy for the resource manager'

By 'E820 reserved areas' one can naively conclude that it's
talking about E820_TYPE_RESERVED areas - while those areas
are treated in exactly the opposite fashion by do_mark_busy():

        switch (type) {
        case E820_TYPE_RESERVED:
        case E820_TYPE_SOFT_RESERVED:
        case E820_TYPE_PRAM:
        case E820_TYPE_PMEM:
                return false;

Ie. E820_TYPE_RESERVED areas are *not* marked busy for the
resource manager, because E820_TYPE_RESERVED areas are
device regions that might eventually be claimed by a device driver.

This type of confusion permeates this whole area of code,
making it exceedingly difficult to read (for me at least).

So untangle it bit by bit:

 - Instead of talking about ambiguous 'reserved areas',
   talk about 'E820 device address regions' instead,
   and 'register'/'lock' them.

 - The do_mark_busy() function is a misnomer as well, because despite
   its name it 'does' nothing - it only determines what type
   of resource handling an E820 type should receive from the
   kernel. Rename it to e820_device_region() and negate its
   meaning, to avoid the 'busy/reserved' confusion. Because
   that's what this code is really about: filtering out
   device regions such as E820_TYPE_RESERVED, E820_TYPE_PRAM,
   E820_TYPE_PMEM, etc., and allowing them to be claimed
   by device drivers later on.

 - All other E820 regions (system regions) are registered and
   locked early on, before the PCI resource manager does its
   search for device BAR addresses, etc.

Also fix this somewhat misleading comment:

	/*
	 * Try to bump up RAM regions to reasonable boundaries, to
	 * avoid stolen RAM:
	 */

and explain that here we register artificial 'gap' resources
at the end of suspiciously sized RAM regions, as heuristics
to try to avoid buggy firmware with undeclared 'stolen RAM' regions:

	/*
	 * Create additional 'gaps' at the end of RAM regions,
	 * rounding them up to 64k/1MB/64MB boundaries, should
	 * they be weirdly sized, and register extra, locked
	 * resource regions for them, to make sure drivers
	 * won't claim those addresses.
	 *
	 * These are basically blind guesses and heuristics to
	 * avoid resource conflicts with broken firmware that
	 * doesn't properly list 'stolen RAM' as a system region
	 * in the E820 map.
	 */

Also improve the printout of this extra resource a bit: make the
message more unambiguous, and upgrade it from pr_debug() (where
very few people will see it), to pr_info() (where it will make
it into the syslog on default distro configs).

Also fix spelling and improve comment placement.

No change in functionality intended.

Signed-off-by: Ingo Molnar <mingo@...nel.org>
Cc: Andy Shevchenko <andy@...nel.org>
Cc: Arnd Bergmann <arnd@...nel.org>
Cc: David Woodhouse <dwmw@...zon.co.uk>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Kees Cook <keescook@...omium.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mike Rapoport (Microsoft) <rppt@...nel.org>
---
 arch/x86/kernel/e820.c | 55 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 8fcf4edd832e..484419711ecf 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1106,37 +1106,44 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	}
 }
 
-static bool __init do_mark_busy(enum e820_type type, struct resource *res)
+/*
+ * We assign one resource entry for each E820 map entry:
+ */
+static struct resource __initdata *e820_res;
+
+/*
+ * Is this a device address region that should not be marked busy?
+ * (Versus system address regions that we register & lock early.)
+ */
+static bool __init e820_device_region(enum e820_type type, struct resource *res)
 {
-	/* this is the legacy bios/dos rom-shadow + mmio region */
+	/* This is the legacy BIOS/DOS ROM-shadow + MMIO region: */
 	if (res->start < (1ULL<<20))
-		return true;
+		return false;
 
 	/*
 	 * Treat persistent memory and other special memory ranges like
-	 * device memory, i.e. reserve it for exclusive use of a driver
+	 * device memory, i.e. keep it available for exclusive use of a
+	 * driver:
 	 */
 	switch (type) {
 	case E820_TYPE_RESERVED:
 	case E820_TYPE_SOFT_RESERVED:
 	case E820_TYPE_PRAM:
 	case E820_TYPE_PMEM:
-		return false;
+		return true;
 	case E820_TYPE_RAM:
 	case E820_TYPE_ACPI:
 	case E820_TYPE_NVS:
 	case E820_TYPE_UNUSABLE:
 	default:
-		return true;
+		return false;
 	}
 }
 
 /*
- * Mark E820 reserved areas as busy for the resource manager:
+ * Mark E820 system regions as busy for the resource manager:
  */
-
-static struct resource __initdata *e820_res;
-
 void __init e820__reserve_resources(void)
 {
 	int i;
@@ -1162,18 +1169,18 @@ void __init e820__reserve_resources(void)
 		res->desc  = e820_type_to_iores_desc(entry);
 
 		/*
-		 * Don't register the region that could be conflicted with
-		 * PCI device BAR resources and insert them later in
-		 * pcibios_resource_survey():
+		 * Skip and don't register device regions that could be conflicted
+		 * with PCI device BAR resources. They get inserted later in
+		 * pcibios_resource_survey() -> e820__reserve_resources_late():
 		 */
-		if (do_mark_busy(entry->type, res)) {
+		if (!e820_device_region(entry->type, res)) {
 			res->flags |= IORESOURCE_BUSY;
 			insert_resource(&iomem_resource, res);
 		}
 		res++;
 	}
 
-	/* Expose the kexec e820 table to the sysfs. */
+	/* Expose the kexec e820 table to sysfs: */
 	for (i = 0; i < e820_table_kexec->nr_entries; i++) {
 		struct e820_entry *entry = e820_table_kexec->entries + i;
 
@@ -1207,6 +1214,10 @@ void __init e820__reserve_resources_late(void)
 	int i;
 	struct resource *res;
 
+	/*
+	 * Register device address regions listed in the E820 map,
+	 * these can be claimed by device drivers later on:
+	 */
 	res = e820_res;
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		if (!res->parent && res->end)
@@ -1215,8 +1226,16 @@ void __init e820__reserve_resources_late(void)
 	}
 
 	/*
-	 * Try to bump up RAM regions to reasonable boundaries, to
-	 * avoid stolen RAM:
+	 * Create additional 'gaps' at the end of RAM regions,
+	 * rounding them up to 64k/1MB/64MB boundaries, should
+	 * they be weirdly sized, and register extra, locked
+	 * resource regions for them, to make sure drivers
+	 * won't claim those addresses.
+	 *
+	 * These are basically blind guesses and heuristics to
+	 * avoid resource conflicts with broken firmware that
+	 * doesn't properly list 'stolen RAM' as a system region
+	 * in the E820 map.
 	 */
 	for (i = 0; i < e820_table->nr_entries; i++) {
 		struct e820_entry *entry = &e820_table->entries[i];
@@ -1232,7 +1251,7 @@ void __init e820__reserve_resources_late(void)
 		if (start >= end)
 			continue;
 
-		printk(KERN_DEBUG "e820: reserve RAM buffer [mem %#010llx-%#010llx]\n", start, end);
+		pr_info("e820: register RAM buffer resource [mem %#010llx-%#010llx]\n", start, end);
 		reserve_region_with_split(&iomem_resource, start, end, "RAM buffer");
 	}
 }
-- 
2.45.2


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ