Message-ID: <Z6LKJZkcdjuit2Ck@gourry-fedora-PF4VCD3F>
Date: Tue, 4 Feb 2025 21:17:09 -0500
From: Gregory Price <gourry@...rry.net>
To: lsf-pc@...ts.linux-foundation.org
Cc: linux-mm@...ck.org, linux-cxl@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [LSF/MM] CXL Boot to Bash - Section 1: BIOS, EFI, and Early Boot


Tossing this out as larger documentation of these steps for comment,
not as a representation of what will show up in the talk.

This is trying to cover the minimum needed information to start
reasoning about the growing complexity of configurations.


Platform / BIOS / EFI Configuration
===================================
---------------------------------------
Step 1: BIOS-time hardware programming.
---------------------------------------

I don't want to focus on platform specifics, so really all you need
to know about this phase for the purpose of MM is that platforms may
program the CXL device hierarchy and lock the configuration.

In practice it means you probably can't reconfigure things after boot
without doing major teardowns of the devices and resetting them -
assuming the platform doesn't have major quirks that prevent this.

This has implications for Hotplug, Interleave, and RAS, but we'll
cover those explicitly elsewhere. Otherwise, if something gets mucked
up at this stage - complain to your platform / hardware vendor.


------------------------------------------------------------------
Step 2: BIOS / EFI generates the CEDT (CXL Early Discovery Table).
------------------------------------------------------------------

This table is responsible for reporting each "CXL Host Bridge" and
"CXL Fixed Memory Window" present at boot - which enables early boot
software to manage those devices and the memory capacity presented
by those devices.

Example CEDT Entries (truncated) 
         Subtable Type : 00 [CXL Host Bridge Structure]
              Reserved : 00
                Length : 0020
Associated host bridge : 00000005

         Subtable Type : 01 [CXL Fixed Memory Window Structure]
              Reserved : 00
                Length : 002C
              Reserved : 00000000
   Window base address : 000000C050000000
           Window size : 0000003CA0000000

If this memory is NOT marked "Special Purpose" by BIOS (next section),
you should find a matching entry in the EFI Memory Map and /proc/iomem:

BIOS-e820:   [mem 0x000000c050000000-0x000000fcefffffff] usable
/proc/iomem: c050000000-fcefffffff : System RAM
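
The match between the CFMWS entry and the e820 / iomem range is plain
arithmetic: the last byte of the window is base + size - 1. A minimal
Python sketch using the values from the example above (the function name
is made up for illustration):

```python
def window_range(base: int, size: int) -> tuple[int, int]:
    """Return the inclusive (start, end) addresses covered by a window."""
    if size <= 0:
        raise ValueError("window size must be positive")
    return base, base + size - 1


# Values copied from the example CFMWS entry above.
start, end = window_range(0x000000C050000000, 0x0000003CA0000000)

# Format the range the way /proc/iomem prints it.
iomem_style = f"{start:x}-{end:x}"
print(iomem_style)  # c050000000-fcefffffff
```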


Observation: This memory is treated as 100% normal System RAM

   1) This memory may be placed in any zone (ZONE_NORMAL, typically)
   2) The kernel may use this memory for arbitrary allocations
   3) The driver still enumerates CXL devices and memory regions, but
   4) The CXL driver CANNOT manage this memory (as of today)
      (Caveat: *some* RAS features may still work, possibly)

This creates a nuanced management state.

The memory is online by default and completely usable, AND the driver
appears to be managing the devices - BUT the memory resources and the
management structure are fundamentally separate.
   1) CXL Driver manages CXL features
   2) Non-CXL SystemRAM mechanisms surface the memory to allocators.


---------------------------------------------------------------
Step 3: EFI_MEMORY_SP - Deferring Management to the CXL Driver.
---------------------------------------------------------------

Assuming you DON'T want CXL memory to default to SystemRAM and prefer
NOT to have your kernel allocate arbitrary resources on CXL, you
probably want to defer managing these memory regions to the CXL driver.

The mechanism for this is setting the EFI_MEMORY_SP bit on CXL memory in BIOS.
This will mark the memory "Special Purpose".

Doing this will result in your memory being marked "Soft Reserved" on
x86 and ARM (presently unknown on other architectures).

You will see Memory Map and iomem entries like so:

BIOS-e820:   [mem 0x000000c050000000-0x000000fcefffffff] soft reserved
/proc/iomem: c050000000-fcefffffff : Soft Reserved

Unless of course:
  1) CONFIG_EFI_SOFT_RESERVE=n in your build config, or
  2) You set the nosoftreserve boot parameter (efi=nosoftreserve), or
  3) You kexec'd from a kernel where condition #1 or #2 is met

In which case you'll get SystemRAM as if EFI_MEMORY_SP was never set.
(#3 was fun to debug, for some definition of fun. Ask me over coffee)
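
To check which case you landed in, you can scan /proc/iomem for the
label. A rough Python sketch; the helper name and the sample text are
made up for illustration, and on a real system reading /proc/iomem
typically needs root to see non-zero addresses:

```python
import re

# One /proc/iomem line looks like "c050000000-fcefffffff : Soft Reserved".
# Child resources are indented, so leading whitespace is allowed.
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def find_ranges(iomem_text: str, label: str) -> list[tuple[int, int]]:
    """Return inclusive (start, end) for every entry named `label`."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(4).strip() == label:
            ranges.append((int(match.group(2), 16), int(match.group(3), 16)))
    return ranges


# Sample text only; the first line is a made-up low-memory entry.
sample = """\
00001000-0009ffff : System RAM
c050000000-fcefffffff : Soft Reserved
"""
print(find_ranges(sample, "Soft Reserved"))
```

On a live system the call would be something like
find_ranges(open("/proc/iomem").read(), "Soft Reserved"); an empty list
for a range you expected to be soft reserved points at one of the three
conditions above.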

------------------------------------------------------------
First bit of nuanced complexity: Early-Boot Resource Re-use.
------------------------------------------------------------
How are MemoryMap resources managed by a driver after being reserved
during early boot? Example: Hot-(un)plugging a device.

What if we replace said Hot-unplugged device with a device with a new
capacity?  What if the arch/platform code combines two adjacent
regions with similar attributes before creating resources?
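
To make the merging question concrete, here is a toy Python sketch. It
is NOT the kernel's or any platform's actual code; the addresses and
the two "devices" are hypothetical. It only shows why a merged resource
no longer carries the per-device boundary:

```python
def merge_adjacent(regions):
    """Merge (start, end, attr) regions that touch and share an attribute.

    `end` is inclusive. Illustration only, not the kernel's logic.
    """
    merged = []
    for start, end, attr in sorted(regions):
        if merged:
            prev_start, prev_end, prev_attr = merged[-1]
            if prev_attr == attr and prev_end + 1 == start:
                # Extend the previous region instead of adding a new one.
                merged[-1] = (prev_start, end, attr)
                continue
        merged.append((start, end, attr))
    return merged


# Two hypothetical devices backing adjacent soft-reserved ranges.
device_a = (0x1000000000, 0x1FFFFFFFFF, "soft reserved")
device_b = (0x2000000000, 0x2FFFFFFFFF, "soft reserved")

print(merge_adjacent([device_a, device_b]))
```

The result is a single region spanning both devices, so anything that
later wants to hand back only device_b's capacity has to split the
resource first.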

Recent work by Nathan Fontenot [1] has been looking to address some of
the issues with these Soft Reserved resources, either re-using them or
handing them off entirely to the relevant driver for management.

[1] https://lore.kernel.org/linux-cxl/cover.1737046620.git.nathan.fontenot@amd.com/


--------------------------------------------------------------------
The Complexity story up til now (what's likely to show up in slides)
--------------------------------------------------------------------

Platform and BIOS:
   May configure all the devices prior to kernel hand-off.
   May or may not support reconfiguring / hotplug.
BIOS and EFI:
   EFI_MEMORY_SP              - used to defer management to drivers
Kernel Build and Boot:
   CONFIG_EFI_SOFT_RESERVE=n  - Will always result in CXL as SystemRAM
   nosoftreserve              - Will always result in CXL as SystemRAM
   kexec                      - SystemRAM configs carry over to target
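
The summary above boils down to a small decision table. A Python sketch
of it follows; the function and parameter names are mine, and it models
only the conditions described in this mail, not the kernel's actual
code paths:

```python
def cxl_memory_label(sp_bit_set: bool,
                     config_efi_soft_reserve: bool = True,
                     nosoftreserve: bool = False,
                     kexec_from_systemram_kernel: bool = False) -> str:
    """Return the /proc/iomem label expected for a CXL memory range."""
    if not sp_bit_set:
        # BIOS did not mark the range Special Purpose.
        return "System RAM"
    if (not config_efi_soft_reserve
            or nosoftreserve
            or kexec_from_systemram_kernel):
        # Any one of these overrides EFI_MEMORY_SP.
        return "System RAM"
    return "Soft Reserved"


print(cxl_memory_label(sp_bit_set=True))
print(cxl_memory_label(sp_bit_set=True, nosoftreserve=True))
```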


--------------------------------------------------------------------
Next Up:
   Driver Management - Decoders, HPA/SPA, DAX, and RAS.
   Memory (Block) Hotplug - Zones, Auto-Online, and User Policy.
   RAS - Poison, MCE, and why you probably want CXL=ZONE_MOVABLE.
   Interleave - RAS and Region Management (Hotplug-ability) 

~Gregory
