Message-ID: <Z6LKJZkcdjuit2Ck@gourry-fedora-PF4VCD3F>
Date: Tue, 4 Feb 2025 21:17:09 -0500
From: Gregory Price <gourry@...rry.net>
To: lsf-pc@...ts.linux-foundation.org
Cc: linux-mm@...ck.org, linux-cxl@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [LSF/MM] CXL Boot to Bash - Section 1: BIOS, EFI, and Early Boot
Tossing this out as larger documentation of these steps for comment,
not as a representation of what will show up in the talk.
This is trying to cover the minimum needed information to start
reasoning about the growing complexity of configurations.
Platform / BIOS / EFI Configuration
===================================
---------------------------------------
Step 1: BIOS-time hardware programming.
---------------------------------------
I don't want to focus on platform specifics, so really all you need
to know about this phase for the purpose of MM is that platforms may
program the CXL device hierarchy and lock the configuration.
In practice it means you probably can't reconfigure things after boot
without doing major teardowns of the devices and resetting them -
assuming the platform doesn't have major quirks that prevent this.
This has implications for Hotplug, Interleave, and RAS, but we'll
cover those explicitly elsewhere. Otherwise, if something gets mucked
up at this stage - complain to your platform / hardware vendor.
------------------------------------------------------------------
Step 2: BIOS / EFI generates the CEDT (CXL Early Discovery Table).
------------------------------------------------------------------
This table is responsible for reporting each "CXL Host Bridge" and
"CXL Fixed Memory Window" present at boot - which enables early boot
software to manage those devices and the memory capacity presented
by those devices.
Example CEDT Entries (truncated)
Subtable Type : 00 [CXL Host Bridge Structure]
Reserved : 00
Length : 0020
Associated host bridge : 00000005
Subtable Type : 01 [CXL Fixed Memory Window Structure]
Reserved : 00
Length : 002C
Reserved : 00000000
Window base address : 000000C050000000
Window size : 0000003CA0000000
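For anyone who wants to poke at the raw table rather than a decoded dump,
here is a minimal Python sketch of walking CEDT subtables. The field order
and widths are taken from the dump above (1-byte type, 1-byte reserved,
2-byte length, and for the window structure a 4-byte reserved field followed
by 8-byte base and size). The 36-byte offset is the standard ACPI table
header. The helper names (walk_cedt, decode_cfmws, demo_table) are mine, the
demo bytes are synthesized from the example values, and reading the real
table from /sys/firmware/acpi/tables/CEDT is an assumption about your system
(it normally requires root).

```python
import struct

ACPI_HEADER_LEN = 36  # standard ACPI table header precedes the subtables


def walk_cedt(table: bytes):
    """Yield (subtable_type, length, raw_entry) for each CEDT subtable."""
    off = ACPI_HEADER_LEN
    while off + 4 <= len(table):
        stype, _reserved, length = struct.unpack_from("<BBH", table, off)
        if length < 4 or off + length > len(table):
            break  # malformed or truncated entry: stop rather than guess
        yield stype, length, table[off:off + length]
        off += length


def decode_cfmws(entry: bytes):
    """Return (window_base, window_size) from a type-01 window entry."""
    # type(1) reserved(1) length(2) reserved(4) base(8) size(8), little-endian
    _t, _r, _len, _r2, base, size = struct.unpack_from("<BBHIQQ", entry, 0)
    return base, size


def demo_table() -> bytes:
    """Synthesize a table holding the two example entries (hypothetical bytes)."""
    chbs = struct.pack("<BBHI", 0, 0, 0x20, 5).ljust(0x20, b"\0")
    cfmws = struct.pack("<BBHIQQ", 1, 0, 0x2C, 0,
                        0xC050000000, 0x3CA0000000).ljust(0x2C, b"\0")
    return bytes(ACPI_HEADER_LEN) + chbs + cfmws


for stype, length, entry in walk_cedt(demo_table()):
    if stype == 1:
        base, size = decode_cfmws(entry)
        print(f"CFMWS base={base:#x} size={size:#x}")
```

On a real machine you would replace demo_table() with the bytes read from
the firmware table; everything after the window size (interleave settings,
targets) is deliberately ignored here.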
If this memory is NOT marked "Special Purpose" by BIOS (next section),
you should find a matching entry in the EFI Memory Map and /proc/iomem:
BIOS-e820: [mem 0x000000c050000000-0x000000fcefffffff] usable
/proc/iomem: c050000000-fcefffffff : System RAM
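The match between the window and those two entries is just base + size - 1.
A small Python sketch of that arithmetic, plus a parser for a single
/proc/iomem style line, is below. The function names are hypothetical and
the input line is the example from this mail, not live system output (note
that /proc/iomem shows zeroed addresses to unprivileged readers).

```python
def window_range(base: int, size: int):
    """Inclusive [start, end] physical range covered by a fixed memory window."""
    return base, base + size - 1


def parse_iomem_line(line: str):
    """Parse 'start-end : name' into (start, end, name)."""
    span, name = line.split(" : ", 1)
    start, end = (int(x, 16) for x in span.strip().split("-"))
    return start, end, name.strip()


# Window base address and window size from the example CEDT entry.
start, end = window_range(0x000000C050000000, 0x0000003CA0000000)
print(f"{start:x}-{end:x}")  # c050000000-fcefffffff

# The example iomem line covers exactly the same range.
io_start, io_end, label = parse_iomem_line("c050000000-fcefffffff : System RAM")
print((io_start, io_end) == (start, end), label)
```

If the ranges do not line up exactly on your platform, that is worth noting
before assuming the driver and the memory map are describing the same thing.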
Observation: This memory is treated as 100% normal System RAM
1) This memory may be placed in any zone (ZONE_NORMAL, typically)
2) The kernel may use this memory for arbitrary allocations
3) The driver still enumerates CXL devices and memory regions, but
4) The CXL driver CANNOT manage this memory (as of today)
(Caveat: *some* RAS features may still work, possibly)
This creates a nuanced management state.
The memory is online by default and completely usable, AND the driver
appears to be managing the devices - BUT the memory resources and the
management structure are fundamentally separate.
1) CXL Driver manages CXL features
2) Non-CXL SystemRAM mechanisms surface the memory to allocators.
---------------------------------------------------------------
Step 3: EFI_MEMORY_SP - Deferring Management to the CXL Driver.
---------------------------------------------------------------
Assuming you DON'T want CXL memory to default to SystemRAM and prefer
NOT to have your kernel allocate arbitrary resources on CXL, you
probably want to defer managing these memory regions to the CXL driver.
The mechanism for this is setting the EFI_MEMORY_SP bit on CXL memory in BIOS.
This will mark the memory "Special Purpose".
Doing this will result in your memory being marked "Soft Reserved" on
x86 and ARM (presently unknown on other architectures).
You will see Memory Map and iomem entries like so:
BIOS-e820: [mem 0x000000c050000000-0x000000fcefffffff] soft reserved
/proc/iomem: c050000000-fcefffffff : Soft Reserved
Unless of course:
1) CONFIG_EFI_SOFT_RESERVE=n in your build config, or
2) You set the nosoftreserve boot parameter, or
3) You kexec'd from a kernel where condition #1 or #2 was met
In which case you'll get SystemRAM as if EFI_MEMORY_SP was never set.
(#3 was fun to debug, for some definition of fun. Ask me over coffee)
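The conditions above collapse into a small decision table. The Python sketch
below only restates the rules in this section; it is not how the kernel
implements them, and the function and parameter names are made up for
illustration. The second helper checks a kernel command line string (for
example the contents of /proc/cmdline) for the nosoftreserve parameter.

```python
def expected_iomem_label(efi_memory_sp: bool,
                         config_efi_soft_reserve: bool = True,
                         nosoftreserve: bool = False,
                         kexec_from_systemram_kernel: bool = False) -> str:
    """Label you would expect to see for the CXL window in /proc/iomem."""
    if not efi_memory_sp:
        # BIOS did not mark the memory Special Purpose.
        return "System RAM"
    if (not config_efi_soft_reserve
            or nosoftreserve
            or kexec_from_systemram_kernel):
        # Any of the three exceptions: behaves as if the bit was never set.
        return "System RAM"
    return "Soft Reserved"


def cmdline_has_nosoftreserve(cmdline: str) -> bool:
    """True if 'nosoftreserve' appears as its own boot parameter."""
    return "nosoftreserve" in cmdline.split()


print(expected_iomem_label(efi_memory_sp=True))
print(expected_iomem_label(efi_memory_sp=True, nosoftreserve=True))
```

The kexec case is the awkward one because nothing on the running kernel's
own command line or config tells you it happened.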
------------------------------------------------------------
First bit of nuanced complexity: Early-Boot Resource Re-use.
------------------------------------------------------------
How are MemoryMap resources managed by a driver after being reserved
during early boot? Example: Hot-(un)plugging a device.
What if we replace said Hot-unplugged device with a device with a new
capacity? What if the arch/platform code combines two adjacent
regions with similar attributes before creating resources?
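To make the last question concrete, here is a toy Python model of merging
adjacent regions. It is not the kernel's resource code. The second window is
hypothetical (same size as the example, placed directly after it), and
merge_adjacent is a name I made up. The point is only that after a merge
there is no longer a resource whose bounds match a single window.

```python
def merge_adjacent(regions):
    """Merge touching regions that share a label.

    regions: iterable of (start, end, label) with inclusive end addresses.
    """
    merged = []
    for start, end, label in sorted(regions):
        if merged and merged[-1][2] == label and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, label)  # extend previous region
        else:
            merged.append((start, end, label))
    return merged


SIZE = 0x3CA0000000
window_a = (0xC050000000, 0xC050000000 + SIZE - 1, "Soft Reserved")
# Hypothetical second window, directly adjacent to the first.
window_b = (window_a[1] + 1, window_a[1] + SIZE, "Soft Reserved")

resources = merge_adjacent([window_a, window_b])
print(len(resources))         # one combined resource
print(window_a in resources)  # the per-window resource no longer exists
```

A driver looking for a resource that exactly matches its window would come
up empty in this model, which is one flavor of the re-use problem.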
Recent work by Nathan Fontenot [1] has been looking to address some of
the issues with these Soft Reserved resources, either by re-using them
or by handing them off entirely to the relevant driver for management.
[1] https://lore.kernel.org/linux-cxl/cover.1737046620.git.nathan.fontenot@amd.com/
--------------------------------------------------------------------
The Complexity story up til now (what's likely to show up in slides)
--------------------------------------------------------------------
Platform and BIOS:
May configure all the devices prior to kernel hand-off.
May or may not support reconfiguring / hotplug.
BIOS and EFI:
EFI_MEMORY_SP - used to defer management to drivers
Kernel Build and Boot:
CONFIG_EFI_SOFT_RESERVE=n - Will always result in CXL as SystemRAM
nosoftreserve - Will always result in CXL as SystemRAM
kexec - SystemRAM configs carry over to target
--------------------------------------------------------------------
Next Up:
Driver Management - Decoders, HPA/SPA, DAX, and RAS.
Memory (Block) Hotplug - Zones, Auto-Online, and User Policy.
RAS - Poison, MCE, and why you probably want CXL=ZONE_MOVABLE.
Interleave - RAS and Region Management (Hotplug-ability)
~Gregory