Message-ID: <Z6TcaaScBWzZvLWW@gourry-fedora-PF4VCD3F>
Date: Thu, 6 Feb 2025 10:59:37 -0500
From: Gregory Price <gourry@...rry.net>
To: Dan Williams <dan.j.williams@...el.com>
Cc: lsf-pc@...ts.linux-foundation.org, linux-mm@...ck.org,
	linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: CXL Boot to Bash - Section 2: The Drivers

On Wed, Feb 05, 2025 at 04:47:17PM -0800, Dan Williams wrote:
> Gregory Price wrote:
> > [/sys/bus/cxl/devices]# ls
> > dax_region0  decoder0.0  decoder1.0  decoder2.0 .....
> > dax_region1  decoder0.1  decoder1.1  decoder3.0 .....
> > 
> > ^^^ These dax regions require `CONFIG_DEV_DAX_CXL` enabled to fully
> > surface as dax devices, which can then be converted to system ram.
> 
> At least for this problem the plan is to fall back to
> CONFIG_DEV_DAX_HMEM [1] which skips all of the RAS and device
> enumeration benefits and just shunts EFI_MEMORY_SP over to device_dax.
> 

Hm, would this actually happen in the scenario where CONFIG_DEV_DAX_CXL
is not enabled but everything else is?  The region0 still gets created
and associated with the resource, but the dax_region0 never gets
created.

On one system I have I see the following:

c050000000-fcefffffff : Soft Reserved
  c050000000-fcefffffff : CXL Window 0
    c050000000-fcefffffff : region0
      c050000000-fcefffffff : dax0.0
        c050000000-fcefffffff : System RAM (kmem)
fcf0000000-ffffffffff : Reserved
10000000000-1035fffffff : Soft Reserved
  10000000000-1035fffffff : CXL Window 1
    10000000000-1035fffffff : region1
      10000000000-1035fffffff : dax1.0
        10000000000-1035fffffff : System RAM (kmem)

I would expect the above HMEM/shunt to only work if everything down
through CXL Window 0 is torn down.

But if CONFIG_DEV_DAX_CXL is not enabled, everything "succeeds", it just
doesn't "Do what you want"(TM) - dax0.0 and RAM entries are absent.
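
One way to spot this state from userspace is to walk the resource tree
and flag any regionN that has no daxN.M child. This is a minimal sketch,
assuming /proc/iomem-style text with two-space indentation as in the
listing above. The helper names parse_iomem and regions_without_dax are
hypothetical and not part of any kernel tooling.

```python
import re

# Matches lines such as "    c050000000-fcefffffff : region0".
LINE_RE = re.compile(
    r"^(?P<indent> *)(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) : (?P<name>.+)$"
)


def parse_iomem(text):
    """Return a list of (depth, start, end, name, parent_index) tuples."""
    entries = []
    stack = []  # indices of the currently open ancestors, one per depth
    for raw in text.splitlines():
        m = LINE_RE.match(raw.rstrip())
        if not m:
            continue
        depth = len(m.group("indent")) // 2  # assumes two spaces per level
        del stack[depth:]                    # close siblings and deeper levels
        parent = stack[-1] if stack else None
        entries.append((depth, int(m.group("start"), 16),
                        int(m.group("end"), 16), m.group("name"), parent))
        stack.append(len(entries) - 1)
    return entries


def regions_without_dax(entries):
    """Names of regionN entries that have no direct daxN.M child."""
    has_dax = set()
    for _depth, _start, _end, name, parent in entries:
        if parent is not None and re.fullmatch(r"dax\d+\.\d+", name):
            has_dax.add(parent)
    return [e[3] for i, e in enumerate(entries)
            if re.fullmatch(r"region\d+", e[3]) and i not in has_dax]


if __name__ == "__main__":
    with open("/proc/iomem") as f:
        print(regions_without_dax(parse_iomem(f.read())))
```

An empty list means every region has a dax device under it. A region
that shows up in the list matches the "succeeds but does nothing" case
described above.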

It makes me wonder whether the driver over-componentized the build.

> I am otherwise open to suggestions about a better model for how to
> handle a type of memory capacity that elicits diverging opinions on
> whether it should be treated as System RAM, dedicated application
> memory, or some kind of cold-memory swap target.
> 

My gut tells me there's no "elegant solution" here given that user
intent is fairly unknowable - i.e. the best we can do is make the build
and boot options easier to understand.

> > ---------------------------------------------------------------
> > Step 6: DAX surfacing Memory Blocks - First bit of User Policy.
> > ---------------------------------------------------------------
> > 
> > The last step in surfacing memory to allocators is to convert a dax
> > device into memory blocks. On most default kernel builds, dax devices
> > are not automatically converted to SystemRAM.
> 
> I thought most distributions are shipping with
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, or the default online udev rule?
> For example Fedora is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y and RHEL is
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, but with the udev hotplug rule.
> 

Good point, my bias is showing up in the notes here.  I didn't know
RHEL had gotten as far as a udev rule already. I'll adjust my notes.

But this also hides some nuance - the default behavior onlines
memory into ZONE_NORMAL with DEFAULT_ONLINE (next section).

> > Alternatively, this can be done at Build or Boot time using
> >   CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE   (v6.13 or below)
> >   CONFIG_MHP_DEFAULT_ONLINE_TYPE_*       (v6.14 or above)
> >   memhp_default_state=*                  (boot param predating cxl)
> 
> Oh, TIL the new CONFIG_MHP_DEFAULT_ONLINE_TYPE_* option.
> 

It was only just added:
https://lore.kernel.org/linux-mm/20241226182918.648799-1-gourry@gourry.net/

Basically creates parity between memhp_default_state and build options.
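
To see which of these knobs is set on a given machine, something like
the following could be used. It is a sketch only:
hotplug_online_settings is a hypothetical helper, and the caller is
assumed to supply the kernel config text and the contents of
/proc/cmdline. Only the option names quoted above are matched.

```python
import re

# Build options named in this thread; the trailing \w+ covers whatever
# CONFIG_MHP_DEFAULT_ONLINE_TYPE_* variant a given kernel defines.
CONFIG_RE = re.compile(
    r"^(CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE"
    r"|CONFIG_MHP_DEFAULT_ONLINE_TYPE_\w+)=(\S+)$"
)


def hotplug_online_settings(config_text, cmdline):
    """Collect build-time and boot-time default-online settings.

    config_text: contents of a kernel .config (supplied by the caller).
    cmdline:     contents of /proc/cmdline.
    """
    build = {}
    for line in config_text.splitlines():
        m = CONFIG_RE.match(line.strip())
        if m:  # "# CONFIG_... is not set" lines do not match
            build[m.group(1)] = m.group(2)
    boot = None
    for token in cmdline.split():
        if token.startswith("memhp_default_state="):
            boot = token.split("=", 1)[1]  # last occurrence wins in this sketch
    return {"build": build, "boot": boot}


# Illustrative inputs, not taken from a real system.
settings = hotplug_online_settings(
    "CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y\n",
    "ro quiet memhp_default_state=online_movable",
)
print(settings)
```

The sketch only reports the two sources. It does not try to resolve
which one the running kernel actually applied.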

> > The base is 256MB aligned (the minimum for the CXL Spec), and the
> > window size is 512MB.  This results in a loss of almost a full memory
> > block worth of memory (~1280MB on the front, and ~512MB on the back).
> > 
> > This is a loss of ~0.7% of capacity (1.5GB) for that region (121.25GB).
> 
> This feels like an example of "hey platform vendors, I understand
> that spec grants you the freedom to misalign, please refrain from taking
> advantage of that freedom".
> 

Only x86 appears to actually do this (presently) - so is this a real
constraint or just a quirk of how the x86 arch code has chosen to
"optimize memory block size"?

Granted I'm a platform consumer, not a vendor - but I wouldn't even know
where to look to see where this constraint is defined (if it is).

All I'd know is "CXL says I can align to 256MB, and minimum memory block
size on Linux is 256MB, so allons-y!"

On the Linux side - these platforms are now out there, in the wild.
So the surface impression now appears to be that Linux just throws
away ~0.5% of your CXL capacity for no reason on these platforms.
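
For illustration, the loss can be estimated from the region0 range in
the iomem listing earlier in this mail. This is a rough sketch under two
assumptions that are not stated in this message: a 2GiB memory block
size (the real value is in /sys/devices/system/memory/block_size_bytes)
and that only whole, block-aligned chunks of a region can be onlined.
alignment_loss is a made-up helper name.

```python
MIB = 1 << 20
GIB = 1 << 30


def alignment_loss(start, end_inclusive, block_size):
    """Bytes lost at the front and back when only whole, aligned blocks
    of block_size bytes inside [start, end_inclusive] are usable."""
    end = end_inclusive + 1                       # exclusive end
    first = -(-start // block_size) * block_size  # round start up
    last = (end // block_size) * block_size       # round end down
    if last <= first:                             # no whole block fits
        return end - start, 0
    return first - start, end - last


# region0 from the listing; the 2 GiB block size is an assumption.
start, end_inclusive = 0xc050000000, 0xfcefffffff
front, back = alignment_loss(start, end_inclusive, 2 * GIB)
size = end_inclusive + 1 - start
print(f"front {front // MIB} MiB, back {back // MIB} MiB, "
      f"{100 * (front + back) / size:.2f}% of {size / GIB:.2f} GiB")
```

The result depends entirely on the assumed block size, so the
percentage will differ from platform to platform.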

That said, I also understand that more memory blocks might affect
allocation performance when the system is pressured - but losing
gigabytes of memory can also reduce performance.
(Preview of one of my next nuance additions in section 3)

If this (advisement) change is unwelcome, then we should be spewing
a really loud warning somewhere so that consumers can pass the signal
on to vendors.

~Gregory
