Message-ID: <20251020150526.000078b6@huawei.com>
Date: Mon, 20 Oct 2025 15:05:26 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Gregory Price <gourry@...rry.net>
CC: Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com>, Wei Xu
	<weixugc@...gle.com>, David Rientjes <rientjes@...gle.com>, Matthew Wilcox
	<willy@...radead.org>, Bharata B Rao <bharata@....com>,
	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
	<dave.hansen@...el.com>, <hannes@...xchg.org>, <mgorman@...hsingularity.net>,
	<mingo@...hat.com>, <peterz@...radead.org>, <raghavendra.kt@....com>,
	<riel@...riel.com>, <sj@...nel.org>, <ying.huang@...ux.alibaba.com>,
	<ziy@...dia.com>, <dave@...olabs.net>, <nifan.cxl@...il.com>,
	<xuezhengchu@...wei.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
	<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
	<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
	<yiannis@...corp.com>, "Adam Manzanares" <a.manzanares@...sung.com>
Subject: Re: [RFC PATCH v2 0/8] mm: Hot page tracking and promotion
 infrastructure

On Fri, 17 Oct 2025 10:59:01 -0400
Gregory Price <gourry@...rry.net> wrote:

> On Fri, Oct 17, 2025 at 03:36:13PM +0100, Jonathan Cameron wrote:
> > On Fri, 17 Oct 2025 10:15:57 -0400
> > Gregory Price <gourry@...rry.net> wrote:  
> > > 
> > > Essentially the platform needs to allow a single device to expose
> > > multiple numa nodes based on different expected performance.  From
> > > those ranges.  Then software needs to program the HDM decoders
> > > appropriately.  
> > 
> > It's a bit 'fuzzy' to justify but maybe (for CXL) a CFMWS flag (so CEDT
> > as you mention) to say this host memory region may be backed by
> > compressed memory?
> >
> > Might be able to justify it from spec point of view by arguing that
> > compression is a QoS related characteristic. It's always possible that
> > host hardware will want to handle it differently before it even hits
> > the bus, even if it's just a case of throttling writes differently.
> >  
> 
> That's a Consortium discussion to have (and I am not of the
> consortium :P), but yeah you could do it that way.

The moment I know it's raised there, I (and others involved in the
consortium) can't talk about it in public. (I love standards org IP rules!)
So it's useful to have a pre-discussion before that happens.  We've
done this before for other topics and it can be very productive.

> 
> More generally could have a "Not-for-general-consumption bit" instead
> of specifically a compressed bit.  Maybe both a "No-Consume" and a
> "Special Node" bit would be useful separately.
> 
> Of course then platforms need to be made to understand all these:
> 
> "No-Consume" -> force EFI_MEMORY_SP or leave it reserved
> "Special Node" -> allocate its own PXM / Provide discrete CFMWS
> 
> Naming obviously non-instructive here, may as well call them Nancy and
> Bob bits.

For compression specifically I think there is value in making the flag
explicitly about compression, because the host hardware might handle that
differently. The other bits might be worth having as well though.
SPM (Specific Purpose Memory) was all about 'you could' use it as normal
memory, but someone put it there for something else. This is more a case
of SPOM: Specific Purpose Only Memory. It eats babies if you don't know
the extra rules for each instance of it.
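
To make the bit proposals concrete, here is a minimal sketch in C of how
platform code might turn such per-window hints into actions. Everything in
it is an assumption for illustration: the HYP_WIN_* flags, the
hyp_window_policy struct and hyp_policy_from_flags() are hypothetical names,
no such bits exist in CEDT / CFMWS today, and the rule that "compressed"
implies the other two behaviours is just one possible choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-window hint bits. Not defined by any spec today. */
#define HYP_WIN_NO_CONSUME   (1u << 0) /* not for general consumption */
#define HYP_WIN_SPECIAL_NODE (1u << 1) /* wants its own PXM / CFMWS */
#define HYP_WIN_COMPRESSED   (1u << 2) /* may be backed by compressed memory */
#define HYP_WIN_ALL \
	(HYP_WIN_NO_CONSUME | HYP_WIN_SPECIAL_NODE | HYP_WIN_COMPRESSED)

/* What the platform / OS would do with a window carrying those hints. */
struct hyp_window_policy {
	bool keep_out_of_general_pool; /* e.g. EFI_MEMORY_SP or left reserved */
	bool own_numa_node;            /* own PXM / discrete CFMWS */
	bool host_compression_aware;   /* host hardware may treat it specially */
};

static struct hyp_window_policy hyp_policy_from_flags(uint32_t flags)
{
	/* Reject bits this sketch does not know about. */
	assert((flags & ~HYP_WIN_ALL) == 0);

	bool compressed = (flags & HYP_WIN_COMPRESSED) != 0;

	struct hyp_window_policy p = {
		/* Assumption: compressed memory is never general purpose. */
		.keep_out_of_general_pool =
			compressed || (flags & HYP_WIN_NO_CONSUME) != 0,
		/* Assumption: compressed memory always gets its own node. */
		.own_numa_node =
			compressed || (flags & HYP_WIN_SPECIAL_NODE) != 0,
		.host_compression_aware = compressed,
	};
	return p;
}
```

With an explicit compressed bit the host-side handling can differ from a
plain "No-Consume" window, which is the distinction argued for above.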

> 
> > That then ends up in its own NUMA node.  Whether we take on the
> > splitting CFMWS entries into multiple NUMA nodes depending on what
> > backing devices end up in them is something we kicked into the long
> > grass originally, but that can definitely be revisited.  That
> > doesn't matter for initial support of compressed memory though if
> > we can do it via a separate CXL Fixed Memory Window Structure (CFMWS)
> > in CEDT.
> >  
> 
> This is the way I would initially approach it tbh - but i'm also not a
> hardware/firmware person, so i don't know exactly what bits a device
> would set to tell BIOS/EFI "Hey, give this chunk its own CFMWS", or if
> that lies solely with BIOS/EFI.

It's not a device thing wrt nodes today (and there are good reasons
why it should not be at that granularity, e.g. node explosion has costs).
The BIOS might set up the decoders in advance and even lock them, but I'd
expect we'll move away from that to fully OS managed over time (to get
flexibility) - the exception being when confidential compute is making its
usual mess of things.

Maybe the BIOS would have a look at devices and decide to enable a
compressed memory CFMWS only if it finds devices that need it, though
leaving the window out breaks hotplug of compressed memory devices.

So my guess is either we need to fix Linux to allow splitting a fixed
memory window up into multiple NUMA nodes, or platforms have to spin
extra fixed memory windows (host side PA ranges with a NUMA node for each).
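
As a rough illustration of the first option, the sketch below maps a host
physical address inside one fixed memory window to a NUMA node, with
sub-ranges of the window overriding the window's default node. This is only
a sketch under assumptions: struct hyp_window, struct hyp_subrange and
hyp_pa_to_nid() are hypothetical names and are not how Linux represents
CFMWS ranges or nodes today. The second option needs none of this, because
each window keeps exactly one node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: part of a window, [start, end), with its own node. */
struct hyp_subrange {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Hypothetical: one fixed memory window, [start, end), split into nodes. */
struct hyp_window {
	uint64_t start;
	uint64_t end;
	int default_nid;                /* node for anything not overridden */
	const struct hyp_subrange *sub; /* e.g. ranges backed by compressed memory */
	size_t nr_sub;
};

/* Return the node for host PA 'pa', or -1 if it is outside the window. */
static int hyp_pa_to_nid(const struct hyp_window *w, uint64_t pa)
{
	assert(w != NULL);

	if (pa < w->start || pa >= w->end)
		return -1;

	for (size_t i = 0; i < w->nr_sub; i++) {
		if (pa >= w->sub[i].start && pa < w->sub[i].end)
			return w->sub[i].nid;
	}

	return w->default_nid;
}
```

The cost of this approach is the extra per-window lookup state (and the node
explosion risk mentioned above); the cost of the alternative is extra host
PA space reserved by the platform.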

Which option we pick depends a bit on whether we expect host hardware to
handle compressed memory differently from normal RAM, or at least to
separate it for QoS reasons.

What fun.

J
> 
> ~Gregory

