Message-ID: <20251017153613.00004940@huawei.com>
Date: Fri, 17 Oct 2025 15:36:13 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Gregory Price <gourry@...rry.net>
CC: Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com>, Wei Xu
	<weixugc@...gle.com>, David Rientjes <rientjes@...gle.com>, Matthew Wilcox
	<willy@...radead.org>, Bharata B Rao <bharata@....com>,
	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
	<dave.hansen@...el.com>, <hannes@...xchg.org>, <mgorman@...hsingularity.net>,
	<mingo@...hat.com>, <peterz@...radead.org>, <raghavendra.kt@....com>,
	<riel@...riel.com>, <sj@...nel.org>, <ying.huang@...ux.alibaba.com>,
	<ziy@...dia.com>, <dave@...olabs.net>, <nifan.cxl@...il.com>,
	<xuezhengchu@...wei.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
	<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
	<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
	<yiannis@...corp.com>, "Adam Manzanares" <a.manzanares@...sung.com>
Subject: Re: [RFC PATCH v2 0/8] mm: Hot page tracking and promotion
 infrastructure

On Fri, 17 Oct 2025 10:15:57 -0400
Gregory Price <gourry@...rry.net> wrote:

> On Fri, Oct 17, 2025 at 11:53:31AM +0200, Yiannis Nikolakopoulos wrote:
> > On Wed, Oct 1, 2025 at 9:22 AM Gregory Price <gourry@...rry.net> wrote:  
> > > 1. Carve out an explicit proximity domain (NUMA node) for the compressed
> > >    region via SRAT.
> > >    https://docs.kernel.org/driver-api/cxl/platform/acpi/srat.html
> > >
> > > 2. Make sure this proximity domain (NUMA node) has separate data in the
> > >    HMAT so it can be an explicit demotion target for higher tiers
> > >    https://docs.kernel.org/driver-api/cxl/platform/acpi/hmat.html  
> > This makes sense. I've done a dirty hardcoding trick in my prototype
> > so that my node is always the last target. I'll have a look on how to
> > make this right.  
> 
> I think it's probably a CEDT/CDAT/HMAT/SRAT/etc negotiation.
> 
> Essentially the platform needs to allow a single device to expose
> multiple NUMA nodes based on different expected performance from
> those ranges.  Then software needs to program the HDM decoders
> appropriately.

It's a bit 'fuzzy' to justify, but maybe (for CXL) a CFMWS flag (so CEDT,
as you mention) to say this host memory region may be backed by
compressed memory?

Might be able to justify it from a spec point of view by arguing that
compression is a QoS-related characteristic. It's always possible host
hardware will want to handle it differently before it even hits the
bus, even if it's just a case of throttling writes differently.

That then ends up in its own NUMA node.  Whether we take on
splitting CFMWS entries into multiple NUMA nodes depending on what
backing devices end up in them is something we kicked into the long
grass originally, but that can definitely be revisited.  That
doesn't matter for initial support of compressed memory, though, if
we can do it via a separate CXL Fixed Memory Window Structure (CFMWS)
in CEDT.
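
To make the idea concrete, a rough userspace sketch of how such a flag
could be consumed when walking CEDT CFMWS entries.  The flag bit
(CFMWS_FLAG_COMPRESSED) and the helper names are invented purely for
illustration; no such flag exists in the CXL spec today, and real node
assignment happens in the kernel's CEDT parsing, not like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of a CEDT CFMWS entry: host PA window plus restriction flags. */
struct cfmws_entry {
	uint64_t base_hpa;
	uint64_t window_size;
	uint16_t flags;
};

/* Hypothetical flag: window may be backed by compressed memory. */
#define CFMWS_FLAG_COMPRESSED	(1u << 15)

static int next_nid = 1;	/* node 0 reserved for local DRAM here */

/*
 * Give every window its own node; a compressed window is additionally
 * marked so it can later be excluded from the buddy allocator and
 * used only as an explicit demotion target.
 */
static int cfmws_to_nid(const struct cfmws_entry *e, bool *compressed)
{
	*compressed = (e->flags & CFMWS_FLAG_COMPRESSED) != 0;
	return next_nid++;
}
```

The point is only that a per-window flag is enough to steer the region
into a distinct NUMA node at enumeration time, before any allocator
sees it.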

> 
> > > 5. in `alloc_migration_target()` mm/migrate.c
> > >    Since nid is not a valid buddy-allocator target, everything here
> > >    will fail.  So we can simply append the following to the bottom
> > >
> > >    device_folio_alloc = nid_to_alloc(nid, DEVICE_FOLIO_ALLOC);
> > >    if (device_folio_alloc)
> > >        folio = device_folio_alloc(...)
> > >    return folio;  
> > In my current prototype alloc_migration_target was working (naively).
> > Steps 3, 4 and 5 seem like an interesting thing to try after all this
> > discussion.  
> > >  
> 
> Right, because the memory is directly accessible to the buddy allocator.
> What I'm proposing would remove this memory from the buddy allocator and
> force more explicit integration (in this case with this function).
> 
> more explicitly: in this design __folio_alloc can never access this
>                  memory.
> 
> ~Gregory
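
For the archives, the dispatch Gregory sketches for step 5 can be mocked
up in plain C to show the control flow: the buddy path fails for the
device-owned nid, then a per-node hook gets a chance.  nid_to_alloc,
DEVICE_FOLIO_ALLOC and the allocator table are hypothetical names taken
from the pseudocode above, not existing kernel symbols:

```c
#include <stddef.h>

#define MAX_NUMNODES 8

enum alloc_kind { DEVICE_FOLIO_ALLOC };

struct folio { int order; };	/* stand-in for the real struct folio */
typedef struct folio *(*device_folio_alloc_fn)(int nid);

/* One registered allocator per device-backed node; NULL elsewhere. */
static device_folio_alloc_fn node_alloc[MAX_NUMNODES];

static device_folio_alloc_fn nid_to_alloc(int nid, enum alloc_kind kind)
{
	if (kind != DEVICE_FOLIO_ALLOC || nid < 0 || nid >= MAX_NUMNODES)
		return NULL;
	return node_alloc[nid];
}

/*
 * Mimics the proposed tail of alloc_migration_target(): for an ordinary
 * node the hook is NULL and we fall through with folio == NULL (as if
 * buddy allocation failed); a device-backed node gets its own allocator.
 */
static struct folio *alloc_migration_target_tail(int nid)
{
	struct folio *folio = NULL;
	device_folio_alloc_fn alloc = nid_to_alloc(nid, DEVICE_FOLIO_ALLOC);

	if (alloc)
		folio = alloc(nid);
	return folio;
}

/* Toy device allocator standing in for the compressed-memory backend. */
static struct folio dummy;
static struct folio *toy_device_alloc(int nid)
{
	(void)nid;
	return &dummy;
}
```

This matches the constraint stated above: __folio_alloc never sees the
device memory, and only the explicit hook can hand it out.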

