Message-ID: <20251017153613.00004940@huawei.com>
Date: Fri, 17 Oct 2025 15:36:13 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Gregory Price <gourry@...rry.net>
CC: Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com>, Wei Xu
<weixugc@...gle.com>, David Rientjes <rientjes@...gle.com>, Matthew Wilcox
<willy@...radead.org>, Bharata B Rao <bharata@....com>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<dave.hansen@...el.com>, <hannes@...xchg.org>, <mgorman@...hsingularity.net>,
<mingo@...hat.com>, <peterz@...radead.org>, <raghavendra.kt@....com>,
<riel@...riel.com>, <sj@...nel.org>, <ying.huang@...ux.alibaba.com>,
<ziy@...dia.com>, <dave@...olabs.net>, <nifan.cxl@...il.com>,
<xuezhengchu@...wei.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
<yiannis@...corp.com>, "Adam Manzanares" <a.manzanares@...sung.com>
Subject: Re: [RFC PATCH v2 0/8] mm: Hot page tracking and promotion
infrastructure
On Fri, 17 Oct 2025 10:15:57 -0400
Gregory Price <gourry@...rry.net> wrote:
> On Fri, Oct 17, 2025 at 11:53:31AM +0200, Yiannis Nikolakopoulos wrote:
> > On Wed, Oct 1, 2025 at 9:22 AM Gregory Price <gourry@...rry.net> wrote:
> > > 1. Carve out an explicit proximity domain (NUMA node) for the compressed
> > > region via SRAT.
> > > https://docs.kernel.org/driver-api/cxl/platform/acpi/srat.html
> > >
> > > 2. Make sure this proximity domain (NUMA node) has separate data in the
> > > HMAT so it can be an explicit demotion target for higher tiers
> > > https://docs.kernel.org/driver-api/cxl/platform/acpi/hmat.html
> > This makes sense. I've done a dirty hardcoding trick in my prototype
> > so that my node is always the last target. I'll have a look on how to
> > make this right.
>
> I think it's probably a CEDT/CDAT/HMAT/SRAT/etc negotiation.
>
> Essentially the platform needs to allow a single device to expose
> multiple numa nodes based on different expected performance from
> those ranges. Then software needs to program the HDM decoders
> appropriately.
It's a bit 'fuzzy' to justify but maybe (for CXL) a CFMWS flag (so CEDT
as you mention) to say this host memory region may be backed by
compressed memory?

Might be able to justify it from a spec point of view by arguing that
compression is a QoS related characteristic. It's always possible host
hardware will want to handle it differently before it even hits the
bus, even if it's just a case of throttling writes differently.

That then ends up in its own NUMA node. Whether we take on
splitting CFMWS entries into multiple NUMA nodes depending on what
backing devices end up in them is something we kicked into the long
grass originally, but that can definitely be revisited. That
doesn't matter for initial support of compressed memory though if
we can do it via a separate CXL Fixed Memory Window Structure (CFMWS)
in CEDT.
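
A rough userspace sketch of the "separate CFMWS, own NUMA node" idea,
purely for illustration. Everything here is an assumption: the struct
is a simplified stand-in rather than the ACPI table layout, the
SKETCH_CFMWS_COMPRESSED bit is hypothetical (no such flag is defined
today), and sketch_assign_nodes() is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one CFMWS entry; NOT the real ACPI layout. */
struct sketch_cfmws {
	uint64_t base_hpa;      /* start of the host physical window */
	uint64_t size;          /* window size in bytes */
	uint16_t restrictions;  /* flag bits describing the window */
};

/* Hypothetical flag: "this window may be backed by compressed memory". */
#define SKETCH_CFMWS_COMPRESSED (1u << 15)

/* What the OS might remember per window after parsing. */
struct sketch_node {
	int nid;         /* NUMA node id assigned to the window */
	int compressed;  /* 1 if the hypothetical flag was set */
};

/*
 * Give each window its own node id, counting up from first_nid, and
 * record whether the window carries the hypothetical compressed flag.
 * Returns the number of nodes written to out.
 */
size_t sketch_assign_nodes(const struct sketch_cfmws *w, size_t n,
			   int first_nid, struct sketch_node *out)
{
	for (size_t i = 0; i < n; i++) {
		out[i].nid = first_nid + (int)i;
		out[i].compressed =
			!!(w[i].restrictions & SKETCH_CFMWS_COMPRESSED);
	}
	return n;
}
```

With one node per window, the compressed range never shares a node
with ordinary CXL memory, so the splitting question above does not
arise for this case.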
>
> > > 5. in `alloc_migration_target()` mm/migrate.c
> > > Since nid is not a valid buddy-allocator target, everything here
> > > will fail. So we can simply append the following to the bottom
> > >
> > > device_folio_alloc = nid_to_alloc(nid, DEVICE_FOLIO_ALLOC);
> > > if (device_folio_alloc)
> > > folio = device_folio_alloc(...)
> > > return folio;
> > In my current prototype alloc_migration_target was working (naively).
> > Steps 3, 4 and 5 seem like an interesting thing to try after all this
> > discussion.
> > >
>
> Right, because the memory is directly accessible to the buddy allocator.
> What I'm proposing would remove this memory from the buddy allocator and
> force more explicit integration (in this case with this function).
>
> more explicitly: in this design __folio_alloc can never access this
> memory.
>
> ~Gregory
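
For reference, the fallback described in step 5 can be modelled in
userspace roughly as below. This is only a sketch under assumptions:
the per-node callback table stands in for the proposed (not yet
existing) nid_to_alloc(nid, DEVICE_FOLIO_ALLOC) lookup, and every
sketch_* name and the stub folio type are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 8

/* Stub folio: only remembers which node it came from. */
struct sketch_folio {
	int nid;
};

typedef struct sketch_folio *(*sketch_folio_alloc_fn)(int nid);

/* Per-node device allocators; NULL means none is registered. */
sketch_folio_alloc_fn sketch_device_alloc[SKETCH_MAX_NODES];

/* Non-zero for nodes whose memory the "buddy allocator" manages. */
int sketch_buddy_managed[SKETCH_MAX_NODES];

/* Allocate a stub folio tagged with nid; usable as a device callback. */
struct sketch_folio *sketch_new_folio(int nid)
{
	struct sketch_folio *folio = malloc(sizeof(*folio));

	if (folio)
		folio->nid = nid;
	return folio;
}

/* Models the generic path: fails for nodes outside the buddy allocator. */
struct sketch_folio *sketch_buddy_alloc(int nid)
{
	if (nid < 0 || nid >= SKETCH_MAX_NODES || !sketch_buddy_managed[nid])
		return NULL;
	return sketch_new_folio(nid);
}

/*
 * Models the tail of the migration-target allocation: try the generic
 * path first, then fall back to a device allocator if one is
 * registered for the node.
 */
struct sketch_folio *sketch_alloc_migration_target(int nid)
{
	struct sketch_folio *folio = sketch_buddy_alloc(nid);

	if (folio)
		return folio;
	if (nid >= 0 && nid < SKETCH_MAX_NODES && sketch_device_alloc[nid])
		folio = sketch_device_alloc[nid](nid);
	return folio;
}
```

The point the model tries to capture is that the generic allocation
path never reaches the device node; only code that explicitly looks up
the callback can get memory from it.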