Message-ID: <aWRLeoUJAYAWbLD3@lstrano-desk.jf.intel.com>
Date: Sun, 11 Jan 2026 17:16:42 -0800
From: Matthew Brost <matthew.brost@...el.com>
To: Balbir Singh <balbirs@...dia.com>
CC: Francois Dugast <francois.dugast@...el.com>,
<intel-xe@...ts.freedesktop.org>, <dri-devel@...ts.freedesktop.org>, Zi Yan
<ziy@...dia.com>, David Hildenbrand <david@...nel.org>, Oscar Salvador
<osalvador@...e.de>, Andrew Morton <akpm@...ux-foundation.org>, "Lorenzo
Stoakes" <lorenzo.stoakes@...cle.com>, "Liam R . Howlett"
<Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport
<rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko
<mhocko@...e.com>, Alistair Popple <apopple@...dia.com>,
<linux-mm@...ck.org>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 2/7] mm/zone_device: Add
free_zone_device_folio_prepare() helper

On Mon, Jan 12, 2026 at 11:44:15AM +1100, Balbir Singh wrote:
> On 1/12/26 06:55, Francois Dugast wrote:
> > From: Matthew Brost <matthew.brost@...el.com>
> >
> > Add free_zone_device_folio_prepare(), a helper that restores large
> > ZONE_DEVICE folios to a sane, initial state before freeing them.
> >
> > Compound ZONE_DEVICE folios overwrite per-page state (e.g. pgmap and
> > compound metadata). Before returning such pages to the device pgmap
> > allocator, each constituent page must be reset to a standalone
> > ZONE_DEVICE folio with a valid pgmap and no compound state.
> >
> > Use this helper prior to folio_free() for device-private and
> > device-coherent folios to ensure consistent device page state for
> > subsequent allocations.
> >
> > Fixes: d245f9b4ab80 ("mm/zone_device: support large zone device private folios")
> > Cc: Zi Yan <ziy@...dia.com>
> > Cc: David Hildenbrand <david@...nel.org>
> > Cc: Oscar Salvador <osalvador@...e.de>
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Balbir Singh <balbirs@...dia.com>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> > Cc: Liam R. Howlett <Liam.Howlett@...cle.com>
> > Cc: Vlastimil Babka <vbabka@...e.cz>
> > Cc: Mike Rapoport <rppt@...nel.org>
> > Cc: Suren Baghdasaryan <surenb@...gle.com>
> > Cc: Michal Hocko <mhocko@...e.com>
> > Cc: Alistair Popple <apopple@...dia.com>
> > Cc: linux-mm@...ck.org
> > Cc: linux-cxl@...r.kernel.org
> > Cc: linux-kernel@...r.kernel.org
> > Suggested-by: Alistair Popple <apopple@...dia.com>
> > Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> > Signed-off-by: Francois Dugast <francois.dugast@...el.com>
> > ---
> > include/linux/memremap.h | 1 +
> > mm/memremap.c | 55 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 56 insertions(+)
> >
> > diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> > index 97fcffeb1c1e..88e1d4707296 100644
> > --- a/include/linux/memremap.h
> > +++ b/include/linux/memremap.h
> > @@ -230,6 +230,7 @@ static inline bool is_fsdax_page(const struct page *page)
> >
> > #ifdef CONFIG_ZONE_DEVICE
> > void zone_device_page_init(struct page *page, unsigned int order);
> > +void free_zone_device_folio_prepare(struct folio *folio);
> > void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> > void memunmap_pages(struct dev_pagemap *pgmap);
> > void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
> > diff --git a/mm/memremap.c b/mm/memremap.c
> > index 39dc4bd190d0..375a61e18858 100644
> > --- a/mm/memremap.c
> > +++ b/mm/memremap.c
> > @@ -413,6 +413,60 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn)
> > }
> > EXPORT_SYMBOL_GPL(get_dev_pagemap);
> >
> > +/**
> > + * free_zone_device_folio_prepare() - Prepare a ZONE_DEVICE folio for freeing.
> > + * @folio: ZONE_DEVICE folio to prepare for release.
> > + *
> > + * ZONE_DEVICE pages/folios (e.g., device-private memory or fsdax-backed pages)
> > + * can be compound. When freeing a compound ZONE_DEVICE folio, the tail pages
> > + * must be restored to a sane ZONE_DEVICE state before they are released.
> > + *
> > + * This helper:
> > + * - Clears @folio->mapping and, for compound folios, clears each page's
> > + * compound-head state (ClearPageHead()/clear_compound_head()).
> > + * - Resets the compound order metadata (folio_reset_order()) and then
> > + * initializes each constituent page as a standalone ZONE_DEVICE folio:
> > + * * clears ->mapping
> > + * * restores ->pgmap (prep_compound_page() overwrites it)
> > + * * clears ->share (only relevant for fsdax; unused for device-private)
> > + *
> > + * If @folio is order-0, only the mapping is cleared and no further work is
> > + * required.
> > + */
> > +void free_zone_device_folio_prepare(struct folio *folio)
> > +{
> > + struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> > + int order, i;
> > +
> > + VM_WARN_ON_FOLIO(!folio_is_zone_device(folio), folio);
> > +
> > + folio->mapping = NULL;
> > + order = folio_order(folio);
> > + if (!order)
> > + return;
> > +
> > + folio_reset_order(folio);
> > +
> > + for (i = 0; i < (1UL << order); i++) {
> > + struct page *page = folio_page(folio, i);
> > + struct folio *new_folio = (struct folio *)page;
> > +
> > + ClearPageHead(page);
> > + clear_compound_head(page);
> > +
> > + new_folio->mapping = NULL;
> > + /*
> > + * Reset pgmap which was over-written by
> > + * prep_compound_page().
> > + */
> > + new_folio->pgmap = pgmap;
> > + new_folio->share = 0; /* fsdax only, unused for device private */
> > + VM_WARN_ON_FOLIO(folio_ref_count(new_folio), new_folio);
> > + VM_WARN_ON_FOLIO(!folio_is_zone_device(new_folio), new_folio);
>
> Does calling the free_folio() callback on new_folio solve the issue you are facing, or is
> that PMD_ORDER more frees than we'd like?
>
No, calling the folio_free() callback more often doesn't solve anything.
In fact, it would make my implementation explode. I explained this in
detail to Zi in [1].

To recap [1], my memory allocator has no visibility into individual
pages or folios; it is DRM Buddy layered on top of TTM BO. This design
allows VRAM to be allocated or evicted for both traditional GPU
allocations (GEMs) and SVM allocations.

Now, to recap the actual issue: if device folios are not split upon free
and are later reallocated with a different order in
zone_device_page_init(), the implementation breaks, because the stale
compound state from the previous allocation is still present in the
pages. This problem is not specific to Xe. Nouveau happens to always
allocate at the same order, so it works by coincidence. Reallocating at
a different order is valid behavior and must be supported.
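
As a rough illustration of the failure mode described above, here is a
userspace toy model. It is not kernel code: struct toy_page and every
toy_* helper are hypothetical names invented for this sketch, and the
fields only loosely mimic PageHead, compound_head and the folio order.
It assumes, as in the description above, that an order-0 allocation does
not clean up whatever state the previous user left in the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Toy stand-in for struct page; NOT the kernel layout. */
struct toy_page {
	bool head;          /* loosely models PageHead */
	int compound_head;  /* index of the head page, -1 when standalone */
	unsigned int order; /* only meaningful on a head page */
};

static void toy_pool_init(struct toy_page *pool)
{
	for (size_t i = 0; i < NPAGES; i++)
		pool[i] = (struct toy_page){ .head = false,
					     .compound_head = -1,
					     .order = 0 };
}

/*
 * Models an allocation at @order. For order > 0 it writes compound state
 * over the range; for order 0 it deliberately leaves the page as found,
 * which is the assumption this sketch is built on.
 */
static void toy_alloc(struct toy_page *pool, size_t first, unsigned int order)
{
	assert(first + ((size_t)1 << order) <= NPAGES);
	if (!order)
		return;
	pool[first].head = true;
	pool[first].order = order;
	for (size_t i = 1; i < ((size_t)1 << order); i++)
		pool[first + i].compound_head = (int)first;
}

/* Models the prepare step: reset every page of the folio to standalone. */
static void toy_free_prepare(struct toy_page *pool, size_t first)
{
	unsigned int order = pool[first].order;

	if (!order)
		return;
	for (size_t i = 0; i < ((size_t)1 << order); i++) {
		pool[first + i].head = false;
		pool[first + i].compound_head = -1;
		pool[first + i].order = 0;
	}
}

static bool toy_is_standalone(const struct toy_page *p)
{
	return !p->head && p->compound_head == -1 && p->order == 0;
}

/*
 * Allocate one order-2 folio, free it (optionally running the prepare
 * step), then reallocate the same four pages at order 0. Returns how
 * many of those pages still carry compound state from the first use.
 */
size_t toy_stale_after_realloc(bool prepare)
{
	struct toy_page pool[NPAGES];
	size_t stale = 0;

	toy_pool_init(pool);
	toy_alloc(pool, 0, 2);
	if (prepare)
		toy_free_prepare(pool, 0);

	for (size_t i = 0; i < 4; i++) {
		toy_alloc(pool, i, 0);
		if (!toy_is_standalone(&pool[i]))
			stale++;
	}
	return stale;
}
```

In this model, toy_stale_after_realloc(false) finds all four pages still
carrying compound state from the order-2 use, while
toy_stale_after_realloc(true) finds none.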

Matt

[1] https://patchwork.freedesktop.org/patch/697710/?series=159119&rev=3#comment_1282413

> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(free_zone_device_folio_prepare);
> > +
> > void free_zone_device_folio(struct folio *folio)
> > {
> > struct dev_pagemap *pgmap = folio->pgmap;
> > @@ -454,6 +508,7 @@ void free_zone_device_folio(struct folio *folio)
> > case MEMORY_DEVICE_COHERENT:
> > if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> > break;
> > + free_zone_device_folio_prepare(folio);
> > pgmap->ops->folio_free(folio, order);
> > percpu_ref_put_many(&folio->pgmap->ref, nr);
> > break;
>
> Balbir