linux-kernel - Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing

Message-ID: <aWFe9S2DGwfD2smO@lstrano-desk.jf.intel.com>
Date: Fri, 9 Jan 2026 12:03:01 -0800
From: Matthew Brost <matthew.brost@...el.com>
To: Zi Yan <ziy@...dia.com>
CC: Mika Penttilä <mpenttil@...hat.com>, Francois Dugast
	<francois.dugast@...el.com>, <intel-xe@...ts.freedesktop.org>,
	<dri-devel@...ts.freedesktop.org>, Balbir Singh <balbirs@...dia.com>,
	Alistair Popple <apopple@...dia.com>, David Hildenbrand <david@...nel.org>,
	Oscar Salvador <osalvador@...e.de>, Andrew Morton
	<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-cxl@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 3/7] mm: Split device-private and coherent folios
 before freeing

On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
> 
> > On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
> >> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
> >>
> >>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
> >>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> On 1/9/26 10:54, Francois Dugast wrote:
> >>>>>
> >>>>>> From: Matthew Brost <matthew.brost@...el.com>
> >>>>>>
> >>>>>> Split device-private and coherent folios into individual pages before
> >>>>>> freeing so that any order folio can be formed upon the next use of the
> >>>>>> pages.
> >>>>>>
> >>>>>> Cc: Balbir Singh <balbirs@...dia.com>
> >>>>>> Cc: Alistair Popple <apopple@...dia.com>
> >>>>>> Cc: Zi Yan <ziy@...dia.com>
> >>>>>> Cc: David Hildenbrand <david@...nel.org>
> >>>>>> Cc: Oscar Salvador <osalvador@...e.de>
> >>>>>> Cc: Andrew Morton <akpm@...ux-foundation.org>
> >>>>>> Cc: linux-mm@...ck.org
> >>>>>> Cc: linux-cxl@...r.kernel.org
> >>>>>> Cc: linux-kernel@...r.kernel.org
> >>>>>> Signed-off-by: Matthew Brost <matthew.brost@...el.com>
> >>>>>> Signed-off-by: Francois Dugast <francois.dugast@...el.com>
> >>>>>> ---
> >>>>>>  mm/memremap.c | 2 ++
> >>>>>>  1 file changed, 2 insertions(+)
> >>>>>>
> >>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
> >>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
> >>>>>> --- a/mm/memremap.c
> >>>>>> +++ b/mm/memremap.c
> >>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
> >>>>>>  	case MEMORY_DEVICE_COHERENT:
> >>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
> >>>>>>  			break;
> >>>>>> +
> >>>>>> +		folio_split_unref(folio);
> >>>>>>  		pgmap->ops->folio_free(folio);
> >>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
> >>>>>>  		break;
> >>>>>
> >>>>> This breaks folio_free implementations like nouveau_dmem_folio_free,
> >>>>> which checks the folio order and acts upon that.
> >>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
> >>>
> >>> 'let the driver handle the split?' - I had considered this as an option.
> >>>
> >>>>
> >>>> Passing an order parameter might be better to avoid exposing core MM internals
> >>>> by asking drivers to undo compound pages.
> >>>>
> >>>
> >>> It looks like Nouveau tracks free folios and free pages—something Xe’s
> >>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
> >>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
> >>> SVM with respect to reusing folios. It appears Nouveau prefers not to
> >>> split the folio, so I’m leaning toward moving this call into the
> >>> driver’s folio_free function.
> >>
> >> No, that creates asymmetric page handling and is error prone.
> >>
> >
> > I agree it is asymmetric and symmetric is likely better.
> >
> >> In addition, looking at nouveau’s implementation in
> >> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
> >> which is never split, and passes it to zone_device_folio_init(). This
> >> is wrong, since if the folio is large, it will go through prep_compound_page()
> >> again. The bug has not manifested because there are only order-9 large folios.
> >> Once mTHP support is added, how is nouveau going to allocate an order-4 folio
> >> from a free order-9 folio? Maintain a per-order free folio list and
> >> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
> >
> > The way Nouveau handles memory allocations here looks wrong to me—it
> > should probably use DRM Buddy and convert a block buddy to pages rather
> > than tracking a free folio list and free page list. But this is not my
> > driver.
> >
> >> is wrong by calling prep_compound_page() on a folio (already compound page).
> >>
> >
> > I don’t disagree that this implementation is questionable.
> >
> > So what’s the suggestion here—add folio order to folio_free just to
> > accommodate Nouveau’s rather odd memory allocation algorithm? That
> > doesn’t seem right to me either.
> 
> Splitting the folio in free_zone_device_folio() and passing folio order
> to folio_free() make sense to me, since after the split, the folio passed

If this is the consensus / direction, I can do this, but it would be a
tree-wide change.

I do have another question for everyone here - do we think this splitting
implementation should carry a Fixes: tag so this can go into 6.19?

> to folio_free() contains no order information, but just the used-to-be
> head page and the remaining 511 pages are free. How does the Intel Xe driver
> handle it without knowing folio order?
> 

It’s a bit convoluted, but folio/page->zone_device_data points to a
reference-counted object in GPU SVM. When the object’s reference count
drops to zero, we call back into the driver layer to release the memory.
In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
is then released. If it’s not clear, our original allocation size
determines the granularity at which we free the backing store.
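
As a rough illustration of the scheme described above, here is a minimal
userspace model. It is not the Xe or GPU SVM code: struct svm_alloc,
struct dev_page, and every function name below are hypothetical stand-ins,
and the "release" step only bumps a counter where the real driver would
free the backing store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reference-counted GPU SVM object. */
struct svm_alloc {
	unsigned int refcount;	/* one reference per device folio/page */
	size_t size;		/* original allocation size, in bytes */
	unsigned int released;	/* how many times the backing was freed */
};

/* Hypothetical stand-in for a device page; only the back-pointer matters. */
struct dev_page {
	struct svm_alloc *zone_device_data;
};

/* Models the driver callback that releases the backing store. */
static void svm_alloc_release(struct svm_alloc *a)
{
	a->released++;
}

static void svm_alloc_get(struct svm_alloc *a)
{
	a->refcount++;
}

/* Drop one reference; the backing is released only on the last put. */
static void svm_alloc_put(struct svm_alloc *a)
{
	assert(a->refcount > 0);
	if (--a->refcount == 0)
		svm_alloc_release(a);
}

/* Models what a ->folio_free() implementation would do in this scheme. */
static void model_folio_free(struct dev_page *p)
{
	struct svm_alloc *a = p->zone_device_data;

	p->zone_device_data = NULL;
	svm_alloc_put(a);
}
```

The point of the model is that the free path never consults a folio order:
the size recorded at allocation time is all the release step needs.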

> Do we really need the order info in ->folio_free() if the folio is split
> in free_zone_device_folio()? free_zone_device_folio() should just call
> ->folio_free() 2^order times to free individual pages.
> 

No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
reference to our GPU SVM object, so we can free the backing in a single
->folio_free call.

Now, if that folio gets split at some point into 4KB pages, then we’d
have 512 references to this object set up in the ->folio_split calls.
We’d then expect 512 ->folio_free() calls. The same applies if, for
whatever reason, we can’t create a 2MB device page during a 2MB
migration and need to create 512 4KB pages instead: we’d have 512
references to our GPU SVM object.
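
The arithmetic behind those numbers, as a small sketch assuming 4KB base
pages (so a 2MB allocation is order 9). The helper names are made up for
illustration and do not exist in the kernel or in Xe.

```c
#include <assert.h>

/*
 * Number of references held on one allocation of order alloc_order when
 * it is covered by folios of order folio_order. Each folio holds exactly
 * one reference, so this is also the number of ->folio_free() calls
 * expected before the backing store is released.
 */
static unsigned long refs_for_allocation(unsigned int alloc_order,
					 unsigned int folio_order)
{
	assert(folio_order <= alloc_order);
	return 1UL << (alloc_order - folio_order);
}

/* Returns 1 once enough ->folio_free() calls have dropped every reference. */
static int backing_released(unsigned long refs, unsigned long free_calls)
{
	return free_calls >= refs;
}
```

With these, refs_for_allocation(9, 9) is 1 for an unsplit 2MB folio and
refs_for_allocation(9, 0) is 512 once it is split into 4KB pages; in the
split case the backing is still held after 511 frees.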

Matt

> 
> Best Regards,
> Yan, Zi