[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aWWCK0C23CUl9zEq@lstrano-desk.jf.intel.com>
Date: Mon, 12 Jan 2026 15:22:19 -0800
From: Matthew Brost <matthew.brost@...el.com>
To: Zi Yan <ziy@...dia.com>
CC: Jason Gunthorpe <jgg@...pe.ca>, Matthew Wilcox <willy@...radead.org>,
Balbir Singh <balbirs@...dia.com>, Francois Dugast
<francois.dugast@...el.com>, <intel-xe@...ts.freedesktop.org>,
<dri-devel@...ts.freedesktop.org>, Madhavan Srinivasan <maddy@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@...nel.org>, Felix Kuehling
<Felix.Kuehling@....com>, Alex Deucher <alexander.deucher@....com>, Christian
König <christian.koenig@....com>, David Airlie
<airlied@...il.com>, Simona Vetter <simona@...ll.ch>, Maarten Lankhorst
<maarten.lankhorst@...ux.intel.com>, Maxime Ripard <mripard@...nel.org>,
Thomas Zimmermann <tzimmermann@...e.de>, Lyude Paul <lyude@...hat.com>,
Danilo Krummrich <dakr@...nel.org>, Bjorn Helgaas <bhelgaas@...gle.com>,
Logan Gunthorpe <logang@...tatee.com>, David Hildenbrand <david@...nel.org>,
Oscar Salvador <osalvador@...e.de>, Andrew Morton
<akpm@...ux-foundation.org>, Leon Romanovsky <leon@...nel.org>, "Lorenzo
Stoakes" <lorenzo.stoakes@...cle.com>, "Liam R . Howlett"
<Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport
<rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko
<mhocko@...e.com>, Alistair Popple <apopple@...dia.com>,
<linuxppc-dev@...ts.ozlabs.org>, <kvm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <amd-gfx@...ts.freedesktop.org>,
<nouveau@...ts.freedesktop.org>, <linux-pci@...r.kernel.org>,
<linux-mm@...ck.org>, <linux-cxl@...r.kernel.org>
Subject: Re: [PATCH v4 1/7] mm/zone_device: Add order argument to folio_free
callback
On Mon, Jan 12, 2026 at 06:15:26PM -0500, Zi Yan wrote:
> On 12 Jan 2026, at 16:49, Matthew Brost wrote:
>
> > On Mon, Jan 12, 2026 at 09:45:10AM -0400, Jason Gunthorpe wrote:
> >
> > Hi, catching up here.
> >
> >> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
> >>> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
> >>>
> >>>> On 1/12/26 08:35, Matthew Wilcox wrote:
> >>>>> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
> >>>>>> The core MM splits the folio before calling folio_free, restoring the
> >>>>>> zone pages associated with the folio to an initialized state (e.g.,
> >>>>>> non-compound, pgmap valid, etc...). The order argument represents the
> >>>>>> folio’s order prior to the split which can be used driver side to know
> >>>>>> how many pages are being freed.
> >>>>>
> >>>>> This really feels like the wrong way to fix this problem.
> >>>>>
> >>>
> >>> Hi Matthew,
> >>>
> >>> I think the wording is confusing, since the actual issue is that:
> >>>
> >>> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
> >>> 2. but free_zone_device_folio() never reverse the course,
> >>> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
> >>> be done before driver callback ->folio_free(), since once ->folio_free()
> >>> is called, the folio can be reallocated immediately,
> >>> 4. after the undo of prep_compound_page(), folio_order() can no longer provide
> >>> the original order information, thus, folio_free() needs that for proper
> >>> device side ref manipulation.
> >>
> >> There is something wrong with the driver if the "folio can be
> >> reallocated immediately".
> >>
> >> The flow generally expects there to be a driver allocator linked to
> >> folio_free()
> >>
> >> 1) Allocator finds free memory
> >> 2) zone_device_page_init() allocates the memory and makes refcount=1
> >> 3) __folio_put() knows the recount 0.
> >> 4) free_zone_device_folio() calls folio_free(), but it doesn't
> >> actually need to undo prep_compound_page() because *NOTHING* can
> >> use the page pointer at this point.
> >
> > Correct—nothing can use the folio prior to calling folio_free(). Once
> > folio_free() returns, the driver side is free to immediately reallocate
> > the folio (or a subset of its pages).
> >
> >> 5) Driver puts the memory back into the allocator and now #1 can
> >> happen. It knows how much memory to put back because folio->order
> >> is valid from #2
> >> 6) #1 happens again, then #2 happens again and the folio is in the
> >> right state for use. The successor #2 fully undoes the work of the
> >> predecessor #2.
> >>
> >> If you have races where #1 can happen immediately after #3 then the
> >> driver design is fundamentally broken and passing around order isn't
> >> going to help anything.
> >>
> >
> > The above race does not exist; if it did, I agree we’d be solving
> > nothing here.
> >
> >> If the allocator is using the struct page memory then step #5 should
> >> also clean up the struct page with the allocator data before returning
> >> it to the allocator.
> >>
> >
> > We could move the call to free_zone_device_folio_prepare() [1] into the
> > driver-side implementation of ->folio_free() and drop the order argument
> > here. Zi didn’t particularly like that; he preferred calling
> > free_zone_device_folio_prepare() [2] before invoking ->folio_free(),
> > which is why this patch exists.
>
> On a second thought, if calling free_zone_device_folio_prepare() in
> ->folio_free() works, feel free to do so.
>
+1, testing this change right now and it does indeed work.
Matt
> >
> > FWIW, I do not have a strong opinion here—either way works. Xe doesn’t
> > actually need the order regardless of where
> > free_zone_device_folio_prepare() is called, but Nouveau does need the
> > order if free_zone_device_folio_prepare() is called before
> > ->folio_free().
> >
> > [1] https://patchwork.freedesktop.org/patch/697877/?series=159120&rev=4
> > [2] https://patchwork.freedesktop.org/patch/697709/?series=159120&rev=3#comment_1282405
> >
> >> I vaugely remember talking about this before in the context of the Xe
> >> driver.. You can't just take an existing VRAM allocator and layer it
> >> on top of the folios and have it broadly ignore the folio_free
> >> callback.
> >>
> >
> > We are definitely not ignoring the ->folio_free callback—that is the
> > point at which we tell our VRAM allocator (DRM buddy) it is okay to
> > release the allocation and make it available for reuse.
> >
> > Matt
> >
> >> Jsaon
>
>
> Best Regards,
> Yan, Zi
Powered by blists - more mailing lists