[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45A4E73B-F6C2-44B7-8C81-13E24ED12127@nvidia.com>
Date: Mon, 12 Jan 2026 18:15:26 -0500
From: Zi Yan <ziy@...dia.com>
To: Matthew Brost <matthew.brost@...el.com>
Cc: Jason Gunthorpe <jgg@...pe.ca>, Matthew Wilcox <willy@...radead.org>,
Balbir Singh <balbirs@...dia.com>,
Francois Dugast <francois.dugast@...el.com>, intel-xe@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, Madhavan Srinivasan <maddy@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@...nel.org>,
Felix Kuehling <Felix.Kuehling@....com>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
Bjorn Helgaas <bhelgaas@...gle.com>, Logan Gunthorpe <logang@...tatee.com>,
David Hildenbrand <david@...nel.org>, Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>, Leon Romanovsky <leon@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Alistair Popple <apopple@...dia.com>, linuxppc-dev@...ts.ozlabs.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-pci@...r.kernel.org, linux-mm@...ck.org, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v4 1/7] mm/zone_device: Add order argument to folio_free
callback
On 12 Jan 2026, at 16:49, Matthew Brost wrote:
> On Mon, Jan 12, 2026 at 09:45:10AM -0400, Jason Gunthorpe wrote:
>
> Hi, catching up here.
>
>> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
>>> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
>>>
>>>> On 1/12/26 08:35, Matthew Wilcox wrote:
>>>>> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
>>>>>> The core MM splits the folio before calling folio_free, restoring the
>>>>>> zone pages associated with the folio to an initialized state (e.g.,
>>>>>> non-compound, pgmap valid, etc...). The order argument represents the
>>>>>> folio’s order prior to the split which can be used driver side to know
>>>>>> how many pages are being freed.
>>>>>
>>>>> This really feels like the wrong way to fix this problem.
>>>>>
>>>
>>> Hi Matthew,
>>>
>>> I think the wording is confusing, since the actual issue is that:
>>>
>>> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
>>> 2. but free_zone_device_folio() never reverse the course,
>>> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
>>> be done before driver callback ->folio_free(), since once ->folio_free()
>>> is called, the folio can be reallocated immediately,
>>> 4. after the undo of prep_compound_page(), folio_order() can no longer provide
>>> the original order information, thus, folio_free() needs that for proper
>>> device side ref manipulation.
>>
>> There is something wrong with the driver if the "folio can be
>> reallocated immediately".
>>
>> The flow generally expects there to be a driver allocator linked to
>> folio_free()
>>
>> 1) Allocator finds free memory
>> 2) zone_device_page_init() allocates the memory and makes refcount=1
>> 3) __folio_put() knows the recount 0.
>> 4) free_zone_device_folio() calls folio_free(), but it doesn't
>> actually need to undo prep_compound_page() because *NOTHING* can
>> use the page pointer at this point.
>
> Correct—nothing can use the folio prior to calling folio_free(). Once
> folio_free() returns, the driver side is free to immediately reallocate
> the folio (or a subset of its pages).
>
>> 5) Driver puts the memory back into the allocator and now #1 can
>> happen. It knows how much memory to put back because folio->order
>> is valid from #2
>> 6) #1 happens again, then #2 happens again and the folio is in the
>> right state for use. The successor #2 fully undoes the work of the
>> predecessor #2.
>>
>> If you have races where #1 can happen immediately after #3 then the
>> driver design is fundamentally broken and passing around order isn't
>> going to help anything.
>>
>
> The above race does not exist; if it did, I agree we’d be solving
> nothing here.
>
>> If the allocator is using the struct page memory then step #5 should
>> also clean up the struct page with the allocator data before returning
>> it to the allocator.
>>
>
> We could move the call to free_zone_device_folio_prepare() [1] into the
> driver-side implementation of ->folio_free() and drop the order argument
> here. Zi didn’t particularly like that; he preferred calling
> free_zone_device_folio_prepare() [2] before invoking ->folio_free(),
> which is why this patch exists.
On a second thought, if calling free_zone_device_folio_prepare() in
->folio_free() works, feel free to do so.
>
> FWIW, I do not have a strong opinion here—either way works. Xe doesn’t
> actually need the order regardless of where
> free_zone_device_folio_prepare() is called, but Nouveau does need the
> order if free_zone_device_folio_prepare() is called before
> ->folio_free().
>
> [1] https://patchwork.freedesktop.org/patch/697877/?series=159120&rev=4
> [2] https://patchwork.freedesktop.org/patch/697709/?series=159120&rev=3#comment_1282405
>
>> I vaugely remember talking about this before in the context of the Xe
>> driver.. You can't just take an existing VRAM allocator and layer it
>> on top of the folios and have it broadly ignore the folio_free
>> callback.
>>
>
> We are definitely not ignoring the ->folio_free callback—that is the
> point at which we tell our VRAM allocator (DRM buddy) it is okay to
> release the allocation and make it available for reuse.
>
> Matt
>
>> Jsaon
Best Regards,
Yan, Zi
Powered by blists - more mailing lists