Message-ID: <218D42B0-3E08-4ABC-9FB4-1203BB31E547@nvidia.com>
Date: Mon, 12 Jan 2026 11:31:04 -0500
From: Zi Yan <ziy@...dia.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: Matthew Wilcox <willy@...radead.org>, Balbir Singh <balbirs@...dia.com>,
 Francois Dugast <francois.dugast@...el.com>, intel-xe@...ts.freedesktop.org,
 dri-devel@...ts.freedesktop.org, Matthew Brost <matthew.brost@...el.com>,
 Madhavan Srinivasan <maddy@...ux.ibm.com>,
 Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
 "Christophe Leroy (CS GROUP)" <chleroy@...nel.org>,
 Felix Kuehling <Felix.Kuehling@....com>,
 Alex Deucher <alexander.deucher@....com>,
 Christian König <christian.koenig@....com>,
 David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
 Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
 Bjorn Helgaas <bhelgaas@...gle.com>, Logan Gunthorpe <logang@...tatee.com>,
 David Hildenbrand <david@...nel.org>, Oscar Salvador <osalvador@...e.de>,
 Andrew Morton <akpm@...ux-foundation.org>, Leon Romanovsky <leon@...nel.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
 Alistair Popple <apopple@...dia.com>, linuxppc-dev@...ts.ozlabs.org,
 kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
 amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
 linux-pci@...r.kernel.org, linux-mm@...ck.org, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v4 1/7] mm/zone_device: Add order argument to folio_free
 callback

On 12 Jan 2026, at 8:45, Jason Gunthorpe wrote:

> On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
>> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
>>
>>> On 1/12/26 08:35, Matthew Wilcox wrote:
>>>> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
>>>>> The core MM splits the folio before calling folio_free, restoring the
>>>>> zone pages associated with the folio to an initialized state (e.g.,
>>>>> non-compound, pgmap valid, etc...). The order argument represents the
>>>>> folio’s order prior to the split which can be used driver side to know
>>>>> how many pages are being freed.
>>>>
>>>> This really feels like the wrong way to fix this problem.
>>>>
>>
>> Hi Matthew,
>>
>> I think the wording is confusing, since the actual issue is that:
>>
>> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
>> 2. but free_zone_device_folio() never reverses course,
>> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
>>    be done before driver callback ->folio_free(), since once ->folio_free()
>>    is called, the folio can be reallocated immediately,
>> 4. after the undo of prep_compound_page(), folio_order() can no longer provide
>>    the original order information, thus, folio_free() needs that for proper
>>    device side ref manipulation.
>
> There is something wrong with the driver if the "folio can be
> reallocated immediately".
>
> The flow generally expects there to be a driver allocator linked to
> folio_free()
>
> 1) Allocator finds free memory
> 2) zone_device_page_init() allocates the memory and makes refcount=1
> 3) __folio_put() knows the refcount is 0.
> 4) free_zone_device_folio() calls folio_free(), but it doesn't
>    actually need to undo prep_compound_page() because *NOTHING* can
>    use the page pointer at this point.
> 5) Driver puts the memory back into the allocator and now #1 can
>    happen. It knows how much memory to put back because folio->order
>    is valid from #2
> 6) #1 happens again, then #2 happens again and the folio is in the
>    right state for use. The successor #2 fully undoes the work of the
>    predecessor #2.

But how can a successor #2 undo the work if the second #1 only allocates
half of the original folio? For example, an order-9 at PFN 0 is
allocated and freed, then an order-8 at PFN 0 is allocated and another
order-8 at PFN 256 is allocated. How can two #2s undo the same order-9
without corrupting each other’s data?
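
To make the example concrete, here is a toy userspace model of the
bookkeeping, not kernel code. struct toy_page and every toy_* function
are made up for illustration and only mimic the head/tail linkage that
prep_compound_page() sets up. It counts pages that still claim to be
tails of a head whose folio no longer covers them, after an order-9 at
PFN 0 is freed and an order-8 at PFN 0 is allocated. Whether anything
can actually observe that stale state is the open question in this
thread; the sketch shows only the state itself.

```c
#include <assert.h>

#define NPAGES  512   /* one order-9 worth of pages */
#define NO_HEAD (-1)

/* Hypothetical stand-in for the compound metadata kept in struct page. */
struct toy_page {
	int head;   /* PFN of the compound head for a tail page, else NO_HEAD */
	int order;  /* folio order, meaningful only on a head page */
};

static struct toy_page pages[NPAGES];

/* Return 2^order pages starting at pfn to the non-compound state. */
static void toy_reset_range(int pfn, int order)
{
	assert(pfn >= 0 && pfn + (1 << order) <= NPAGES);
	for (int i = 0; i < (1 << order); i++) {
		pages[pfn + i].head = NO_HEAD;
		pages[pfn + i].order = 0;
	}
}

/* Mimic forming a large folio: one head page, the rest tails. */
static void toy_prep_compound(int pfn, int order)
{
	assert(pfn >= 0 && pfn + (1 << order) <= NPAGES);
	pages[pfn].head = NO_HEAD;
	pages[pfn].order = order;
	for (int i = 1; i < (1 << order); i++) {
		pages[pfn + i].head = pfn;
		pages[pfn + i].order = 0;
	}
}

/* Count tail pages that lie outside the range their head now covers. */
static int toy_count_stale_tails(void)
{
	int stale = 0;

	for (int pfn = 0; pfn < NPAGES; pfn++) {
		int head = pages[pfn].head;

		if (head == NO_HEAD)
			continue;
		if (pfn >= head + (1 << pages[head].order))
			stale++;
	}
	return stale;
}

/*
 * Allocate an order-9 at PFN 0, free it, then allocate only an order-8
 * at PFN 0. undo_on_free selects whether the free path resets the
 * compound state before the memory goes back to the allocator.
 */
static int toy_scenario(int undo_on_free)
{
	toy_reset_range(0, 9);
	toy_prep_compound(0, 9);
	if (undo_on_free)
		toy_reset_range(0, 9);
	toy_prep_compound(0, 8);
	return toy_count_stale_tails();
}
```

In this model toy_scenario(0) returns 256, since PFNs 256-511 still
point at PFN 0 as their head even though PFN 0 is now an order-8, while
toy_scenario(1) returns 0.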


>
> If you have races where #1 can happen immediately after #3 then the
> driver design is fundamentally broken and passing around order isn't
> going to help anything.
>
> If the allocator is using the struct page memory then step #5 should
> also clean up the struct page with the allocator data before returning
> it to the allocator.

Do you mean ->folio_free() callback should undo prep_compound_page()
instead?

>
> I vaguely remember talking about this before in the context of the Xe
> driver. You can't just take an existing VRAM allocator and layer it
> on top of the folios and have it broadly ignore the folio_free
> callback.


Best Regards,
Yan, Zi
