lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260112134510.GC745888@ziepe.ca>
Date: Mon, 12 Jan 2026 09:45:10 -0400
From: Jason Gunthorpe <jgg@...pe.ca>
To: Zi Yan <ziy@...dia.com>
Cc: Matthew Wilcox <willy@...radead.org>, Balbir Singh <balbirs@...dia.com>,
	Francois Dugast <francois.dugast@...el.com>,
	intel-xe@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org,
	Matthew Brost <matthew.brost@...el.com>,
	Madhavan Srinivasan <maddy@...ux.ibm.com>,
	Nicholas Piggin <npiggin@...il.com>,
	Michael Ellerman <mpe@...erman.id.au>,
	"Christophe Leroy (CS GROUP)" <chleroy@...nel.org>,
	Felix Kuehling <Felix.Kuehling@....com>,
	Alex Deucher <alexander.deucher@....com>,
	Christian König <christian.koenig@....com>,
	David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
	Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
	Maxime Ripard <mripard@...nel.org>,
	Thomas Zimmermann <tzimmermann@...e.de>,
	Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	Logan Gunthorpe <logang@...tatee.com>,
	David Hildenbrand <david@...nel.org>,
	Oscar Salvador <osalvador@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Leon Romanovsky <leon@...nel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Michal Hocko <mhocko@...e.com>,
	Alistair Popple <apopple@...dia.com>, linuxppc-dev@...ts.ozlabs.org,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
	linux-pci@...r.kernel.org, linux-mm@...ck.org,
	linux-cxl@...r.kernel.org
Subject: Re: [PATCH v4 1/7] mm/zone_device: Add order argument to folio_free
 callback

On Sun, Jan 11, 2026 at 07:51:01PM -0500, Zi Yan wrote:
> On 11 Jan 2026, at 19:19, Balbir Singh wrote:
> 
> > On 1/12/26 08:35, Matthew Wilcox wrote:
> >> On Sun, Jan 11, 2026 at 09:55:40PM +0100, Francois Dugast wrote:
> >>> The core MM splits the folio before calling folio_free, restoring the
> >>> zone pages associated with the folio to an initialized state (e.g.,
> >>> non-compound, pgmap valid, etc...). The order argument represents the
> >>> folio’s order prior to the split which can be used driver side to know
> >>> how many pages are being freed.
> >>
> >> This really feels like the wrong way to fix this problem.
> >>
> 
> Hi Matthew,
> 
> I think the wording is confusing, since the actual issue is that:
> 
> 1. zone_device_page_init() calls prep_compound_page() to form a large folio,
> 2. but free_zone_device_folio() never reverse the course,
> 3. the undo of prep_compound_page() in free_zone_device_folio() needs to
>    be done before driver callback ->folio_free(), since once ->folio_free()
>    is called, the folio can be reallocated immediately,
> 4. after the undo of prep_compound_page(), folio_order() can no longer provide
>    the original order information, thus, folio_free() needs that for proper
>    device side ref manipulation.

There is something wrong with the driver if the "folio can be
reallocated immediately".

The flow generally expects there to be a driver allocator linked to
folio_free()

1) Allocator finds free memory
2) zone_device_page_init() allocates the memory and makes refcount=1
3) __folio_put() knows the recount 0.
4) free_zone_device_folio() calls folio_free(), but it doesn't
   actually need to undo prep_compound_page() because *NOTHING* can
   use the page pointer at this point.
5) Driver puts the memory back into the allocator and now #1 can
   happen. It knows how much memory to put back because folio->order
   is valid from #2
6) #1 happens again, then #2 happens again and the folio is in the
   right state for use. The successor #2 fully undoes the work of the
   predecessor #2.

If you have races where #1 can happen immediately after #3 then the
driver design is fundamentally broken and passing around order isn't
going to help anything.

If the allocator is using the struct page memory then step #5 should
also clean up the struct page with the allocator data before returning
it to the allocator.

I vaugely remember talking about this before in the context of the Xe
driver.. You can't just take an existing VRAM allocator and layer it
on top of the folios and have it broadly ignore the folio_free
callback.

Jsaon

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ