Message-ID: <9077ab5b-f2c8-4c8d-8441-631e7c2cf384@suse.cz>
Date: Thu, 22 Jan 2026 09:00:49 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Matthew Brost <matthew.brost@...el.com>, Zi Yan <ziy@...dia.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, Balbir Singh <balbirs@...dia.com>,
Matthew Wilcox <willy@...radead.org>, Alistair Popple <apopple@...dia.com>,
Francois Dugast <francois.dugast@...el.com>, intel-xe@...ts.freedesktop.org,
dri-devel@...ts.freedesktop.org, Madhavan Srinivasan <maddy@...ux.ibm.com>,
Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@...nel.org>,
Felix Kuehling <Felix.Kuehling@....com>,
Alex Deucher <alexander.deucher@....com>,
Christian König <christian.koenig@....com>,
David Airlie <airlied@...il.com>, Simona Vetter <simona@...ll.ch>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Maxime Ripard <mripard@...nel.org>, Thomas Zimmermann <tzimmermann@...e.de>,
Lyude Paul <lyude@...hat.com>, Danilo Krummrich <dakr@...nel.org>,
David Hildenbrand <david@...nel.org>, Oscar Salvador <osalvador@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>, Leon Romanovsky
<leon@...nel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>, Mike Rapoport
<rppt@...nel.org>, Suren Baghdasaryan <surenb@...gle.com>,
Michal Hocko <mhocko@...e.com>, linuxppc-dev@...ts.ozlabs.org,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
amd-gfx@...ts.freedesktop.org, nouveau@...ts.freedesktop.org,
linux-mm@...ck.org, linux-cxl@...r.kernel.org
Subject: Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device
private folios
On 1/22/26 08:19, Matthew Brost wrote:
> On Tue, Jan 20, 2026 at 10:01:18PM -0500, Zi Yan wrote:
>> On 20 Jan 2026, at 8:53, Jason Gunthorpe wrote:
>>
>
> This whole thread makes my head hurt, as does core MM.
>
> IMO the TL;DR is:
>
> - Why is Intel the only one proving this stuff works? We can debate all
> day about what should or should not work — but someone else needs to
> actually prove it, rather than type hypotheticals.
>
> - Intel has demonstrated that this works and is still getting blocked.
>
> - This entire thread is about a fixes patch for large device pages.
> Changing prep_compound_page is completely out of scope for a fixes
> patch, and honestly so is most of the rest of what’s being proposed.

FWIW I'm ok if this lands as a fix patch, and perceived the discussion to be
about how to refactor things more properly afterwards, going forward.

> - At a minimum, you must clear every page’s flags in the loop. So why not
> conservatively clear anything else a folio might have set before calling
> an existing core-MM function, ensuring the pages are in a known state?
> This is a fixes patch.
>
> - Given the current state of the discussion, I don’t think large device
> pages should be in 6.19. And if so, why didn’t the entire device pages
> series receive this level of scrutiny earlier? It’s my mistake for not
> saying “no” until the reallocation at different sizes issue was resolved.
>
> @Andrew. - I'd revert large device pages in 6.19 as it doesn't work and
> we seemingly cannot close on this.
>
> Matt
>
>> > On Mon, Jan 19, 2026 at 09:50:16PM -0500, Zi Yan wrote:
>> >>>> I suppose we want some prep_single_page(page) and some reorg to share
>> >>>> code with the other prep function.
>> >>
>> >> This is just an unnecessary need due to a lack of knowledge of, or an
>> >> unwillingness to investigate, core MM page and folio initialization code.
>> >
>> > It will be better to keep this related code together, not spread all
>> > around.
>>
>> Or clarify what code is for preparing pages, which would go away at memdesc
>> time, and what code is for preparing folios, which would stay.
>>
>> >
>> >>>> I don't think so. It should do the above job efficiently and iterate
>> >>>> over the page list exactly once.
>> >>
>> >> folio initialization should not iterate over any page list, since folio is
>> >> supposed to be treated as a whole instead of individual pages.
>> >
>> > The tail pages need to have the right data in them or compound_head
>> > won't work.
>>
>> That is done by set_compound_head() in prep_compound_tail().
>> prep_compound_page() takes care of it. As long as it is called, even if
>> the pages in that compound page have random states before, the compound
>> page should function correctly afterwards.
>>
>> >
>> >> folio->mapping = NULL;
>> >> folio->memcg_data = 0;
>> >> folio->flags.f &= ~PAGE_FLAGS_CHECK_AT_PREP;
>> >>
>> >> should be enough.
>> >
>> > This seems believable to me for setting up an order 0 page.
>>
>> It works for any folio, regardless of its order. Fields used in the second
>> or third subpages are all taken care of by prep_compound_page().
>>
>> >
>> >> if (order)
>> >> folio_set_large_rmappable(folio);
>> >
>> > That one is in zone_device_folio_init()
>>
>> Yes. And the code location looks right to me.
>>
>> >
>> > And maybe the naming has got really confused if we have both functions
>> > now :\
>>
>> Yes. One of the issues is that device private code used to handle only
>> order-0 pages and was converted to use high order folio directly without
>> using high order page (namely compound page) as an intermediate step.
>> This two-step-in-one caused confusion. But the key thing to avoid the
>> confusion is that to form a high order folio, a list of contiguous pages
>> would become a compound page by calling prep_compound_page(), then
>> the compound page becomes a folio by calling folio_set_large_rmappable().
>>
>> BTW, the code in prep_compound_head() after folio_set_order(folio, order)
>> should belong to folio_set_large_rmappable() and they are causing confusion,
>> since they are only applicable to rmappable large folios. I am going to
>> send a patch to fix it.
>>
>>
>> Best Regards,
>> Yan, Zi