[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7f891077-39a2-4c0a-87ec-8ef1a244f7ad@redhat.com>
Date: Tue, 5 Aug 2025 15:20:15 +0200
From: David Hildenbrand <david@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Alex Williamson <alex.williamson@...hat.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"lizhe.67@...edance.com" <lizhe.67@...edance.com>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [GIT PULL] VFIO updates for v6.17-rc1
On 05.08.25 15:00, Linus Torvalds wrote:
> On Tue, 5 Aug 2025 at 10:47, David Hildenbrand <david@...hat.com> wrote:
>>
>> The concern is rather false positives, meaning, you want consecutive
>> PFNs (just like within a folio), but -- because the stars aligned --
>> you get consecutive "struct page" that do not translate to consecutive PFNs.
>
> So I don't think that can happen with a valid 'struct page', because
> if the 'struct page's are in different sections, they will have been
> allocated separately too.
I think you can end up with two memory sections not being consecutive,
but the struct pages being consecutive.
I assume the easiest way to achieve that is having a large-enough memory
hole (PCI hole?) that spans more than section, and memblock just giving
you the next PFN range to use as "struct page".
It's one allocation per memory section, see
sparse_init_nid()->__populate_section_memmap(prn, PAGES_PER_SECTION) ->
memmap_alloc()->memblock_alloc().
With memory hotplug, there might be other weird ways to achieve it I
suspect.
>
> So you can't have two consecutive 'struct page' things without them
> being consecutive pages.
>
> But by all means, if you want to make sure, just compare the page
> sections. But converting them to a PFN and then converting back is
> just crazy.
> > IOW, the logic would literally be something like (this assumes there
> is always at least *one* page):
>
> struct page *page = *pages++;
> int section = page_to_section(page);
>
> for (size_t nr = 1; nr < nr_pages; nr++) {
> if (*pages++ != ++page)
> break;
> if (page_to_section(page) != section)
> break;
> }
> return nr;
I think that would work, and we could limit the section check to the
problematic case only (sparsemem without VMEMMAP).
>
> and yes, I think we only define page_to_section() for
> SECTION_IN_PAGE_FLAGS, but we should fix that and just have a
>
> #define page_to_section(pg) 0
Probably yes, have to think about that.
>
> for the other cases, and the compiler will happily optimize away the
> "oh, it's always zero" case.
>
> So something like that should actually generate reasonable code. It
> *really* shouldn't try to generate a pfn (or, like that horror that I
> didn't pull did, then go *back* from pfn to page)
> > That 'nth_page()' thing is just crazy garbage.
Yes, it's unfortunate garbage we have to use even in folio_page().
>
> And even when fixed to not be garbage, I'm not convinced this needs to
> be in <linux/mm.h>.
Yeah, let's move it to mm/util.c if you agree.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists