Message-ID: <9fe9f8c7-f59d-4a4b-9668-d3cd2c5a5fc9@redhat.com>
Date: Fri, 5 Sep 2025 13:38:32 +0200
From: David Hildenbrand <david@...hat.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: linux-kernel@...r.kernel.org, Alexander Potapenko <glider@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Brendan Jackman <jackmanb@...gle.com>, Christoph Lameter <cl@...two.org>,
Dennis Zhou <dennis@...nel.org>, Dmitry Vyukov <dvyukov@...gle.com>,
dri-devel@...ts.freedesktop.org, intel-gfx@...ts.freedesktop.org,
iommu@...ts.linux.dev, io-uring@...r.kernel.org,
Jason Gunthorpe <jgg@...dia.com>, Jens Axboe <axboe@...nel.dk>,
Johannes Weiner <hannes@...xchg.org>, John Hubbard <jhubbard@...dia.com>,
kasan-dev@...glegroups.com, kvm@...r.kernel.org,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, linux-arm-kernel@...s.com,
linux-arm-kernel@...ts.infradead.org, linux-crypto@...r.kernel.org,
linux-ide@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-mips@...r.kernel.org, linux-mmc@...r.kernel.org, linux-mm@...ck.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-scsi@...r.kernel.org, Marco Elver <elver@...gle.com>,
Marek Szyprowski <m.szyprowski@...sung.com>, Michal Hocko <mhocko@...e.com>,
Mike Rapoport <rppt@...nel.org>, Muchun Song <muchun.song@...ux.dev>,
netdev@...r.kernel.org, Oscar Salvador <osalvador@...e.de>,
Peter Xu <peterx@...hat.com>, Robin Murphy <robin.murphy@....com>,
Suren Baghdasaryan <surenb@...gle.com>, Tejun Heo <tj@...nel.org>,
virtualization@...ts.linux.dev, Vlastimil Babka <vbabka@...e.cz>,
wireguard@...ts.zx2c4.com, x86@...nel.org, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH v2 19/37] mm/gup: remove record_subpages()
On 05.09.25 13:34, Lorenzo Stoakes wrote:
> On Fri, Sep 05, 2025 at 08:41:23AM +0200, David Hildenbrand wrote:
>> On 01.09.25 17:03, David Hildenbrand wrote:
>>> We can clean up the code by calculating the number of refs earlier,
>>> so that we can inline what remains of record_subpages().
>>>
>>> Calculate the number of references/pages ahead of time, and record them
>>> only once all our tests have passed.
>>>
>>> Signed-off-by: David Hildenbrand <david@...hat.com>
>
> So strange, I thought I had looked at this...!
>
>>> ---
>>> mm/gup.c | 25 ++++++++-----------------
>>> 1 file changed, 8 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index c10cd969c1a3b..f0f4d1a68e094 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -484,19 +484,6 @@ static inline void mm_set_has_pinned_flag(struct mm_struct *mm)
>>> #ifdef CONFIG_MMU
>>> #ifdef CONFIG_HAVE_GUP_FAST
>>> -static int record_subpages(struct page *page, unsigned long sz,
>>> - unsigned long addr, unsigned long end,
>>> - struct page **pages)
>>> -{
>>> - int nr;
>>> -
>>> - page += (addr & (sz - 1)) >> PAGE_SHIFT;
>>> - for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
>>> - pages[nr] = page++;
>>> -
>>> - return nr;
>>> -}
>>> -
>>> /**
>>> * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
>>> * @page: pointer to page to be grabbed
>>> @@ -2967,8 +2954,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>>> if (pmd_special(orig))
>>> return 0;
>>> - page = pmd_page(orig);
>>> - refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
>>> + refs = (end - addr) >> PAGE_SHIFT;
>>> + page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
>>> folio = try_grab_folio_fast(page, refs, flags);
>>> if (!folio)
>>> @@ -2989,6 +2976,8 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>>> }
>>> *nr += refs;
>>> + for (; refs; refs--)
>>> + *(pages++) = page++;
>>> folio_set_referenced(folio);
>>> return 1;
>>> }
>>> @@ -3007,8 +2996,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
>>> if (pud_special(orig))
>>> return 0;
>>> - page = pud_page(orig);
>>> - refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
>>> + refs = (end - addr) >> PAGE_SHIFT;
>>> + page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
>>> folio = try_grab_folio_fast(page, refs, flags);
>>> if (!folio)
>>> @@ -3030,6 +3019,8 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
>>> }
>>> *nr += refs;
>>> + for (; refs; refs--)
>>> + *(pages++) = page++;
>>> folio_set_referenced(folio);
>>> return 1;
>>> }
>>
>> Okay, this code is nasty. We should rework this code to just return the nr and receive the proper
>> pages pointer, getting rid of the "*nr" parameter.
>>
>> For the time being, the following should do the trick:
>>
>> commit bfd07c995814354f6b66c5b6a72e96a7aa9fb73b (HEAD -> nth_page)
>> Author: David Hildenbrand <david@...hat.com>
>> Date: Fri Sep 5 08:38:43 2025 +0200
>>
>> fixup: mm/gup: remove record_subpages()
>> pages is not adjusted by the caller, but indexed by the existing *nr.
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 010fe56f6e132..22420f2069ee1 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2981,6 +2981,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>> return 0;
>> }
>> + pages += *nr;
>> *nr += refs;
>> for (; refs; refs--)
>> *(pages++) = page++;
>> @@ -3024,6 +3025,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
>> return 0;
>> }
>> + pages += *nr;
>> *nr += refs;
>> for (; refs; refs--)
>> *(pages++) = page++;
>
> This looks correct.
>
> But.
>
> This is VERY nasty. Before we'd call record_subpages() with pages + *nr, where
> it was clear we were offsetting by this, now we're making things imo way more
> confusing.
>
> This makes me less in love with this approach to be honest.
>
> But perhaps it's the least worst thing for now until we can do a bigger
> refactor...
>
> So since this seems correct to me, and for the sake of moving things forward
> (was this one patch dropped from mm-new or does mm-new just have an old version?
> Confused):
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
>
> For this patch obviously with the fix applied.
>
> But can we PLEASE revisit this :)
Yeah, I already asked someone internally if he would have time to do
some refactorings in mm/gup.c.
If that doesn't work out, I shall do it at some point (and at the same
time rework follow_page_mask() to just consume the array as well, like
gup does).
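
To make the two shapes discussed in this thread concrete, here is a minimal
userspace sketch. It is not mm/gup.c code: struct page is a stand-in,
PAGE_SHIFT is assumed to be 12, and the names leaf_with_nr() and
leaf_returning_count() are hypothetical. The first helper mirrors the fixup
(the caller passes the array base, the helper offsets it by *nr); the second
shows roughly what "return the nr and receive the proper pages pointer"
could look like.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12                   /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Userspace stand-in for struct page; only pointer identity matters here. */
struct page { int unused; };

/*
 * Shape after the fixup: "pages" is the base of the output array and the
 * helper itself skips the *nr entries that were recorded earlier.
 */
static int leaf_with_nr(struct page *head, unsigned long leaf_size,
			unsigned long addr, unsigned long end,
			struct page **pages, int *nr)
{
	int refs;
	struct page *page;

	assert(end >= addr);
	refs = (int)((end - addr) >> PAGE_SHIFT);
	/* First subpage covered by addr within the leaf mapping. */
	page = head + ((addr & (leaf_size - 1)) >> PAGE_SHIFT);

	pages += *nr;			/* the line added by the fixup */
	*nr += refs;
	for (; refs; refs--)
		*(pages++) = page++;
	return 1;
}

/*
 * Hypothetical refactored shape: the caller passes the first free slot and
 * gets the number of recorded pages back, so no "*nr" parameter is needed.
 */
static int leaf_returning_count(struct page *head, unsigned long leaf_size,
				unsigned long addr, unsigned long end,
				struct page **pages)
{
	int refs, i;
	struct page *page;

	assert(end >= addr);
	refs = (int)((end - addr) >> PAGE_SHIFT);
	page = head + ((addr & (leaf_size - 1)) >> PAGE_SHIFT);

	for (i = 0; i < refs; i++)
		pages[i] = page++;
	return refs;
}
```

With the second shape a caller would write something like
"nr += leaf_returning_count(head, size, addr, end, pages + nr);", which keeps
the offset visible at the call site, as it was with record_subpages().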
--
Cheers
David / dhildenb