[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a754628-d9d8-43e1-a6d9-fa45a5f23971@redhat.com>
Date: Wed, 25 Jun 2025 15:02:03 +0200
From: David Hildenbrand <david@...hat.com>
To: Lance Yang <lance.yang@...ux.dev>, Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, baolin.wang@...ux.alibaba.com,
chrisl@...nel.org, kasong@...cent.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-riscv@...ts.infradead.org, lorenzo.stoakes@...cle.com,
ryan.roberts@....com, v-songbaohua@...o.com, x86@...nel.org,
ying.huang@...el.com, zhengtangquan@...o.com,
Lance Yang <ioworker0@...il.com>
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large
folios during reclamation
On 25.06.25 14:58, Lance Yang wrote:
>
>
> On 2025/6/25 20:09, David Hildenbrand wrote:
>> On 25.06.25 13:42, Barry Song wrote:
>>> On Wed, Jun 25, 2025 at 11:27 PM David Hildenbrand <david@...hat.com>
>>> wrote:
>>>>
>>>> On 25.06.25 13:15, Barry Song wrote:
>>>>> On Wed, Jun 25, 2025 at 11:01 PM David Hildenbrand
>>>>> <david@...hat.com> wrote:
>>>>>>
>>>>>> On 25.06.25 12:57, Barry Song wrote:
>>>>>>>>>
>>>>>>>>> Note that I don't quite understand why we have to batch the
>>>>>>>>> whole thing
>>>>>>>>> or fallback to
>>>>>>>>> individual pages. Why can't we perform other batches that span
>>>>>>>>> only some
>>>>>>>>> PTEs? What's special
>>>>>>>>> about 1 PTE vs. 2 PTEs vs. all PTEs?
>>>>>>>>
>>>>>>>> That's a good point about the "all-or-nothing" batching logic ;)
>>>>>>>>
>>>>>>>> It seems the "all-or-nothing" approach is specific to the
>>>>>>>> lazyfree use
>>>>>>>> case, which needs to unmap the entire folio for reclamation. If
>>>>>>>> that's
>>>>>>>> not possible, it falls back to the single-page slow path.
>>>>>>>
>>>>>>> Other cases advance the PTE themselves, while try_to_unmap_one()
>>>>>>> relies
>>>>>>> on page_vma_mapped_walk() to advance the PTE. Unless we want to
>>>>>>> manually
>>>>>>> modify pvmw.pte and pvmw.address outside of
>>>>>>> page_vma_mapped_walk(), which
>>>>>>> to me seems like a violation of layers. :-)
>>>>>>
>>>>>> Please explain to me why the following is not clearer and better:
>>>>>
>>>>> This part is much clearer, but that doesn’t necessarily improve the
>>>>> overall
>>>>> picture. The main challenge is how to exit the iteration of
>>>>> while (page_vma_mapped_walk(&pvmw)).
>>>>
>>>> Okay, I get what you mean now.
>>>>
>>>>>
>>>>> Right now, we have it laid out quite straightforwardly:
>>>>> /* We have already batched the entire folio */
>>>>> if (nr_pages > 1)
>>>>> goto walk_done;
>>>>
>>>>
>>>> Given that the comment is completely confusing whens seeing the
>>>> check ... :)
>>>>
>>>> /*
>>>> * If we are sure that we batched the entire folio and cleared all
>>>> PTEs,
>>>> * we can just optimize and stop right here.
>>>> */
>>>> if (nr_pages == folio_nr_pages(folio))
>>>> goto walk_done;
>>>>
>>>> would make the comment match.
>>>
>>> Yes, that clarifies it.
>>>
>>>>
>>>>>
>>>>> with any nr between 1 and folio_nr_pages(), we have to consider two
>>>>> issues:
>>>>> 1. How to skip PTE checks inside page_vma_mapped_walk for entries that
>>>>> were already handled in the previous batch;
>>>>
>>>> They are cleared if we reach that point. So the pte_none() checks will
>>>> simply skip them?
>>>>
>>>>> 2. How to break the iteration when this batch has arrived at the end.
>>>>
>>>> page_vma_mapped_walk() should be doing that?
>>>
>>> It seems you might have missed the part in my reply that says:
>>> "Of course, we could avoid both, but that would mean performing
>>> unnecessary
>>> checks inside page_vma_mapped_walk()."
>> > > That’s true for both. But I’m wondering why we’re still doing the
>> check,
>>> even when we’re fairly sure they’ve already been cleared or we’ve reached
>>> the end :-)
>>
>> :)
>>
>>>
>>> Somehow, I feel we could combine your cleanup code—which handles a batch
>>> size of "nr" between 1 and nr_pages—with the
>>> "if (nr_pages == folio_nr_pages(folio)) goto walk_done" check.
>>
>> Yeah, that's what I was suggesting. It would have to be part of the
>> cleanup I think.
>>
>> I'm still wondering if there is a case where
>>
>> if (nr_pages == folio_nr_pages(folio))
>> goto walk_done;
>>
>> would be wrong when dealing with small folios.
>
> We can make the check more explicit to avoid any future trouble ;)
>
> if (nr_pages > 1 && nr_pages == folio_nr_pages(folio))
> goto walk_done;
>
> It should be safe for small folios.
I mean, we are walking a single VMA, and a small folio should never be
mapped multiple times into the same VMA. (ignoring KSM, but KSM is
handled differently either way)
So we should be good, just wanted to raise it.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists