[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5cc05529-eb80-410e-bc26-233b0ba0b21f@redhat.com>
Date: Thu, 7 Mar 2024 09:32:11 +0100
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>, Lance Yang <ioworker0@...il.com>,
Vishal Moola <vishal.moola@...il.com>
Cc: akpm@...ux-foundation.org, zokeefe@...gle.com, ryan.roberts@....com,
shy828301@...il.com, mhocko@...e.com, fengwei.yin@...el.com,
xiehuan09@...il.com, wangkefeng.wang@...wei.com, songmuchun@...edance.com,
peterx@...hat.com, minchan@...nel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in
madvise_free
On 07.03.24 09:10, Barry Song wrote:
> On Thu, Mar 7, 2024 at 9:00 PM Lance Yang <ioworker0@...il.com> wrote:
>>
>> Hey Barry,
>>
>> Thanks for taking time to review!
>>
>> On Thu, Mar 7, 2024 at 3:00 PM Barry Song <21cnbao@...il.com> wrote:
>>>
>>> On Thu, Mar 7, 2024 at 7:15 PM Lance Yang <ioworker0@...il.com> wrote:
>>>>
>> [...]
>>>> +static inline bool can_mark_large_folio_lazyfree(unsigned long addr,
>>>> + struct folio *folio, pte_t *start_pte)
>>>> +{
>>>> + int nr_pages = folio_nr_pages(folio);
>>>> + fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>>> +
>>>> + for (int i = 0; i < nr_pages; i++)
>>>> + if (page_mapcount(folio_page(folio, i)) != 1)
No new direct subpage mapcount users please if avoidable. Also, without
holding the page lock this might be racy (did not stare at the bigger code).
>>>> + return false;
>>>
>>> we have moved to folio_estimated_sharers though it is not precise, so
>>> we don't do
>>> this check with lots of loops and depending on the subpage's mapcount.
>>
>> If we don't check the subpage’s mapcount, and there is a cow folio associated
>> with this folio and the cow folio has smaller size than this folio,
>> should we still
>> mark this folio as lazyfree?
>
> I agree, this is true. However, we've somehow accepted the fact that
> folio_likely_mapped_shared
> can result in false negatives or false positives to balance the
> overhead. So I really don't know :-)
>
> Maybe David and Vishal can give some comments here.
Well, I did not accept the fact yet that folio_likely_mapped_shared()
can give false negatives :)
So I've been working on a replacement [1], also in the context of
removing the subpage mapcounts. But there is still a lot to resolve
around that, first step being to introduce the total mapcount.
But independent of that, MADV_FREE is special (compared to
MADV_DONTNEED), because we really have to make sure all page table
mappings that map the folio will be covered here.
What could work for large folios is making sure that #ptes that map the
folio here correspond to the folio_mapcount(). And folio_mapcount()
should be called under folio lock, to avoid racing with swapout/migration.
So instead of checking each page_mapcount(), see how many PTEs you could
batch, try taking the folio lock, and compare the number of batched PTEs
against folio_mapcount(). If they match, we're good. if not, there is
some other folio mapping somewhere else (other process, mremap).
folio_likely_mapped_shared() should still be used as an early exit point.
[1]
https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/T/#u
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists