linux-kernel - Re: [PATCH v5 1/2] mm/madvise: optimize lazyfreeing with mTHP in madvise

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8d674b15-ef74-4d96-bc27-8794f744517c@arm.com>
Date: Thu, 11 Apr 2024 15:39:28 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Lance Yang <ioworker0@...il.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, 21cnbao@...il.com,
 mhocko@...e.com, fengwei.yin@...el.com, zokeefe@...gle.com,
 shy828301@...il.com, xiehuan09@...il.com, wangkefeng.wang@...wei.com,
 songmuchun@...edance.com, peterx@...hat.com, minchan@...nel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 1/2] mm/madvise: optimize lazyfreeing with mTHP in
 madvise_free

On 11/04/2024 15:07, Lance Yang wrote:
> On Thu, Apr 11, 2024 at 9:48 PM Ryan Roberts <ryan.roberts@....com> wrote:
>>
>> [...]
>>
>>>>> +
>>>>> +             if (!folio_trylock(folio))
>>>>> +                     continue;
>>>>
>>>> This is still wrong. This should all be protected by the "if
>>>> (folio_test_swapcache(folio) || folio_test_dirty(folio))" as it was previously
>>>> so that you only call folio_trylock() if that condition is true. You are
>>>> unconditionally locking here, then unlocking, then relocking below if the
>>>> condition is met. Just put everything inside the condition and lock once.
>>>
>>> I'm not sure if it's safe to call folio_mapcount() without holding the
>>> folio lock.
>>>
>>> As mentioned earlier by David in the v2[1]
>>>> What could work for large folios is making sure that #ptes that map the
>>>> folio here correspond to the folio_mapcount(). And folio_mapcount()
>>>> should be called under folio lock, to avoid racing with swapout/migration.
>>>
>>> [1] https://lore.kernel.org/all/5cc05529-eb80-410e-bc26-233b0ba0b21f@redhat.com/
>>
>> But I'm not suggesting that you should call folio_mapcount() without the lock.
>> I'm proposing this:
>>
>>                 if (folio_test_swapcache(folio) || folio_test_dirty(folio)) {
>>                         if (!folio_trylock(folio))
>>                                 continue;
>>                         /*
>> -                        * If folio is shared with others, we mustn't clear
>> -                        * the folio's dirty flag.
>> +                        * If we have a large folio at this point, we know it is
>> +                        * fully mapped so if its mapcount is the same as its
>> +                        * number of pages, it must be exclusive.
>>                          */
>> -                       if (folio_mapcount(folio) != 1) {
>> +                       if (folio_mapcount(folio) != folio_nr_pages(folio)) {
>>                                 folio_unlock(folio);
>>                                 continue;
>>                         }
> 
> IIUC, if the folio is clean and not in the swapcache, we still need to
> compare the number of batched PTEs against folio_mapcount().

Why? That's not how the old code worked. In fact the comment says that the
reason for the exclusive check is to avoid marking a dirty *folio* as clean if
shared; that would be bad because we could throw away data that others relied
upon. It's perfectly safe to clear the dirty flag from the *pte* even if it is
shared; the ptes are private to the process so that won't affect sharers.

You should just follow the pattern already estabilished by the original code.
The only difference is that because the folio is now (potentially) large, you
have to change the way to detect exclusivity.

> 
> Thanks,
> Lance
> 
>>
>> What am I missing?
>>