lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5cc05529-eb80-410e-bc26-233b0ba0b21f@redhat.com>
Date: Thu, 7 Mar 2024 09:32:11 +0100
From: David Hildenbrand <david@...hat.com>
To: Barry Song <21cnbao@...il.com>, Lance Yang <ioworker0@...il.com>,
 Vishal Moola <vishal.moola@...il.com>
Cc: akpm@...ux-foundation.org, zokeefe@...gle.com, ryan.roberts@....com,
 shy828301@...il.com, mhocko@...e.com, fengwei.yin@...el.com,
 xiehuan09@...il.com, wangkefeng.wang@...wei.com, songmuchun@...edance.com,
 peterx@...hat.com, minchan@...nel.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in
 madvise_free

On 07.03.24 09:10, Barry Song wrote:
> On Thu, Mar 7, 2024 at 9:00 PM Lance Yang <ioworker0@...il.com> wrote:
>>
>> Hey Barry,
>>
>> Thanks for taking time to review!
>>
>> On Thu, Mar 7, 2024 at 3:00 PM Barry Song <21cnbao@...il.com> wrote:
>>>
>>> On Thu, Mar 7, 2024 at 7:15 PM Lance Yang <ioworker0@...il.com> wrote:
>>>>
>> [...]
>>>> +static inline bool can_mark_large_folio_lazyfree(unsigned long addr,
>>>> +                                                struct folio *folio, pte_t *start_pte)
>>>> +{
>>>> +       int nr_pages = folio_nr_pages(folio);
>>>> +       fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>>> +
>>>> +       for (int i = 0; i < nr_pages; i++)
>>>> +               if (page_mapcount(folio_page(folio, i)) != 1)

No new direct subpage mapcount users please if avoidable. Also, without 
holding the page lock this might be racy (did not stare at the bigger code).

>>>> +                       return false;
>>>
>>> we have moved to folio_estimated_sharers though it is not precise, so
>>> we don't do
>>> this check with lots of loops and depending on the subpage's mapcount.
>>
>> If we don't check the subpage’s mapcount, and there is a cow folio associated
>> with this folio and the cow folio has smaller size than this folio,
>> should we still
>> mark this folio as lazyfree?
> 
> I agree, this is true. However, we've somehow accepted the fact that
> folio_likely_mapped_shared
> can result in false negatives or false positives to balance the
> overhead.  So I really don't know :-)
> 
> Maybe David and Vishal can give some comments here.

Well, I did not accept the fact yet that folio_likely_mapped_shared() 
can give false negatives :)

So I've been working on a replacement [1], also in the context of 
removing the subpage mapcounts. But there is still a lot to resolve 
around that, first step being to introduce the total mapcount.

But independent of that, MADV_FREE is special (compared to 
MADV_DONTNEED), because we really have to make sure all page table 
mappings that map the folio will be covered here.

What could work for large folios is making sure that #ptes that map the 
folio here correspond to the folio_mapcount(). And folio_mapcount() 
should be called under folio lock, to avoid racing with swapout/migration.

So instead of checking each page_mapcount(), see how many PTEs you could 
batch, try taking the folio lock, and compare the number of batched PTEs 
against folio_mapcount(). If they match, we're good. if not, there is 
some other folio mapping somewhere else (other process, mremap).

folio_likely_mapped_shared() should still be used as an early exit point.

[1] 
https://lore.kernel.org/all/20231124132626.235350-1-david@redhat.com/T/#u

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ