linux-kernel - Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Mon, 10 Jul 2023 17:01:39 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Ryan Roberts <ryan.roberts@....com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Yin Fengwei <fengwei.yin@...el.com>,
        "David Hildenbrand" <david@...hat.com>,
        Yu Zhao <yuzhao@...gle.com>,
        "Catalin Marinas" <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>,
        "Anshuman Khandual" <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        <linux-arm-kernel@...ts.infradead.org>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large
 anon folios

Ryan Roberts <ryan.roberts@....com> writes:

> On 10/07/2023 06:37, Huang, Ying wrote:
>> Ryan Roberts <ryan.roberts@....com> writes:
>> 
>>> Somehow I managed to reply only to the linux-arm-kernel list on first attempt so
>>> resending:
>>>
>>> On 07/07/2023 09:21, Huang, Ying wrote:
>>>> Ryan Roberts <ryan.roberts@....com> writes:
>>>>
>>>>> With the introduction of large folios for anonymous memory, we would
>>>>> like to be able to split them when they have unmapped subpages, in order
>>>>> to free those unused pages under memory pressure. So remove the
>>>>> artificial requirement that the large folio needed to be at least
>>>>> PMD-sized.
>>>>>
>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>>>>> Reviewed-by: Yu Zhao <yuzhao@...gle.com>
>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@...el.com>
>>>>> ---
>>>>>  mm/rmap.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>  		 * page of the folio is unmapped and at least one page
>>>>>  		 * is still mapped.
>>>>>  		 */
>>>>> -		if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>>>>> +		if (folio_test_large(folio) && folio_test_anon(folio))
>>>>>  			if (!compound || nr < nr_pmdmapped)
>>>>>  				deferred_split_folio(folio);
>>>>>  	}
>>>>
>>>> One possible issue is that even for large folios mapped only in one
>>>> process, in zap_pte_range(), we will always call deferred_split_folio()
>>>> unnecessarily before freeing a large folio.
>>>
>>> Hi Huang, thanks for reviewing!
>>>
>>> I have a patch that solves this problem by determining a range of ptes covered
>>> by a single folio and doing a "batch zap". This prevents the need to add the
>>> folio to the deferred split queue, only to remove it again shortly afterwards.
>>> This reduces lock contention and I can measure a performance improvement for the
>>> kernel compilation benchmark. See [1].
>>>
>>> However, I decided to remove it from this patch set on Yu Zhao's advice. We are
>>> aiming for the minimal patch set to start with and wanted to focus people on
>>> that. I intend to submit it separately later on.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/
>> 
>> Thanks for your information!  "batch zap" can solve the problem.
>> 
>> And, I agree with Matthew's comments to fix the large folios interaction
>> issues before merging the patches to allocate large folios as in the
>> following email.
>> 
>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/
>> 
>> If so, we don't need to introduce the above problem or a large patchset.
>
> I appreciate Matthew's and others position about not wanting to merge a minimal
> implementation while there are some fundamental features (e.g. compaction) it
> doesn't play well with - I'm working to create a definitive list so these items
> can be tracked and tackled.

Good to know this, Thanks!

> That said, I don't see this "batch zap" patch as an example of this. It's just a
> performance enhancement that improves things even further than large anon folios
> on their own. I'd rather concentrate on the core changes first then deal with
> this type of thing later. Does that work for you?

IIUC, allocating large folios upon page fault depends on splitting large
folios in page_remove_rmap() to avoid memory wastage.  Splitting large
folios in page_remove_rmap() depends on "batch zap" to avoid performance
regression in zap_pte_range().  So we need them to be done earlier.  Or
I miss something?

Best Regards,
Huang, Ying