linux-kernel - Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free


Message-ID: <f880135f-e113-4d42-b3a0-8b0b9eebcbf4@arm.com>
Date: Wed, 13 Mar 2024 09:09:59 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Chuanhua Han <hanchuanhua@...o.com>, Barry Song <21cnbao@...il.com>,
 akpm@...ux-foundation.org, linux-mm@...ck.org
Cc: chengming.zhou@...ux.dev, chrisl@...nel.org, david@...hat.com,
 hannes@...xchg.org, kasong@...cent.com,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 mhocko@...e.com, nphamcs@...il.com, shy828301@...il.com,
 steven.price@....com, surenb@...gle.com, wangkefeng.wang@...wei.com,
 willy@...radead.org, xiang@...nel.org, ying.huang@...el.com,
 yosryahmed@...gle.com, yuzhao@...gle.com, Barry Song <v-songbaohua@...o.com>
Subject: Re: [RFC PATCH v3 3/5] mm: swap: make should_try_to_free_swap()
 support large-folio

On 13/03/2024 02:21, Chuanhua Han wrote:
> Hi, Ryan Roberts
> 
> On 2024/3/12 20:34, Ryan Roberts wrote:
>> On 04/03/2024 08:13, Barry Song wrote:
>>> From: Chuanhua Han <hanchuanhua@...o.com>
>>>
>>> should_try_to_free_swap() works with an assumption that swap-in is always done
>>> at normal page granularity, aka, folio_nr_pages = 1. To support large folio
>>> swap-in, this patch removes the assumption.
>>>
>>> Signed-off-by: Chuanhua Han <hanchuanhua@...o.com>
>>> Co-developed-by: Barry Song <v-songbaohua@...o.com>
>>> Signed-off-by: Barry Song <v-songbaohua@...o.com>
>>> Acked-by: Chris Li <chrisl@...nel.org>
>>> ---
>>>  mm/memory.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index abd4f33d62c9..e0d34d705e07 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3837,7 +3837,7 @@ static inline bool should_try_to_free_swap(struct folio *folio,
>>>  	 * reference only in case it's likely that we'll be the exlusive user.
>>>  	 */
>>>  	return (fault_flags & FAULT_FLAG_WRITE) && !folio_test_ksm(folio) &&
>>> -		folio_ref_count(folio) == 2;
>>> +		folio_ref_count(folio) == (1 + folio_nr_pages(folio));
>> I don't think this is correct; one reference has just been added to the folio in
>> do_swap_page(), either by getting from swapcache (swap_cache_get_folio()) or by
>> allocating. If it came from the swapcache, it could be a large folio, because we
>> swapped out a large folio and never removed it from swapcache. But in that case,
>> others may have partially mapped it, so the refcount could legitimately equal
>> the number of pages while still not being exclusively mapped.
>>
>> I'm guessing this logic is trying to estimate when we are likely exclusive so
>> that we remove from swapcache (release ref) and can then reuse rather than CoW
>> the folio? The main CoW path currently CoWs page-by-page even for large folios,
>> and with Barry's recent patch, even the last page gets copied. So not sure what
>> this change is really trying to achieve?
>>
> First, if it is a large folio in the swap cache, then its refcount is at
> least folio_nr_pages(folio):

Ahh! Sorry, I had it backwards - was thinking there would be 1 ref for the swap
cache, and you were assuming 1 ref per page taken by do_swap_page(). I
understand now. On this basis:

Reviewed-by: Ryan Roberts <ryan.roberts@....com>

> 
> 
> For example, in add_to_swap_cache path:
> 
> int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
>                         gfp_t gfp, void **shadowp)
> {
>         struct address_space *address_space = swap_address_space(entry);
>         pgoff_t idx = swp_offset(entry);
>         XA_STATE_ORDER(xas, &address_space->i_pages, idx,
>                        folio_order(folio));
>         unsigned long i, nr = folio_nr_pages(folio); <---
>         void *old;
>         ...
>         folio_ref_add(folio, nr); <---
>         folio_set_swapcache(folio);
>         ...
> }
> 
> 
> Then in the do_swap_page path:
> 
>         if (should_try_to_free_swap(folio, vma, vmf->flags))
>                 folio_free_swap(folio);
> 
> It also indicates that only a folio in the swap cache will call
> folio_free_swap() to delete it from the swap cache, so I feel like this
> patch is necessary!? 😁
> 
>>>  }
>>>  
>>>  static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
> 
> Thanks,
> 
> Chuanhua
>
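
The reference accounting discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct model_folio` and every `model_*` function are hypothetical stand-ins for `struct folio`, `folio_ref_add()` in `add_to_swap_cache()`, the single reference taken in `do_swap_page()`, and the patched `should_try_to_free_swap()`. It also assumes the folio starts with no other references, which ignores any other holders a real folio may have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a folio; NOT the kernel's struct folio. */
struct model_folio {
	unsigned long nr_pages;	/* stands in for folio_nr_pages() */
	unsigned long refcount;	/* stands in for folio_ref_count() */
	bool ksm;		/* stands in for folio_test_ksm() */
};

/* Models folio_ref_add(folio, nr) in add_to_swap_cache(): one ref per page. */
static void model_add_to_swap_cache(struct model_folio *f)
{
	f->refcount += f->nr_pages;
}

/* Models the single reference taken by the faulting do_swap_page() path. */
static void model_fault_get(struct model_folio *f)
{
	f->refcount += 1;
}

/*
 * Models the patched check: the only references are the swap cache's
 * nr_pages refs plus the one held by the fault path.
 */
static bool model_should_try_to_free_swap(const struct model_folio *f,
					  bool write_fault)
{
	return write_fault && !f->ksm && f->refcount == 1 + f->nr_pages;
}
```

With `nr_pages = 1` the expression reduces to the old `== 2` comparison, so the order-0 case is unchanged in this model; with `nr_pages = 16` the check passes at a refcount of 17 and fails as soon as any additional reference is held.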