lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <17d610f7-4d85-e9b4-6429-4ad89274cb48@huawei.com>
Date:   Fri, 17 Jun 2022 10:26:58 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     Yang Shi <shy828301@...il.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        David Howells <dhowells@...hat.com>, NeilBrown <neilb@...e.de>,
        Alistair Popple <apopple@...dia.com>,
        David Hildenbrand <david@...hat.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Peter Xu <peterx@...hat.com>, Linux MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when
 possible

On 2022/6/16 23:53, Yang Shi wrote:
> On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin <linmiaohe@...wei.com> wrote:
>>
>> On 2022/6/16 7:58, Yang Shi wrote:
>>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin <linmiaohe@...wei.com> wrote:
>>>>
>>>> Transhuge swapcaches won't be freed in __collapse_huge_page_copy().
>>>> It's because release_pte_page() is not called for these pages and
>>>> thus free_page_and_swap_cache can't grab the page lock. These pages
>>>> won't be freed from swap cache even if we are the only user until
>>>> next time reclaim. It shouldn't hurt indeed, but we could try to
>>>> free these pages to save more memory for system.
>>>
>>>
>>>>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@...wei.com>
>>>> ---
>>>>  include/linux/swap.h | 5 +++++
>>>>  mm/khugepaged.c      | 1 +
>>>>  mm/swap.h            | 5 -----
>>>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index 8672a7123ccd..ccb83b12b724 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>>>         return global_node_page_state(NR_SWAPCACHE);
>>>>  }
>>>>
>>>> +extern void free_swap_cache(struct page *page);
>>>>  extern void free_page_and_swap_cache(struct page *);
>>>>  extern void free_pages_and_swap_cache(struct page **, int);
>>>>  /* linux/mm/swapfile.c */
>>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>>>
>>>> +static inline void free_swap_cache(struct page *page)
>>>> +{
>>>> +}
>>>> +
>>>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>>>  {
>>>>         return 0;
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index ee0a719c8be9..52109ad13f78 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>>>                 list_del(&src_page->lru);
>>>>                 release_pte_page(src_page);
>>>> +               free_swap_cache(src_page);
>>>
>>> Will this really work? The free_swap_cache() will just dec refcounts
>>> without putting the page back to buddy. So the hugepage is not
>>> actually freed at all. Am I missing something?
>>
>> Thanks for catching this! If page is on percpu lru_pvecs cache, page will
>> be released when lru_pvecs are drained. But if not, free_swap_cache() won't
>> free the page as it assumes the caller has a reference on the page and thus
>> only does page_ref_sub(). Does the below change looks sense for you?
> 
> THP gets drained immediately so they won't stay in pagevecs.

Yes, you're right. I missed this.

> 
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 52109ad13f78..b8c96e33591d 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>
>>         list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>                 list_del(&src_page->lru);
>> -               release_pte_page(src_page);
>> +               mod_node_page_state(page_pgdat(src_page),
>> +                                   NR_ISOLATED_ANON + page_is_file_lru(src_page),
>> +                                   -compound_nr(src_page));
>> +               unlock_page(src_page);
>>                 free_swap_cache(src_page);
>> +               putback_lru_page(src_page);
> 
> I'm not sure if it is worth it or not for a rare corner case since THP
> should not stay in swapcache unless try_to_unmap() in vmscan fails

IIUC, even if try_to_unmap() in vmscan succeeds, THP might be still in the
swapcache if shrink_page_list is not called for this THP again after writeback
is done, e.g. when shrink_page_list is called from madvise, so there might be
no memory pressure, or do_swap_page puts the THP into page table again. Also THP
might not be splited when deferred_split_shrinker is not called, e.g. due to
not lacking of memory. Even if there is memory pressure, the THP will stay in
swapcache until next round page reclaim for this THP is done. So there should
be a non-negligible window that THP will stay in the swapcache.
Or am I miss something?

> IIUC. And it is not guaranteed that free_swap_cache() will get the
> page lock.

IMHO, we're not guaranteed that free_swap_cache() will get the page lock for the normal
page anyway.

Thanks!

> 
>>         }
>>  }
>>
>> Thanks!
>>
>>>
>>>>         }
>>>>  }
>>>>
>>>> diff --git a/mm/swap.h b/mm/swap.h
>>>> index 0193797b0c92..863f6086c916 100644
>>>> --- a/mm/swap.h
>>>> +++ b/mm/swap.h
>>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>>>  void delete_from_swap_cache(struct page *page);
>>>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>>>                                   unsigned long end);
>>>> -void free_swap_cache(struct page *page);
>>>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>>>                                struct vm_area_struct *vma,
>>>>                                unsigned long addr);
>>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>>>         return NULL;
>>>>  }
>>>>
>>>> -static inline void free_swap_cache(struct page *page)
>>>> -{
>>>> -}
>>>> -
>>>>  static inline void show_swap_cache_info(void)
>>>>  {
>>>>  }
>>>> --
>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ