Message-ID: <CAHbLzkpRgeYkPHUc3KAUc_Fr-YexQxK1cH92Suueac5GrwZHsw@mail.gmail.com>
Date: Thu, 27 Jan 2022 13:23:41 -0800
From: Yang Shi <shy828301@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Hugh Dickins <hughd@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
John Hubbard <jhubbard@...dia.com>,
Jason Gunthorpe <jgg@...dia.com>,
Mike Kravetz <mike.kravetz@...cle.com>,
Mike Rapoport <rppt@...ux.ibm.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
Roman Gushchin <guro@...com>,
Andrea Arcangeli <aarcange@...hat.com>,
Peter Xu <peterx@...hat.com>,
Donald Dutile <ddutile@...hat.com>,
Christoph Hellwig <hch@....de>,
Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
Liang Zhang <zhangliang5@...wei.com>,
Linux MM <linux-mm@...ck.org>
Subject: Re: [PATCH RFC v2 6/9] mm/khugepaged: remove reuse_swap_page() usage
On Wed, Jan 26, 2022 at 2:00 AM David Hildenbrand <david@...hat.com> wrote:
>
> reuse_swap_page() currently indicates if we can write to an anon page
> without COW. A COW is required if the page is shared by multiple
> processes (either already mapped or via swap entries) or if there is
> concurrent writeback that cannot tolerate concurrent page modifications.
>
> reuse_swap_page() doesn't check for pending references from other
> processes that already unmapped the page, however,
> is_refcount_suitable() essentially does the same thing in the context of
> khugepaged. khugepaged is the last remaining user of reuse_swap_page() and
> we want to remove that function.
>
> In the context of khugepaged, we are not actually going to write to the
> page and we don't really care about other processes mapping the page:
> for example, without swap, we don't care about shared pages at all.
>
> The current logic seems to be:
> * Writable: -> Not shared, but might be in the swapcache. Nobody can
> fault it in from the swapcache as there are no other swap entries.
> * Readable and not in the swapcache: Might be shared (but nobody can
> fault it in from the swapcache).
> * Readable and in the swapcache: Might be shared and someone might be
> able to fault it in from the swapcache. Make sure we're the exclusive
> owner via reuse_swap_page().
>
> Having to guess due to lack of comments and documentation, the current
> logic really only wants to make sure that a page that might be shared
> cannot be faulted in from the swapcache while khugepaged is active.
> It's hard to guess why that is the case and if it's really still required,
> but let's try keeping that logic unmodified.

I don't think the page could be faulted in while khugepaged is active, since
khugepaged holds mmap_lock in write mode, IIUC. So page faults are
serialized against khugepaged.

My wild guess is that collapsing shared pages was not supported before
v5.8, so reuse_swap_page() was needed to tell us whether the page in the
swap cache is shared or not. But that is not true anymore. khugepaged
just allocates a THP, copies the data from the base pages to the huge page,
and then replaces the PTEs with a PMD. It doesn't change the content of the
pages, so I fail to see a problem with collapsing a shared page in the swap
cache. But I'm really not entirely sure, I may be missing something...

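
To illustrate the "allocate, copy, remap" sequence described above, here is
a minimal userspace sketch in C. It is not kernel code: model_collapse(),
BASE_PAGE_SIZE and PAGES_PER_HUGE are hypothetical names, and 512 x 4 KiB
assumes 2 MiB huge pages as on x86-64. The PTE-to-PMD replacement is not
modeled at all; the only point is that the source pages are read and never
written.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BASE_PAGE_SIZE 4096u  /* assumption: 4 KiB base pages */
#define PAGES_PER_HUGE 512u   /* assumption: 2 MiB huge page */

/*
 * Hypothetical stand-in for the copy step of a collapse: allocate one
 * huge buffer and copy every base page into it. The sources are const,
 * so this function cannot modify them. Returns NULL if the allocation
 * fails; the caller frees the result.
 */
unsigned char *model_collapse(const unsigned char *const base_pages[PAGES_PER_HUGE])
{
	unsigned char *huge;
	size_t i;

	assert(base_pages != NULL);

	huge = malloc((size_t)BASE_PAGE_SIZE * PAGES_PER_HUGE);
	if (!huge)
		return NULL;

	for (i = 0; i < PAGES_PER_HUGE; i++)
		memcpy(huge + i * BASE_PAGE_SIZE, base_pages[i], BASE_PAGE_SIZE);

	return huge;
}
```

In this model a "shared" base page is simply one that somebody else also
holds a pointer to; since it is only read, that other holder sees the same
bytes before and after the copy.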
>
> Instead of relying on reuse_swap_page(), let's unconditionally
> try_to_free_swap(), special casing PageKsm(). try_to_free_swap() will fail
> if there are still swap entries targeting the page or if the page is under
> writeback.
>
> After a successful try_to_free_swap(), the page cannot be re-added to the
> swapcache because we're keeping the page locked and removed from the LRU
> until we actually perform the copy. So once we succeed in removing a page
> from the swapcache, it cannot be re-added until we're done copying. Add a
> comment stating that.
>
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
> mm/khugepaged.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 35f14d0a00a6..bc0ff598e98f 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -683,10 +683,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> goto out;
> }
> if (!pte_write(pteval) && PageSwapCache(page) &&
> - !reuse_swap_page(page)) {
> + (PageKsm(page) || !try_to_free_swap(page))) {
> /*
> - * Page is in the swap cache and cannot be re-used.
> - * It cannot be collapsed into a THP.
> + * Possibly shared page cannot be removed from the
> + * swapcache. It cannot be collapsed into a THP.
> */
> unlock_page(page);
> result = SCAN_SWAP_CACHE_PAGE;
> @@ -702,6 +702,16 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> result = SCAN_DEL_PAGE_LRU;
> goto out;
> }
> +
> + /*
> + * We're holding the page lock and removed the page from the
> + * LRU. Once done copying, we'll unlock and readd to the
> + * LRU via release_pte_page(). If the page is still in the
> + * swapcache, we're the exclusive owner. Due to the page lock
> + * the page cannot be added to the swapcache until we're done
> + * and consequently it cannot be faulted in from the swapcache
> + * into another process.
> + */
> mod_node_page_state(page_pgdat(page),
> NR_ISOLATED_ANON + page_is_file_lru(page),
> compound_nr(page));
> --
> 2.34.1
>