[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <da98141f-c8d8-f796-d8ec-352d9b2abd6a@nvidia.com>
Date: Wed, 16 Feb 2022 19:15:30 -0800
From: John Hubbard <jhubbard@...dia.com>
To: Peter Xu <peterx@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Matthew Wilcox <willy@...radead.org>,
Yang Shi <shy828301@...il.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Alistair Popple <apopple@...dia.com>,
David Hildenbrand <david@...hat.com>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v4 1/4] mm: Don't skip swap entry even if zap_details
specified
On 2/16/22 1:48 AM, Peter Xu wrote:
> The "details" pointer shouldn't be the token to decide whether we should skip
> swap entries. For example, when the user specified details->zap_mapping==NULL,
> it means the user wants to zap all the pages (including COWed pages), then we
> need to look into swap entries because there can be private COWed pages that
> was swapped out.
Hi Peter,
The changes look good, just some minor readability suggestions below:
(btw, hch is going to ask you to reflow all of the commit descriptions
to 72 cols, so you might as well do it in advance. :)
>
> Skipping some swap entries when details is non-NULL may lead to wrongly leaving
> some of the swap entries while we should have zapped them.
>
> A reproducer of the problem:
>
> ===8<===
> #define _GNU_SOURCE /* See feature_test_macros(7) */
> #include <stdio.h>
> #include <assert.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/types.h>
>
> int page_size;
> int shmem_fd;
> char *buffer;
>
> void main(void)
> {
> int ret;
> char val;
>
> page_size = getpagesize();
> shmem_fd = memfd_create("test", 0);
> assert(shmem_fd >= 0);
>
> ret = ftruncate(shmem_fd, page_size * 2);
> assert(ret == 0);
>
> buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
> MAP_PRIVATE, shmem_fd, 0);
> assert(buffer != MAP_FAILED);
>
> /* Write private page, swap it out */
> buffer[page_size] = 1;
> madvise(buffer, page_size * 2, MADV_PAGEOUT);
>
> /* This should drop private buffer[page_size] already */
> ret = ftruncate(shmem_fd, page_size);
> assert(ret == 0);
> /* Recover the size */
> ret = ftruncate(shmem_fd, page_size * 2);
> assert(ret == 0);
>
> /* Re-read the data, it should be all zero */
> val = buffer[page_size];
> if (val == 0)
> printf("Good\n");
> else
> printf("BUG\n");
> }
> ===8<===
>
> We don't need to touch up the pmd path, because pmd never had a issue with swap
> entries. For example, shmem pmd migration will always be split into pte level,
> and same to swapping on anonymous.
>
> Add another helper should_zap_cows() so that we can also check whether we
> should zap private mappings when there's no page pointer specified.
>
> This patch drops that trick, so we handle swap ptes coherently. Meanwhile we
> should do the same check upon migration entry, hwpoison entry and genuine swap
> entries too. To be explicit, we should still remember to keep the private
> entries if even_cows==false, and always zap them when even_cows==true.
>
> The issue seems to exist starting from the initial commit of git.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
> mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 36 insertions(+), 9 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..4bfeaca7cbc7 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1313,6 +1313,17 @@ struct zap_details {
> struct folio *single_folio; /* Locked folio to be unmapped */
> };
>
> +/* Whether we should zap all COWed (private) pages too */
> +static inline bool should_zap_cows(struct zap_details *details)
> +{
> + /* By default, zap all pages */
> + if (!details)
> + return true;
> +
> + /* Or, we zap COWed pages only if the caller wants to */
> + return !details->zap_mapping;
> +}
> +
> /*
> * We set details->zap_mapping when we want to unmap shared but keep private
> * pages. Return true if skip zapping this page, false otherwise.
> @@ -1320,11 +1331,15 @@ struct zap_details {
> static inline bool
> zap_skip_check_mapping(struct zap_details *details, struct page *page)
> {
> - if (!details || !page)
> + /* If we can make a decision without *page.. */
> + if (should_zap_cows(details))
> return false;
>
> - return details->zap_mapping &&
> - (details->zap_mapping != page_rmapping(page));
> + /* E.g. zero page */
It's a bit confusing to see a comment that "this could be the zero page", if
the value is NULL. Maybe, "the caller passes NULL for the case of a zero
page", or something along those lines?
> + if (!page)
> + return false;
> +
> + return details->zap_mapping != page_rmapping(page);
> }
>
> static unsigned long zap_pte_range(struct mmu_gather *tlb,
> @@ -1405,17 +1420,29 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> continue;
> }
>
> - /* If details->check_mapping, we leave swap entries. */
> - if (unlikely(details))
> - continue;
> -
> - if (!non_swap_entry(entry))
> + if (!non_swap_entry(entry)) {
> + /*
> + * If this is a genuine swap entry, then it must be an
> + * private anon page. If the caller wants to skip
> + * COWed pages, ignore it.
> + */
How about this instead:
/* Genuine swap entry, and therefore a private anon page. */
> + if (!should_zap_cows(details))
> + continue;
> rss[MM_SWAPENTS]--;
> - else if (is_migration_entry(entry)) {
Can we put a newline here, and before each "else" block? Because now it
is getting very dense, and the visual separation really helps.
> + } else if (is_migration_entry(entry)) {
> struct page *page;
>
> page = pfn_swap_entry_to_page(entry);
> + if (zap_skip_check_mapping(details, page))
> + continue;
> rss[mm_counter(page)]--;
Newline here.
> + } else if (is_hwpoison_entry(entry)) {
> + /* If the caller wants to skip COWed pages, ignore it */
Likewise, I'd prefer we delete that comment, because it exactly matches
what the following line of code says.
> + if (!should_zap_cows(details))
> + continue;
And newline here too.
> + } else {
> + /* We should have covered all the swap entry types */
> + WARN_ON_ONCE(1);
> }
> if (unlikely(!free_swap_and_cache(entry)))
> print_bad_pte(vma, addr, ptent, NULL);
Those are all just nits, and as I mentioned, the actual changes look good
to me, so:
Reviewed-by: John Hubbard <jhubbard@...dia.com>
thanks,
--
John Hubbard
NVIDIA
Powered by blists - more mailing lists