Message-ID: <CAHbLzkq+Mf3N1FvjMRD8+SiEsry_39ycgCN92GHp5VsshyKE8w@mail.gmail.com>
Date: Fri, 2 Jun 2023 19:04:48 -0700
From: Yang Shi <shy828301@...il.com>
To: Peter Xu <peterx@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
David Hildenbrand <david@...hat.com>,
Alistair Popple <apopple@...dia.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill@...temov.name>,
Johannes Weiner <hannes@...xchg.org>,
John Hubbard <jhubbard@...dia.com>,
Naoya Horiguchi <naoya.horiguchi@....com>,
Muhammad Usama Anjum <usama.anjum@...labora.com>,
Hugh Dickins <hughd@...gle.com>,
Mike Rapoport <rppt@...nel.org>
Subject: Re: [PATCH 1/4] mm/mprotect: Retry on pmd_trans_unstable()
On Fri, Jun 2, 2023 at 4:06 PM Peter Xu <peterx@...hat.com> wrote:
>
> When we hit an unstable pmd, we should retry the pmd once more because it
> means we probably raced with a thp insertion.
>
> Skipping it might be a problem as no error will be reported to the caller.
> I assume it means the user will expect the prot change (e.g. mprotect or
> userfaultfd wr-protection) to be applied when it actually is not.
IIRC, mprotect() holds the mmap_lock for write, so it should not matter
there. prot_numa holds the mmap_lock for read, but returning 0 doesn't
matter either (retrying is fine too, of course); it just skips that 2M
area. The userfaultfd-wp case is your call :-)
>
> To achieve it, move the pmd_trans_unstable() call out of change_pte_range()
> which will make the retry easier, as we can keep the retval of
> change_pte_range() untouched.
>
> Signed-off-by: Peter Xu <peterx@...hat.com>
> ---
> mm/mprotect.c | 20 +++++++++++---------
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 92d3d3ca390a..e4756899d40c 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -94,15 +94,6 @@ static long change_pte_range(struct mmu_gather *tlb,
>
> tlb_change_page_size(tlb, PAGE_SIZE);
>
> - /*
> - * Can be called with only the mmap_lock for reading by
> - * prot_numa so we must check the pmd isn't constantly
> - * changing from under us from pmd_none to pmd_trans_huge
> - * and/or the other way around.
> - */
> - if (pmd_trans_unstable(pmd))
> - return 0;
> -
> /*
> * The pmd points to a regular pte so the pmd can't change
> * from under us even if the mmap_lock is only hold for
> @@ -411,6 +402,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
> pages = ret;
> break;
> }
> +again:
> /*
> * Automatic NUMA balancing walks the tables with mmap_lock
> * held for read. It's possible a parallel update to occur
> @@ -465,6 +457,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
> }
> /* fall through, the trans huge pmd just split */
> }
> +
> + /*
> + * Can be called with only the mmap_lock for reading by
> + * prot_numa or userfaultfd-wp, so we must check the pmd
> + * isn't constantly changing from under us from pmd_none to
> + * pmd_trans_huge and/or the other way around.
> + */
> + if (pmd_trans_unstable(pmd))
> + goto again;
> +
> pages += change_pte_range(tlb, vma, pmd, addr, next,
> newprot, cp_flags);
> next:
> --
> 2.40.1
>
>
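The difference between the old "return 0" and the new "goto again" can be
sketched in userspace C. This is only an illustration of the control flow
in the quoted patch, not kernel code: struct slot, slot_raced(),
change_slot() and NR_PTES are hypothetical names, and the "race" is
simulated with a counter rather than a real concurrent thp insertion.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PTES 512	/* assumed entries per page table (x86-64, 4K pages) */

/* Hypothetical stand-in for a pmd entry; not the kernel's pmd_t. */
enum slot_state { SLOT_NONE, SLOT_HUGE, SLOT_TABLE };

struct slot {
	enum slot_state state;
	int racing_updates;	/* simulated concurrent huge insertions left */
	bool pte_path_taken;	/* protection applied pte by pte */
	bool huge_path_taken;	/* protection applied to the huge entry */
};

/*
 * Plays the role of the pmd_trans_unstable() check: reports true when a
 * simulated racer has just replaced the page table with a huge entry.
 */
static bool slot_raced(struct slot *s)
{
	if (s->racing_updates > 0) {
		s->racing_updates--;
		s->state = SLOT_HUGE;
		return true;
	}
	return false;
}

/* Returns the number of pages whose protection was changed. */
static long change_slot(struct slot *s, bool retry)
{
	assert(s);
again:
	if (s->state == SLOT_NONE)
		return 0;		/* nothing mapped, nothing to do */
	if (s->state == SLOT_HUGE) {
		s->huge_path_taken = true;
		return NR_PTES;		/* handled as one huge entry */
	}
	if (slot_raced(s)) {
		if (!retry)
			return 0;	/* old behaviour: skip without error */
		goto again;		/* new behaviour: re-examine the entry */
	}
	s->pte_path_taken = true;
	return NR_PTES;
}
```

With retry false and one simulated race, change_slot() returns 0 and
neither path runs, which is the silent skip the commit message describes.
With retry true, the second pass sees the huge entry and handles it.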