Message-ID: <0067dfe6-b9a6-4e98-9eef-7219299bfe58@linux.alibaba.com>
Date: Fri, 26 Jul 2024 10:33:14 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>, Peter Xu <peterx@...hat.com>,
Oscar Salvador <osalvador@...e.de>, stable@...r.kernel.org
Subject: Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking
On 2024/7/26 02:39, David Hildenbrand wrote:
> We recently made GUP's common page table walking code to also walk
> hugetlb VMAs without most hugetlb special-casing, preparing for the
> future of having less hugetlb-specific page table walking code in the
> codebase. Turns out that we missed one page table locking detail: page
> table locking for hugetlb folios that are not mapped using a single
> PMD/PUD.
>
> Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
> hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
> page tables, will perform a pte_offset_map_lock() to grab the PTE table
> lock.
>
> However, hugetlb that concurrently modifies these page tables would
> actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
> locks would differ. Something similar can happen right now with hugetlb
> folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
>
> Let's make huge_pte_lockptr() effectively use the same PT locks as any
> core-mm page table walker would.
Thanks for raising this issue again. I remember fixing it 2 years ago in
commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
CONT-PTE/PMD size hugetlb page"), but it seems to have been broken again.
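For readers following along, here is a minimal sketch of the mismatch as
I understand it, assuming USE_SPLIT_PTE_PTLOCKS and a cont-PTE hugetlb
folio (e.g. 64 KiB on arm64 with 4 KiB base pages); this is illustrative
pseudo-kernel code, not the exact call sites:

static void gup_walker_side(struct mm_struct *mm, pmd_t *pmd,
			    unsigned long addr)
{
	spinlock_t *ptl;
	/* Generic walker: takes the split PTE-table lock of this PTE page. */
	pte_t *pte = pte_offset_map_lock(mm, pmd, addr, &ptl);

	if (!pte)
		return;
	/* ... walk/inspect the PTEs ... */
	pte_unmap_unlock(pte, ptl);
}

static void hugetlb_side(struct hstate *h, struct mm_struct *mm, pte_t *ptep)
{
	/*
	 * Before this patch, huge_page_size(h) != PMD_SIZE fell through to
	 * the per-MM lock, i.e. a *different* lock than the one above.
	 */
	spinlock_t *ptl = huge_pte_lockptr(h, mm, ptep);

	spin_lock(ptl);
	/* ... modify the huge PTEs ... */
	spin_unlock(ptl);
}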
> There is one ugly case: powerpc 8xx, where an 8 MiB hugetlb folio is
> mapped using two PTE page tables. While hugetlb wants to take the PMD
> table lock, core-mm would grab the PTE table lock of one of the two
> PTE page tables. In such corner cases, we have to make sure that both
> locks match, which is (fortunately!) currently guaranteed for 8xx as it
> does not support SMP.
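For reference, the gating of split PT locks (from
include/linux/mm_types_task.h; shape quoted from memory) is roughly:

/*
 * On !SMP configurations such as 8xx, NR_CPUS == 1, so both evaluate to
 * false and every *_lockptr() falls back to &mm->page_table_lock, which
 * is why the two locks happen to match there.
 */
#define USE_SPLIT_PTE_PTLOCKS	(NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
#define USE_SPLIT_PMD_PTLOCKS	(USE_SPLIT_PTE_PTLOCKS && \
		IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK))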
>
> Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
> Cc: <stable@...r.kernel.org>
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
> include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index c9bf68c239a01..da800e56fe590 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -944,10 +944,29 @@ static inline bool htlb_allow_alloc_fallback(int reason)
> static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
> struct mm_struct *mm, pte_t *pte)
> {
> - if (huge_page_size(h) == PMD_SIZE)
> + VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
> + VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
> +
> + /*
> + * hugetlb must use the exact same PT locks as core-mm page table
> + * walkers would. When modifying a PTE table, hugetlb must take the
> + * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
> + * PT lock etc.
> + *
> + * The expectation is that any hugetlb folio smaller than a PMD is
> + * always mapped into a single PTE table and that any hugetlb folio
> + * smaller than a PUD (but at least as big as a PMD) is always mapped
> + * into a single PMD table.
ARM64 also supports cont-PMD size hugetlb, which is 32 MiB with a 4 KiB
base page size. This means the PT locks for a 32 MiB hugetlb folio may
race again, as we currently only hold one PMD lock for the several PMD
entries of a cont-PMD size hugetlb.
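To make the geometry concrete, here is my reading of the arm64/4K
cont-PMD case; the *_4K names below are mine, for illustration only:

#define PMD_SIZE_4K	(2UL << 20)		/* 2 MiB per PMD entry */
#define CONT_PMDS_4K	16			/* contiguous PMD entries */
#define CONT_PMD_SIZE_4K (CONT_PMDS_4K * PMD_SIZE_4K)	/* 32 MiB folio */

/*
 * With this patch, PMD_SIZE <= 32 MiB < PUD_SIZE (1 GiB), so
 * huge_pte_lockptr() returns pmd_lockptr(mm, pmd) for whichever of the
 * 16 PMD entries the caller happens to pass in.
 */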
> + *
> + * If that does not hold for an architecture, then that architecture
> + * must disable split PT locks such that all *_lockptr() functions
> + * will give us the same result: the per-MM PT lock.
> + */
> + if (huge_page_size(h) < PMD_SIZE)
> + return pte_lockptr(mm, pte);
> + else if (huge_page_size(h) < PUD_SIZE)
> return pmd_lockptr(mm, (pmd_t *) pte);
IIUC, as I said above, this change doesn't fix the inconsistent locking
for cont-PMD size hugetlb under GUP, and it also breaks the locking rule
for unmapping/migrating a cont-PMD size hugetlb (which previously used
mm->page_table_lock).
> - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
> - return &mm->page_table_lock;
> + return pud_lockptr(mm, (pud_t *) pte);
> }
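As a usage note, most hugetlb paths take this lock through the
huge_pte_lock() wrapper in the same header (shape quoted from memory),
so the lock choice made above propagates to all of them:

static inline spinlock_t *huge_pte_lock(struct hstate *h,
					struct mm_struct *mm, pte_t *pte)
{
	spinlock_t *ptl;

	ptl = huge_pte_lockptr(h, mm, pte);
	spin_lock(ptl);
	return ptl;
}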
>
> #ifndef hugepages_supported