linux-kernel - Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <541f6c23-77ad-4d46-a8ed-fb18c9b635b3@redhat.com>
Date: Thu, 1 Aug 2024 10:50:18 +0200
From: David Hildenbrand <david@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, James Houghton <jthoughton@...gle.com>,
 stable@...r.kernel.org, Peter Xu <peterx@...hat.com>,
 Oscar Salvador <osalvador@...e.de>, Muchun Song <muchun.song@...ux.dev>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking

On 31.07.24 14:21, David Hildenbrand wrote:
> We recently made GUP's common page table walking code to also walk hugetlb
> VMAs without most hugetlb special-casing, preparing for the future of
> having less hugetlb-specific page table walking code in the codebase.
> Turns out that we missed one page table locking detail: page table locking
> for hugetlb folios that are not mapped using a single PMD/PUD.

James, Peter,

the following seems to get the job done. Thoughts?

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8e462205400d..776dc3914d9e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -938,10 +938,40 @@ static inline bool htlb_allow_alloc_fallback(int reason)
  static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
  					   struct mm_struct *mm, pte_t *pte)
  {
-	if (huge_page_size(h) == PMD_SIZE)
+	unsigned long size = huge_page_size(h);
+
+	VM_WARN_ON(size == PAGE_SIZE);
+
+	/*
+	 * hugetlb must use the exact same PT locks as core-mm page table
+	 * walkers would. When modifying a PTE table, hugetlb must take the
+	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+	 * PT lock etc.
+	 *
+	 * The expectation is that any hugetlb folio smaller than a PMD is
+	 * always mapped into a single PTE table and that any hugetlb folio
+	 * smaller than a PUD (but at least as big as a PMD) is always mapped
+	 * into a single PMD table.
+	 *
+	 * If that does not hold for an architecture, then that architecture
+	 * must disable split PT locks such that all *_lockptr() functions
+	 * will give us the same result: the per-MM PT lock.
+	 *
+	 * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
+	 * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use the MM PT lock
+	 * directly with a PMD hugetlb size, whereby core-mm would call
+	 * pmd_lockptr() instead. However, in such configurations split PMD
+	 * locks are disabled -- split locks don't make sense on a single
+	 * PGDIR page table -- and the end result is the same.
+	 */
+	if (size >= P4D_SIZE)
+		return &mm->page_table_lock;
+	else if (size >= PUD_SIZE)
+		return pud_lockptr(mm, (pud_t *) pte);
+	else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))
  		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
-	return &mm->page_table_lock;
+	/* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+	return ptep_lockptr(mm, pte);
  }
  
  #ifndef hugepages_supported
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a890a1731c14..bd219ac9c026 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2869,6 +2869,13 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
  	return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
  }
  
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+	BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+	BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE);
+	return ptlock_ptr(virt_to_ptdesc(pte));
+}
+
  static inline bool ptlock_init(struct ptdesc *ptdesc)
  {
  	/*
@@ -2893,6 +2900,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
  {
  	return &mm->page_table_lock;
  }
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+	return &mm->page_table_lock;
+}
  static inline void ptlock_cache_init(void) {}
  static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
  static inline void ptlock_free(struct ptdesc *ptdesc) {}
-- 
2.45.2


-- 
Cheers,

David / dhildenb