Message-ID: <c16a731f-1029-4ede-bbea-af2218c566d1@redhat.com>
Date: Fri, 26 Jul 2024 10:04:18 +0200
From: David Hildenbrand <david@...hat.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>, linux-kernel@...r.kernel.org
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>, Peter Xu <peterx@...hat.com>,
 Oscar Salvador <osalvador@...e.de>, stable@...r.kernel.org
Subject: Re: [PATCH v1 2/2] mm/hugetlb: fix hugetlb vs. core-mm PT locking

On 26.07.24 04:33, Baolin Wang wrote:
> 
> 
> On 2024/7/26 02:39, David Hildenbrand wrote:
>> We recently made GUP's common page table walking code also walk
>> hugetlb VMAs without most hugetlb special-casing, preparing for the
>> future of having less hugetlb-specific page table walking code in the
>> codebase. Turns out that we missed one page table locking detail: page
>> table locking for hugetlb folios that are not mapped using a single
>> PMD/PUD.
>>
>> Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
>> hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
>> page tables, will perform a pte_offset_map_lock() to grab the PTE table
>> lock.
>>
>> However, hugetlb code that concurrently modifies these page tables would
>> actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
>> locks would differ. Something similar can happen right now with hugetlb
>> folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
>>
>> Let's make huge_pte_lockptr() effectively use the same PT locks as any
>> core-mm page table walker would.
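
For reference, the pre-fix lock selection looked roughly like this
(paraphrased, details vary by kernel version), while a core-mm walker
like GUP goes through pte_offset_map_lock():

	static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
						   struct mm_struct *mm, pte_t *pte)
	{
		/* Only PMD-sized folios get the split PMD lock ... */
		if (huge_page_size(h) == PMD_SIZE)
			return pmd_lockptr(mm, (pmd_t *) pte);
		/* ... everything smaller falls back to the mm-wide lock. */
		return &mm->page_table_lock;
	}

	/*
	 * GUP's generic walker instead does, per PTE table:
	 *
	 *	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
	 *
	 * With USE_SPLIT_PTE_PTLOCKS that hands back the per-PTE-table
	 * lock, so the two sides can hold different spinlocks for the
	 * same PTE table.
	 */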
> 
> Thanks for raising the issue again. I remember fixing this issue 2 years
> ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
> CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
> 

Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D

Did we have a reproducer back then that would make my life easier?

>> There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
>> folio being mapped using two PTE page tables. While hugetlb wants to take
>> the PMD table lock, core-mm would grab the PTE table lock of one of the two
>> PTE page tables. In such corner cases, we have to make sure that both
>> locks match, which is (fortunately!) currently guaranteed for 8xx as it
>> does not support SMP.
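
To put rough numbers on the 8xx case (my arithmetic, assuming 4 KiB
base pages):

	PTE table: 1024 entries * 4 KiB = 4 MiB covered
	8 MiB folio                     -> 2 PTE tables under 2 PMD entries
	split PTE locks                 -> 2 distinct PTE-table locks
	hugetlb                         -> 1 PMD-level lock

On !SMP, all of these spinlocks compile down to no-ops, which is why
the "locks must match" requirement still holds on 8xx.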
>>
>> Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
>> Cc: <stable@...r.kernel.org>
>> Signed-off-by: David Hildenbrand <david@...hat.com>
>> ---
>>    include/linux/hugetlb.h | 25 ++++++++++++++++++++++---
>>    1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
>> index c9bf68c239a01..da800e56fe590 100644
>> --- a/include/linux/hugetlb.h
>> +++ b/include/linux/hugetlb.h
>> @@ -944,10 +944,29 @@ static inline bool htlb_allow_alloc_fallback(int reason)
>>    static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
>>    					   struct mm_struct *mm, pte_t *pte)
>>    {
>> -	if (huge_page_size(h) == PMD_SIZE)
>> +	VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
>> +	VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
>> +
>> +	/*
>> +	 * hugetlb must use the exact same PT locks as core-mm page table
>> +	 * walkers would. When modifying a PTE table, hugetlb must take the
>> +	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
>> +	 * PT lock etc.
>> +	 *
>> +	 * The expectation is that any hugetlb folio smaller than a PMD is
>> +	 * always mapped into a single PTE table and that any hugetlb folio
>> +	 * smaller than a PUD (but at least as big as a PMD) is always mapped
>> +	 * into a single PMD table.
> 
> ARM64 also supports cont-PMD size hugetlb, which is 32 MiB with a 4 KiB
> base page size. This means the PT locks for a 32 MiB hugetlb folio may
> race again, as we currently only hold one PMD lock for the several PMD
> entries of a cont-PMD size hugetlb.

Exactly, that's the case where all cont-PMD entries fall into the same
PMD page table.
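
That sharing is what pmd_lockptr() gives us: with split PMD locks the
lock is keyed on the page backing the PMD table, not on the individual
entry (sketch, paraphrasing include/linux/mm.h):

	static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
	{
		/* resolve the entry to the page holding the PMD table */
		return ptlock_ptr(page_ptdesc(pmd_pgtable_page(pmd)));
	}

And since cont-PMD mappings are naturally aligned, the 16 entries of a
32 MiB mapping cannot straddle a PMD table boundary, so hugetlb and
core-mm end up with the same spinlock.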

That's also the case I will try to reproduce (migration racing with
GUP). But the race window is small, and a lot of other stuff is
protected by the VMA lock.

-- 
Cheers,

David / dhildenb

