linux-kernel - Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <04ecf2e3-651a-47c9-9f30-d31423e2c9d7@redhat.com>
Date: Wed, 28 May 2025 22:26:04 +0200
From: David Hildenbrand <david@...hat.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: Peter Xu <peterx@...hat.com>, Gavin Guo <gavinguo@...lia.com>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
 akpm@...ux-foundation.org, mike.kravetz@...cle.com, kernel-dev@...lia.com,
 stable@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
 Florent Revest <revest@...gle.com>, Gavin Shan <gshan@...hat.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and
 hugetlb_fault_mutex_table

On 28.05.25 22:00, David Hildenbrand wrote:
> On 28.05.25 17:45, Oscar Salvador wrote:
>> On Wed, May 28, 2025 at 05:09:26PM +0200, David Hildenbrand wrote:
>>> On 28.05.25 17:03, Peter Xu wrote:
>>>> So I'm not 100% sure we need the folio lock even for copy; IIUC a refcount
>>>> would be enough?
>>>
>>> The introducing patches seem to talk about blocking concurrent migration /
>>> rmap walks.
>>
>> I thought the main reason was because PageLock protects us against writes,
>> so when copying (in case of copying the underlying file), we want the
>> file to be stable throughout the copy?
> 
> Well, we don't do the same for ordinary pages, why should we do for hugetlb?
> 
> See wp_page_copy().
> 
> If you have a MAP_PRIVATE mapping of a file and modify the pagecache
> pages concurrently (write to another MAP_SHARED mapping, write() ...),
> there are no guarantees about one observing any specific page state.
> 
> At least not that I am aware of ;)
> 
> 
>>
>>> Maybe also concurrent fallocate(PUNCH_HOLE) is a problem regarding
>>> reservations? Not sure ...
>>
>> fallocate()->hugetlb_vmdelete_list() tries to grab the vma in write-mode,
>> and hugetlb_wp() grabs the lock in read-mode, so we should be covered?
> 
> Yeah, maybe that's the case nowadays. Maybe it wasn't in the past ...
> 
>>
>> Also, hugetlbfs_punch_hole()->remove_inode_hugepages() will try to grab the mutex.
>>
>> The only fishy thing I see is hugetlbfs_zero_partial_page().
>>
>> But that is for old_page, and as I said, I thought main reason was to
>> protect us against writes during the copy.
> 
> See above, I really wouldn't understand why that is required.
> 
>>
>>> For 2) I am also not sure if we need need the pagecache folio locked; I
>>> doubt it ... but this code is not the easiest to follow.
>>    
>> I have been staring at that code and thinking about potential scenarios
>> for a few days now, and I cannot convice myself that we need
>> pagecache_folio's lock when pagecache_folio != old_folio because as a
>> matter of fact I cannot think of anything it protects us against.
>>
>> I plan to rework this in a more sane way, or at least less offusctaed, and then
>> Galvin can fire his syzkaller to check whether we are good.

Digging a bit:

commit 56c9cfb13c9b6516017eea4e8cbe22ea02e07ee6
Author: Naoya Horiguchi <nao.horiguchi@...il.com>
Date:   Fri Sep 10 13:23:04 2010 +0900

     hugetlb, rmap: fix confusing page locking in hugetlb_cow()
     
     The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
     looks confusing and is buggy.  Originally this trylock_page() was
     intended to make sure that old_page is locked even when old_page !=
     pagecache_page, because then only pagecache_page is locked.

Added the comment

+       /*
+        * hugetlb_cow() requires page locks of pte_page(entry) and
+        * pagecache_page, so here we need take the former one
+        * when page != pagecache_page or !pagecache_page.
+        * Note that locking order is always pagecache_page -> page,
+        * so no worry about deadlock.
+        */


And

commit 0fe6e20b9c4c53b3e97096ee73a0857f60aad43f
Author: Naoya Horiguchi <nao.horiguchi@...il.com>
Date:   Fri May 28 09:29:16 2010 +0900

     hugetlb, rmap: add reverse mapping for hugepage
     
     This patch adds reverse mapping feature for hugepage by introducing
     mapcount for shared/private-mapped hugepage and anon_vma for
     private-mapped hugepage.
     
     While hugepage is not currently swappable, reverse mapping can be useful
     for memory error handler.
     
     Without this patch, memory error handler cannot identify processes
     using the bad hugepage nor unmap it from them. That is:
     - for shared hugepage:
       we can collect processes using a hugepage through pagecache,
       but can not unmap the hugepage because of the lack of mapcount.
     - for privately mapped hugepage:
       we can neither collect processes nor unmap the hugepage.
     This patch solves these problems.
     
     This patch include the bug fix given by commit 23be7468e8, so reverts it.

Added the real locking magic.

Not that much changed regarding locking until COW support was added in

commit 1e8f889b10d8d2223105719e36ce45688fedbd59
Author: David Gibson <david@...son.dropbear.id.au>
Date:   Fri Jan 6 00:10:44 2006 -0800

     [PATCH] Hugetlb: Copy on Write support
     
     Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
     supported.  This helps us to safely use hugetlb pages in many more
     applications.  The patch makes the following changes.  If needed, I also have
     it broken out according to the following paragraphs.


Confusing.

Locking the *old_folio* when calling hugetlb_wp() makes sense when it is
an anon folio because we might want to call folio_move_anon_rmap() to adjust the rmap root.

Locking the pagecache folio when calling hugetlb_wp() if old_folio is an anon folio ...
does not make sense to me.

Locking the pagecache folio when calling hugetlb_wp if old_folio is a pageache folio ...
also doesn't quite make sense for me.

Again, we don't take the lock for ordinary pages, so what's special about hugetlb for the last
case (reservations, I assume?).

-- 
Cheers,

David / dhildenb