[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aDeBUXCRLRZobHq0@localhost.localdomain>
Date: Wed, 28 May 2025 23:34:09 +0200
From: Oscar Salvador <osalvador@...e.de>
To: David Hildenbrand <david@...hat.com>
Cc: Peter Xu <peterx@...hat.com>, Gavin Guo <gavinguo@...lia.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
muchun.song@...ux.dev, akpm@...ux-foundation.org,
mike.kravetz@...cle.com, kernel-dev@...lia.com,
stable@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Florent Revest <revest@...gle.com>, Gavin Shan <gshan@...hat.com>
Subject: Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and
hugetlb_fault_mutex_table
On Wed, May 28, 2025 at 10:26:04PM +0200, David Hildenbrand wrote:
> Digging a bit:
>
> commit 56c9cfb13c9b6516017eea4e8cbe22ea02e07ee6
> Author: Naoya Horiguchi <nao.horiguchi@...il.com>
> Date: Fri Sep 10 13:23:04 2010 +0900
>
> hugetlb, rmap: fix confusing page locking in hugetlb_cow()
> The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
> looks confusing and is buggy. Originally this trylock_page() was
> intended to make sure that old_page is locked even when old_page !=
> pagecache_page, because then only pagecache_page is locked.
>
> Added the comment
>
> + /*
> + * hugetlb_cow() requires page locks of pte_page(entry) and
> + * pagecache_page, so here we need take the former one
> + * when page != pagecache_page or !pagecache_page.
> + * Note that locking order is always pagecache_page -> page,
> + * so no worry about deadlock.
> + */
>
>
> And
>
> commit 0fe6e20b9c4c53b3e97096ee73a0857f60aad43f
> Author: Naoya Horiguchi <nao.horiguchi@...il.com>
> Date: Fri May 28 09:29:16 2010 +0900
>
> hugetlb, rmap: add reverse mapping for hugepage
> This patch adds reverse mapping feature for hugepage by introducing
> mapcount for shared/private-mapped hugepage and anon_vma for
> private-mapped hugepage.
> While hugepage is not currently swappable, reverse mapping can be useful
> for memory error handler.
> Without this patch, memory error handler cannot identify processes
> using the bad hugepage nor unmap it from them. That is:
> - for shared hugepage:
> we can collect processes using a hugepage through pagecache,
> but can not unmap the hugepage because of the lack of mapcount.
> - for privately mapped hugepage:
> we can neither collect processes nor unmap the hugepage.
> This patch solves these problems.
> This patch include the bug fix given by commit 23be7468e8, so reverts it.
>
> Added the real locking magic.
Yes, I have been checking "hugetlb, rmap: add reverse mapping for
hugepage", which added locking the now-so-called 'old_folio' in case
hugetlbfs_pagecache_page() didn't return anything.
Because in hugetlb_wp, this was added:
@@ -2286,8 +2299,11 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
retry_avoidcopy:
/* If no-one else is actually using this page, avoid the copy
* and just make the page writable */
- avoidcopy = (page_count(old_page) == 1);
+ avoidcopy = (page_mapcount(old_page) == 1);
if (avoidcopy) {
+ if (!trylock_page(old_page))
+ if (PageAnon(old_page))
+ page_move_anon_rmap(old_page, vma, address);
So, as you mentioned, it was done to keep the rmap stable as I guess rmap test test the
PageLock.
> Not that much changed regarding locking until COW support was added in
>
> commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> Author: David Gibson <david@...son.dropbear.id.au>
> Date: Fri Jan 6 00:10:44 2006 -0800
>
> [PATCH] Hugetlb: Copy on Write support
> Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE can be
> supported. This helps us to safely use hugetlb pages in many more
> applications. The patch makes the following changes. If needed, I also have
> it broken out according to the following paragraphs.
>
>
> Confusing.
>
> Locking the *old_folio* when calling hugetlb_wp() makes sense when it is
> an anon folio because we might want to call folio_move_anon_rmap() to adjust the rmap root.
Yes, this is clear.
> Locking the pagecache folio when calling hugetlb_wp() if old_folio is an anon folio ...
> does not make sense to me.
I think this one is also clear.
> Locking the pagecache folio when calling hugetlb_wp if old_folio is a pageache folio ...
> also doesn't quite make sense for me.
> Again, we don't take the lock for ordinary pages, so what's special about hugetlb for the last
> case (reservations, I assume?).
So, this case is when pagecache_folio == old_folio.
I guess we are talking about resv_maps? But I think we cannot interfere there.
For the reserves to be modified the page has to go away.
Now, I have been checking this one too:
commit 04f2cbe35699d22dbf428373682ead85ca1240f5
Author: Mel Gorman <mel@....ul.ie>
Date: Wed Jul 23 21:27:25 2008 -0700
hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed
And I think it is interesting.
That one added this chunk in hugetlb_fault():
@@ -1126,8 +1283,15 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
spin_lock(&mm->page_table_lock);
/* Check for a racing update before calling hugetlb_cow */
if (likely(pte_same(entry, huge_ptep_get(ptep))))
- if (write_access && !pte_write(entry))
- ret = hugetlb_cow(mm, vma, address, ptep, entry);
+ if (write_access && !pte_write(entry)) {
+ struct page *page;
+ page = hugetlbfs_pagecache_page(vma, address);
+ ret = hugetlb_cow(mm, vma, address, ptep, entry, page);
+ if (page) {
+ unlock_page(page);
+ put_page(page);
+ }
+ }
So, it finds and lock the page in the pagecache, and calls hugetlb_cow.
hugetlb_fault() takes hugetlb_instantiation_mutex, and there is a
comment saying:
/*
* Serialize hugepage allocation and instantiation, so that we don't
* get spurious allocation failures if two CPUs race to instantiate
* the same page in the page cache.
*/
mutex_lock(&hugetlb_instantiation_mutex);
But it does not say anything about truncation.
Actually, checking the truncation code from back then,
truncate_hugepages() (and none of its callers) take the hugetlb_instantiation_mutex,
as it is done today (e.g: current remove_inode_hugepages() code).
Back then, truncate_hugepages() relied only in lock_page():
static void truncate_hugepages(struct inode *inode, loff_t lstart)
{
...
...
lock_page(page);
truncate_huge_page(page);
unlock_page(page);
}
While today, remove_inode_hugepages() takes the mutex, and also the lock.
And then zaps the page and does its thing with resv_maps.
So I think that we should not even need the lock for hugetlb_wp
when pagecache_folio == old_folio (pagecache), because the mutex
already protects us from the page to go away, right (e.g: truncated)?
Besides we hold a reference on that page since
filemap_lock_hugetlb_folio() locks the page and increases its refcount.
All in all, I am leaning towards not being needed, but it's getting late
here..
--
Oscar Salvador
SUSE Labs
Powered by blists - more mailing lists