Message-ID: <aFE9YTNcCHAGBtKi@localhost.localdomain>
Date: Tue, 17 Jun 2025 12:03:13 +0200
From: Oscar Salvador <osalvador@...e.de>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Muchun Song <muchun.song@...ux.dev>,
James Houghton <jthoughton@...gle.com>,
Peter Xu <peterx@...hat.com>, Gavin Guo <gavinguo@...lia.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/5] mm,hugetlb: Document the reason to lock the folio in
the faulting path
On Mon, Jun 16, 2025 at 04:41:20PM +0200, David Hildenbrand wrote:
> On 16.06.25 16:10, Oscar Salvador wrote:
> > What do you mean by stable?
>
> The same "stable" you used in the doc, that I complained about ;)
Touché :-D
> > In the generic faulting path, we're not worried about the page going away
> > because we hold a reference, so I guess the lock must be to keep content stable?
>
> What you want to avoid, IIRC, is someone doing a truncation/reclaim on the
> folio while you are mapping it.
Ok, I see. I thought it was more about holding off writes, but this makes sense.
> Take a look at truncate_inode_pages_range() where we do a folio_lock()
> around truncate_inode_folio().
>
> In other words, while you hold the folio lock (and verified that the folio
> was not truncated yet: for example, that folio->mapping is still set), you
> know that it cannot get truncated concurrently -- without holding other
> expensive locks.
>
> Observe how truncate_cleanup_folio() calls
>
> if (folio_mapped(folio))
> unmap_mapping_folio(folio);
>
> to remove all page table mappings.
>
> So while holding the folio lock, new page table mappings are not expected to
> appear (IIRC).
Ah ok, so it's more that we don't end up mapping something that's not there
anymore (or something completely different).
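Just to check I got the pattern right, it would be something like this
(a simplified sketch of the filemap-style check, not the exact kernel code;
the retry label is only illustrative):

	folio_lock(folio);
	if (unlikely(!folio->mapping)) {
		/*
		 * We raced with truncation: the folio is gone from the
		 * pagecache, so we must not map it.
		 */
		folio_unlock(folio);
		goto retry;	/* illustrative: re-do the pagecache lookup */
	}
	/*
	 * From here until folio_unlock(), truncation of this folio is
	 * blocked, so it is safe to map it into the pagetables.
	 */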
> > I mean, yes, after we have mapped the page privately into the pagetables,
> > we no longer have any business with content integrity, so given this rule, yes,
> > I guess hugetlb_wp() wouldn't need the lock (for !anonymous) because we have
> > already mapped it privately at that point.
>
> That's my understanding. And while holding the PTL it cannot get unmapped.
> Whenever you temporarily drop the PTL, you have to do a pte_same() check to
> make sure concurrent truncation didn't happen.
Yap, hugetlb_wp() drops the locks temporarily when it needs to unmap the private
page from other processes, but then does the pte_same() check.
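For reference, the recheck pattern is roughly this (a sketch, not the exact
hugetlb_wp() code; the huge_ptep_get() signature varies between kernel
versions, and mm/vma/h/vmf are as in hugetlb_wp()):

	spin_unlock(vmf->ptl);
	/* ... sleeping work, e.g. unmapping the page from other processes ... */
	spin_lock(vmf->ptl);
	vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h));
	if (vmf->pte &&
	    pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), vmf->orig_pte)) {
		/* PTE unchanged under us: no concurrent truncation/unmap. */
	}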
> That's at least my understanding of the common filemap code.
>
> >
> > But there's something I don't fully understand and it makes me feel uneasy.
> > If the lock in the generic fault-in path is to keep the content stable until we
> > have mapped it privately, wouldn't it be more correct to also hold it
> > during the copy in hugetlb_wp(), to kind of emulate that?
> As long as there is a page table mapping, it cannot get truncated. So if you
> find a PTE under the PTL that maps that folio, truncation cannot have
> happened.
I see, this makes a lot of sense, thanks for walking me through this, David!
Alright then, with all this clear, we should:
- Not take the folio lock in hugetlb_fault() before calling hugetlb_wp();
  hugetlb_wp() will take it itself if the folio is anonymous (re-use check)
- Drop the lock in hugetlb_no_page() as soon as we have mapped the page in
  the pagetables
- Have hugetlb_wp() take the lock IFF the folio is anonymous
This will lead to something like the following:
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dfa09fc3b2c6..4d48cda8a56d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6198,6 +6198,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
* in scenarios that used to work. As a side effect, there can still
* be leaks between processes, for example, with FOLL_GET users.
*/
+ if (folio_test_anon(old_folio))
+ folio_lock(old_folio);
if (folio_mapcount(old_folio) == 1 && folio_test_anon(old_folio)) {
if (!PageAnonExclusive(&old_folio->page)) {
folio_move_anon_rmap(old_folio, vma);
@@ -6212,6 +6214,8 @@ static vm_fault_t hugetlb_wp(struct vm_fault *vmf)
}
VM_BUG_ON_PAGE(folio_test_anon(old_folio) &&
PageAnonExclusive(&old_folio->page), &old_folio->page);
+ if (folio_test_anon(old_folio))
+ folio_unlock(old_folio);
/*
* If the process that created a MAP_PRIVATE mapping is about to perform
@@ -6537,11 +6541,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
}
new_pagecache_folio = true;
} else {
- /*
- * hugetlb_wp() expects the folio to be locked in order to
- * check whether we can re-use this page exclusively for us.
- */
- folio_lock(folio);
anon_rmap = 1;
}
} else {
@@ -6558,7 +6557,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
/* Check for page in userfault range. */
if (userfaultfd_minor(vma)) {
- folio_unlock(folio);
+ if (!anon_rmap)
+ folio_unlock(folio);
folio_put(folio);
/* See comment in userfaultfd_missing() block above */
if (!hugetlb_pte_stable(h, mm, vmf->address, vmf->pte, vmf->orig_pte)) {
@@ -6604,6 +6604,13 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
new_pte = huge_pte_mkuffd_wp(new_pte);
set_huge_pte_at(mm, vmf->address, vmf->pte, new_pte, huge_page_size(h));
+ /*
+ * This folio cannot have been truncated since we were holding the lock,
+ * and we just mapped it into the pagetables. Drop the lock now.
+ */
+ if (!anon_rmap)
+ folio_unlock(folio);
+
hugetlb_count_add(pages_per_huge_page(h), mm);
if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
/* Optimization, do the COW without a second fault */
@@ -6619,8 +6626,6 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
*/
if (new_folio)
folio_set_hugetlb_migratable(folio);
-
- folio_unlock(folio);
out:
hugetlb_vma_unlock_read(vma);
@@ -6639,8 +6644,8 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
backout_unlocked:
if (new_folio && !new_pagecache_folio)
restore_reserve_on_error(h, vma, vmf->address, folio);
-
- folio_unlock(folio);
+ if (!anon_rmap)
+ folio_unlock(folio);
folio_put(folio);
goto out;
}
@@ -6805,21 +6810,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
/* Fallthrough to CoW */
}
- /*
- * We need to lock the folio before calling hugetlb_wp().
- * Either the folio is in the pagecache and we need to copy it over
- * to another file, so it must remain stable throughout the operation,
- * or the folio is anonymous and we need to lock it in order to check
- * whether we can re-use it and mark it exclusive for this process.
- * The timespan for the lock differs depending on the type, since
- * anonymous folios only need to hold the lock while checking whether we
- * can re-use it, while we need to hold it throughout the copy in case
- * we are dealing with a folio from a pagecache.
- * Representing this difference would be tricky with the current code,
- * so just hold the lock for the duration of hugetlb_wp().
- */
folio = page_folio(pte_page(vmf.orig_pte));
- folio_lock(folio);
folio_get(folio);
if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
@@ -6835,7 +6826,6 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
flags & FAULT_FLAG_WRITE))
update_mmu_cache(vma, vmf.address, vmf.pte);
out_put_page:
- folio_unlock(folio);
folio_put(folio);
out_ptl:
spin_unlock(vmf.ptl);
This should be patch #2, with something like "Sorting out locking" as the
title, and maybe explain a bit more why hugetlb_wp() needs the lock for
anonymous folios.
What do you think?
--
Oscar Salvador
SUSE Labs