linux-kernel - Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aEg9mTDDA1TXJ9by@localhost.localdomain>
Date: Tue, 10 Jun 2025 16:13:45 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Peter Xu <peterx@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Muchun Song <muchun.song@...ux.dev>,
	David Hildenbrand <david@...hat.com>,
	James Houghton <jthoughton@...gle.com>,
	Gavin Guo <gavinguo@...lia.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault
 and hugetlb_wp

On Tue, Jun 03, 2025 at 02:31:15PM -0400, Peter Xu wrote:
> > There're actually three places this patch touched, the 3rd one is
> > hugetlb_no_page(), in which case I also think we should lock it, not only
> > because file folios normally does it (see do_fault(), for example), but
> > also that's exactly what James mentioned I believe on possible race of
> > !uptodate hugetlb folio being injected by UFFDIO_CONTINUE, along the lines:
> > 
> > 		folio = alloc_hugetlb_folio(vma, vmf->address, false);
> >                 ...
> > 		folio_zero_user(folio, vmf->real_address);
> > 		__folio_mark_uptodate(folio);
> 
> I think I was wrong here at least partly..  So what this patch changed is
> only the lookup of the no_page path, hence what I said here doesn't apply.
> This patch also mentioned in the commit message on why James's concern was
> ok - the fault mutex was held.  Yes I agree. Actually even without fault
> mutex, the folio is only injected into page cache after mark uptodate.. so
> it looks fine even without the mutex.
> 
> Though it's still true there're three paths to be discussed, which should
> include no_page, and it's still needed to be discussed when any of us like
> to remove folio lock even in the lookup path.
> 
> For example, I'm not sure whether it's always thread safe to do
> folio_test_hwpoison() when without it, even with the fault mutex.
> 
> Sorry for the confusion, Oscar.

So, I'm having another go at this.
I started with replacing the check for cow-on-private-mappings [1].

I'm still to figure out how to approach the other locks though.
The thing is, we only need the lock in two cases:

1) folio is the original folio in the pagecache: We need to lock it to
   copy it over to another page (do_fault->do_cow_page locks the folio
   via filemap_fault).
   Although, the truth be told, we even lock the folio on read-fault in
   do_read_fault. Maybe that is just because __do_fault() ends up
   calling filemap_fault (so it doesn't differentiate?).
   But that is not a problem, just a note.
   So it is ok for hugetlb_no_page() to also take the lock when
   read-faulting.
   We need to take the lock here.

2) folio is anonymous: We need to take the lock when we want to check
   if we can re-use the folio in hugetlb_wp.
   Normal faulting path does it in wp_can_reuse_anon_folio().

Now, the scope for them is different.
Normal faulting path only locks the anonymous folio when calling
wp_can_reuse_anon_folio() (so, it isn't hold during the copy), while we
lock folios from the pagecache in filemap_fault() and keep the lock until
we have a) mapped it into our pagetables (read-fault) b) or we have
copied it over to another page.

Representing this difference of timespan in hugetlb is rather tricky as
is.
Maybe it could be done if we refactor hugetlb_{fault,no_page,wp} to look
more alike to the non-hugetlb faulting path.
E.g: hugetlb_fault could directly call hugetlb_wp if it sees that
FAULT_FLAG_WRITE is on, instead of doing a first pass to
hugetlb_no_page, and leave hugetlb_no_page for only RO stuff.
Not put much though on it, but just an idead of how do_fault() handles
it.

(In an ideal world hugetlb could be re-routed to generic faulting path
but we would need to sort out somehow the need to allocate folios,
reserves etc. so that is still far future)

Anyway, as I said, representing this different of timespan file vs
anonymous in hugetlb is tricky, so I think the current state of locks,
with [1] applied is fine.

hugetlb_fault:
               - locks the folio mapped in pte_orig
hugetlb_no_page:
               - locks the folio from the pagecache
	       - or allocates a new folio
	         - and inserts it into the pagecache and locks it
	         - and locks it (anonymous)

So, hugetlb_wp() already gets the folio locked and doesn't have to
bother about anything else.

Of course, while I think that locks can be keep as they are, I plan to
document them properly so the next poor soul that comes along doesn't
have to spend time trying to figure out why we do what we do.

Another thing of keeping the locks is that for the future we can
potentially rely less on the faulting mutex.

[1] https://github.com/leberus/linux/commit/b8e10b854fc3e427fd69d61d065798c7f31ebd55

-- 
Oscar Salvador
SUSE Labs