linux-kernel - Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex

Message-ID: <20250514043326.GA4318@system.software.com>
Date: Wed, 14 May 2025 13:33:27 +0900
From: Byungchul Park <byungchul@...com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Gavin Guo <gavinguo@...lia.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
	osalvador@...e.de, kernel-dev@...lia.com, stable@...r.kernel.org,
	Hugh Dickins <hughd@...gle.com>, Florent Revest <revest@...gle.com>,
	Gavin Shan <gshan@...hat.com>, kernel_team@...ynix.com
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and
 hugetlb_fault_mutex_table

On Tue, May 13, 2025 at 05:56:33PM -0700, Andrew Morton wrote:
> On Tue, 13 May 2025 17:34:48 +0800 Gavin Guo <gavinguo@...lia.com> wrote:
> 
> > The patch fixes a deadlock which can be triggered by an internal
> > syzkaller [1] reproducer and captured by bpftrace script [2] and its log
> > [3] in this scenario:
> > 
> > Process 1                              Process 2
> > ---				       ---
> > hugetlb_fault
> >   mutex_lock(B) // take B
> >   filemap_lock_hugetlb_folio
> >     filemap_lock_folio
> >       __filemap_get_folio
> >         folio_lock(A) // take A
> >   hugetlb_wp
> >     mutex_unlock(B) // release B
> >     ...                                hugetlb_fault
> >     ...                                  mutex_lock(B) // take B
> >                                          filemap_lock_hugetlb_folio
> >                                            filemap_lock_folio
> >                                              __filemap_get_folio
> >                                                folio_lock(A) // blocked
> >     unmap_ref_private
> >     ...
> >     mutex_lock(B) // retake and blocked
> > 
> > This is an ABBA deadlock involving two locks:
> > - Lock A: pagecache_folio lock
> > - Lock B: hugetlb_fault_mutex_table lock
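
The interleaving in the diagram above can be reproduced in userspace. What
follows is a minimal sketch, not kernel code and not part of the patch: two
pthread mutexes stand in for lock A and lock B, a barrier forces the same
ordering as the diagram, and pthread_mutex_trylock() replaces the blocking
acquisitions so the program reports the cycle instead of hanging. All names
(lock_a, lock_b, process1, process2, abba_demo) are hypothetical. In the
kernel, lock A is a folio lock rather than a mutex, so this only models the
ordering problem.

```c
/* Userspace sketch only: pthread mutexes stand in for the kernel locks. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* stand-in for A */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* stand-in for B */
static pthread_barrier_t step;     /* forces the interleaving in the diagram */
static int p1_retake_b, p2_take_a; /* trylock results of the two attempts */

static void *process1(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&lock_b);    /* take B */
	pthread_mutex_lock(&lock_a);    /* take A */
	pthread_mutex_unlock(&lock_b);  /* release B while still holding A */
	pthread_barrier_wait(&step);    /* step 1: B is free for process 2 */
	pthread_barrier_wait(&step);    /* step 2: process 2 now holds B */
	/* Retake B; a blocking lock would wait on process 2 here. */
	p1_retake_b = pthread_mutex_trylock(&lock_b);
	pthread_barrier_wait(&step);    /* step 3: both attempts recorded */
	pthread_mutex_unlock(&lock_a);
	return NULL;
}

static void *process2(void *unused)
{
	(void)unused;
	pthread_barrier_wait(&step);    /* step 1: wait until B was released */
	pthread_mutex_lock(&lock_b);    /* take B */
	pthread_barrier_wait(&step);    /* step 2: announce that B is held */
	/* Take A; a blocking lock would wait on process 1 here. */
	p2_take_a = pthread_mutex_trylock(&lock_a);
	pthread_barrier_wait(&step);    /* step 3: both attempts recorded */
	pthread_mutex_unlock(&lock_b);
	return NULL;
}

/* Returns how many of the two attempts would have blocked, or -1 on error. */
int abba_demo(void)
{
	pthread_t t1;

	if (pthread_barrier_init(&step, NULL, 2) != 0)
		return -1;
	if (pthread_create(&t1, NULL, process1, NULL) != 0) {
		pthread_barrier_destroy(&step);
		return -1;
	}
	process2(NULL);                 /* the calling thread plays process 2 */
	pthread_join(t1, NULL);
	pthread_barrier_destroy(&step);
	return (p1_retake_b == EBUSY) + (p2_take_a == EBUSY);
}
```

Called from a main() and built with something like `cc -pthread`, abba_demo()
returns 2 under this forced ordering: each thread holds the lock the other one
wants. With blocking lock calls in place of the trylocks, neither thread could
make progress, which is the cycle described above.
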
> 
> Nostalgia.  A decade or three ago many of us spent much of our lives
> staring at ABBA deadlocks.  Then came lockdep and after a few more
> years, it all stopped.  I've long hoped that lockdep would gain a
> solution to custom locks such as folio_wait_bit_common(), but not yet.
> 
> Byungchul, please take a look.  Would DEPT
> (https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com)
> have warned us about this?

Sure, I will check it.  I think this is the type of deadlock that DEPT
handles best.

	Byungchul

> >
> > ...
> >
> > The deadlock occurs between two processes as follows:
> >
> > ...
> > 
> > Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
> > Cc: <stable@...r.kernel.org>
> 
> It's been there for three years so I assume we aren't in a hurry.
> 
> The fix looks a bit nasty, sorry.  Perhaps designed for a minimal patch
> footprint?  That's good for a backportable fixup, but a more broadly
> architected solution may be needed going forward.
> 
> I'll queue it for 6.16-rc1 with a cc:stable, so this should be
> presented to the -stable trees 3-4 weeks from now.