linux-kernel - Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex

Message-Id: <20250513175633.85f4e19f4232a68ab04c8e41@linux-foundation.org>
Date: Tue, 13 May 2025 17:56:33 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Gavin Guo <gavinguo@...lia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
 osalvador@...e.de, kernel-dev@...lia.com, stable@...r.kernel.org, Hugh
 Dickins <hughd@...gle.com>, Florent Revest <revest@...gle.com>, Gavin Shan
 <gshan@...hat.com>, Byungchul Park <byungchul@...com>
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and
 hugetlb_fault_mutex_table

On Tue, 13 May 2025 17:34:48 +0800 Gavin Guo <gavinguo@...lia.com> wrote:

> The patch fixes a deadlock which can be triggered by an internal
> syzkaller [1] reproducer and captured by a bpftrace script [2] and its
> log [3] in this scenario:
> 
> Process 1                              Process 2
> ---                                    ---
> hugetlb_fault
>   mutex_lock(B) // take B
>   filemap_lock_hugetlb_folio
>     filemap_lock_folio
>       __filemap_get_folio
>         folio_lock(A) // take A
>   hugetlb_wp
>     mutex_unlock(B) // release B
>     ...                                hugetlb_fault
>     ...                                  mutex_lock(B) // take B
>                                          filemap_lock_hugetlb_folio
>                                            filemap_lock_folio
>                                              __filemap_get_folio
>                                                folio_lock(A) // blocked
>     unmap_ref_private
>     ...
>     mutex_lock(B) // retake and blocked
> 
> This is an ABBA deadlock involving two locks:
> - Lock A: pagecache_folio lock
> - Lock B: hugetlb_fault_mutex_table lock
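The interleaving above can be modelled in user space. The sketch below is a hypothetical pthreads analogue, not the patch: `lock_b` stands in for the hugetlb_fault_mutex_table entry, `lock_a` for the pagecache folio lock (in the kernel that is a bit lock, not a mutex), and `process1`, `process2` and `run_scenario` are illustrative names. It shows one generic way to break the cycle, dropping A before retaking B so that every path takes B before A; the quoted text does not show whether the actual fix works this way.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: lock_b models hugetlb_fault_mutex_table[hash],
 * lock_a models the pagecache folio lock. Both are plain pthread mutexes
 * here; the real folio lock is a page-flag bit lock. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static atomic_int finished;

/* Models "Process 1": a fault path that drops B and must retake it. */
static void *process1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);   /* mutex_lock(B) */
    pthread_mutex_lock(&lock_a);   /* folio_lock(A): order B -> A */
    pthread_mutex_unlock(&lock_b); /* hugetlb_wp drops B */

    /* The buggy sequence retakes B here while still holding A (order
     * A -> B), inverting process2's B -> A order. Sketch of a fix:
     * release A first so B is always acquired before A. */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_lock(&lock_b);   /* retake B without holding A */
    pthread_mutex_lock(&lock_a);   /* reacquire A in B -> A order */
    /* Real code would have to revalidate its state here, because
     * another thread may have run while A was dropped. */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Models "Process 2": an ordinary fault path, order B -> A. */
static void *process2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    atomic_fetch_add(&finished, 1);
    return NULL;
}

/* Runs both paths concurrently; returns how many completed
 * (2 when no deadlock occurs), or -1 if a thread cannot be created. */
int run_scenario(void)
{
    pthread_t t1, t2;

    atomic_store(&finished, 0);
    if (pthread_create(&t1, NULL, process1, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, process2, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&finished);
}
```

Built with `cc -pthread` and called from a `main()`, `run_scenario()` should return 2 on every run. Putting back the retake of `lock_b` while `lock_a` is still held reintroduces the A -> B order and, with it, the possibility of a hang.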

Nostalgia.  A decade or three ago many of us spent much of our lives
staring at ABBA deadlocks.  Then came lockdep and after a few more
years, it all stopped.  I've long hoped that lockdep would learn to
cover custom locks such as folio_wait_bit_common(), but it hasn't yet.
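The core idea behind lockdep-style ABBA detection can be sketched in a few lines: record every "held X while acquiring Y" pair as an ordering edge between lock classes, and flag any acquisition that would close a cycle. This is a simplified single-context model with hypothetical names (`model_acquire`, `model_release`, `model_reset`, the `CLASS_*` IDs), not lockdep's implementation. It also assumes the folio lock were visible to the checker, which, per the message above, it currently is not.

```c
#include <stdbool.h>

#define MAX_CLASSES 8

/* Hypothetical lock-class IDs for this sketch. */
enum { CLASS_FOLIO_LOCK = 0, CLASS_FAULT_MUTEX = 1 };

/* order[x][y] becomes true once class y is acquired while x is held. */
static bool order[MAX_CLASSES][MAX_CLASSES];
static int held[MAX_CLASSES]; /* classes currently held by this context */
static int nheld;

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool reachable(int from, int to, bool seen[MAX_CLASSES])
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (order[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Forgets all recorded orderings and held locks. */
void model_reset(void)
{
    for (int x = 0; x < MAX_CLASSES; x++)
        for (int y = 0; y < MAX_CLASSES; y++)
            order[x][y] = false;
    nheld = 0;
}

/* Returns false (and does not record the lock as held) if taking 'cls'
 * now would invert an order seen earlier, or would re-take a held class. */
bool model_acquire(int cls)
{
    if (cls < 0 || cls >= MAX_CLASSES || nheld == MAX_CLASSES)
        return false;
    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_CLASSES] = { false };
        /* A path cls -> ... -> held[i] plus the new edge held[i] -> cls
         * would form a cycle: a potential ABBA deadlock. */
        if (reachable(cls, held[i], seen))
            return false;
    }
    for (int i = 0; i < nheld; i++)
        order[held[i]][cls] = true;
    held[nheld++] = cls;
    return true;
}

/* Drops 'cls' from the held set; recorded orderings are kept. */
void model_release(int cls)
{
    for (int i = 0; i < nheld; i++) {
        if (held[i] == cls) {
            for (int j = i; j + 1 < nheld; j++)
                held[j] = held[j + 1];
            nheld--;
            return;
        }
    }
}
```

Replaying Process 1's sequence on its own (take B, take A, drop B, retake B) makes the final `model_acquire(CLASS_FAULT_MUTEX)` fail, because the B -> A edge was already recorded. No second process and no actual hang are needed for the report, which is what makes this style of checking useful for bugs that have sat unnoticed for years.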

Byungchul, please take a look.  Would DEPT
(https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com)
have warned us about this?

>
> ...
>
> The deadlock occurs between two processes as follows:
>
> ...
> 
> Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
> Cc: <stable@...r.kernel.org>

It's been there for three years, so I assume we aren't in a hurry.

The fix looks a bit nasty, sorry.  Perhaps designed for a minimal patch
footprint?  That's good for a backportable fixup, but a more broadly
architected solution may be needed going forward.

I'll queue it for 6.16-rc1 with a cc:stable, so this should be
presented to the -stable trees 3-4 weeks from now.