Message-Id: <20250513175633.85f4e19f4232a68ab04c8e41@linux-foundation.org>
Date: Tue, 13 May 2025 17:56:33 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Gavin Guo <gavinguo@...lia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
osalvador@...e.de, kernel-dev@...lia.com, stable@...r.kernel.org, Hugh
Dickins <hughd@...gle.com>, Florent Revest <revest@...gle.com>, Gavin Shan
<gshan@...hat.com>, Byungchul Park <byungchul@...com>
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and
hugetlb_fault_mutex_table
On Tue, 13 May 2025 17:34:48 +0800 Gavin Guo <gavinguo@...lia.com> wrote:
> The patch fixes a deadlock which can be triggered by an internal
> syzkaller [1] reproducer and captured by bpftrace script [2] and its log
> [3] in this scenario:
>
> Process 1                              Process 2
> ---                                    ---
> hugetlb_fault
>   mutex_lock(B) // take B
>   filemap_lock_hugetlb_folio
>     filemap_lock_folio
>       __filemap_get_folio
>         folio_lock(A) // take A
>   hugetlb_wp
>     mutex_unlock(B) // release B
>     ...                                hugetlb_fault
>     ...                                  mutex_lock(B) // take B
>                                          filemap_lock_hugetlb_folio
>                                            filemap_lock_folio
>                                              __filemap_get_folio
>                                                folio_lock(A) // blocked
>     unmap_ref_private
>     ...
>     mutex_lock(B) // retake and blocked
>
> This is an ABBA deadlock involving two locks:
> - Lock A: pagecache_folio lock
> - Lock B: hugetlb_fault_mutex_table lock
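
[Illustrative sketch, not part of the patch.] The interleaving quoted above
can be replayed in a small userspace model. Everything here is hypothetical:
`toy_lock`, `toy_trylock`, `toy_unlock` and `replay_abba` are made-up names,
and the "locks" are plain owner fields rather than kernel mutexes or folio
locks. It only shows why the final state is a cycle: each process holds the
lock the other one is waiting for.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: records which process (1 or 2) owns it; 0 means free. */
struct toy_lock {
	int owner;
};

/* Try to take the lock for process `pid`; false means "would block". */
static bool toy_trylock(struct toy_lock *l, int pid)
{
	if (l->owner != 0)
		return false;
	l->owner = pid;
	return true;
}

/* Release the lock; only the current owner may do so. */
static void toy_unlock(struct toy_lock *l, int pid)
{
	assert(l->owner == pid);
	l->owner = 0;
}

/*
 * Replay the quoted interleaving step by step. Returns true if it ends
 * with process 1 holding A and waiting for B while process 2 holds B
 * and waits for A, i.e. an ABBA cycle.
 */
static bool replay_abba(void)
{
	struct toy_lock A = { 0 };	/* stands in for the pagecache folio lock */
	struct toy_lock B = { 0 };	/* stands in for the fault mutex */
	bool ok, p1_blocked, p2_blocked;

	/* Process 1: take B, take A, then drop B while still holding A. */
	ok = toy_trylock(&B, 1);
	ok = ok && toy_trylock(&A, 1);
	if (!ok)
		return false;
	toy_unlock(&B, 1);

	/* Process 2: takes the now-free B, then needs A (held by process 1). */
	ok = toy_trylock(&B, 2);
	p2_blocked = !toy_trylock(&A, 2);

	/* Process 1: tries to retake B (held by process 2). */
	p1_blocked = !toy_trylock(&B, 1);

	return ok && p1_blocked && p2_blocked &&
	       A.owner == 1 && B.owner == 2;
}
```

In this model `replay_abba()` returns true. Under the same assumptions, the
cycle disappears if process 1 releases A before retaking B, because process 2
can then take A and run to completion.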
Nostalgia. A decade or three ago many of us spent much of our lives
staring at ABBA deadlocks. Then came lockdep and after a few more
years, it all stopped. I've long hoped that lockdep would gain a
solution to custom locks such as folio_wait_bit_common(), but not yet.

Byungchul, please take a look. Would DEPT
(https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com)
have warned us about this?
>
> ...
>
> The deadlock occurs between two processes as follows:
>
> ...
>
> Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
> Cc: <stable@...r.kernel.org>
It's been there for three years so I assume we aren't in a hurry.

The fix looks a bit nasty, sorry. Perhaps designed for a minimal patch
footprint? That's good for a backportable fixup, but a more broadly
architected solution may be needed going forward.

I'll queue it for 6.16-rc1 with a cc:stable, so this should be
presented to the -stable trees 3-4 weeks from now.