Message-ID: <20250516060309.GA51921@system.software.com>
Date: Fri, 16 May 2025 15:03:09 +0900
From: Byungchul Park <byungchul@...com>
To: Gavin Guo <gavinguo@...lia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
osalvador@...e.de, akpm@...ux-foundation.org,
mike.kravetz@...cle.com, kernel-dev@...lia.com,
stable@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Florent Revest <revest@...gle.com>, Gavin Shan <gshan@...hat.com>,
kernel_team@...ynix.com
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and
hugetlb_fault_mutex_table
On Wed, May 14, 2025 at 04:10:12PM +0800, Gavin Guo wrote:
> Hi Byungchul,
>
> On 5/14/25 14:47, Byungchul Park wrote:
> > On Tue, May 13, 2025 at 05:34:48PM +0800, Gavin Guo wrote:
> > > The patch fixes a deadlock which can be triggered by an internal
> > > syzkaller [1] reproducer and captured by bpftrace script [2] and its log
> >
> > Hi,
> >
> > I'm trying to reproduce using the test program [1]. But not yet
> > produced. I see a lot of segfaults while running [1]. I guess
> > something goes wrong. Is there any prerequisite condition to reproduce
> > it? Lemme know if any. Or can you try DEPT15 with your config and
> > environment by the following steps:
> >
> > 1. Apply the patchset on v6.15-rc6.
> > https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com
> > 2. Turn on CONFIG_DEPT.
> > 3. Run test program reproducing the deadlock.
> > 4. Check dmesg to see if dept reported the dependency.
> >
> > Byungchul
>
> I have enabled the patchset and successfully reproduced the bug. It
> seems that there is no warning or error log related to the lock. Did I
> miss anything? This is the console log:
> https://drive.google.com/file/d/1dxWNiO71qE-H-e5NMPqj7W-aW5CkGSSF/view?usp=sharing

My bad. I think I found out why dept didn't report it. Some of my
efforts to suppress false positives also suppressed the real one. You
might see the report with the following patch applied on top, though it
might come with a lot of false positives, which might be annoying.

Do you mind if I ask you to run the test with the following patch
applied? It'd be appreciated if you do and share the result with me.

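To picture how a suppression like this could hide a real circle, here is a toy sketch. It is not dept code: the tracker, the class names and the rule "skip a circle if any class on it was already reported" are all assumptions made up for illustration. The thread only shows that print_circle() sets tc->reported on each class of a printed circle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT the dept implementation. */
enum { MAX_CLASSES = 8 };
enum { TOY_MUTEX, TOY_FOLIO, TOY_OTHER };	/* made-up class ids */

struct toy_tracker {
	bool edge[MAX_CLASSES][MAX_CLASSES];	/* edge[a][b]: a precedes b */
	bool reported[MAX_CLASSES];		/* class was on a printed circle */
	bool suppress;				/* assumed suppression switch */
	int nr_reports;
};

/* BFS from src to dst; parent[] records the path that was found. */
static bool toy_find_path(const struct toy_tracker *t, int src, int dst,
			  int parent[MAX_CLASSES])
{
	int queue[MAX_CLASSES];
	bool seen[MAX_CLASSES] = { false };
	int head = 0, tail = 0;

	for (int i = 0; i < MAX_CLASSES; i++)
		parent[i] = -1;
	queue[tail++] = src;
	seen[src] = true;

	while (head < tail) {
		int cur = queue[head++];

		if (cur == dst)
			return true;
		for (int next = 0; next < MAX_CLASSES; next++) {
			if (t->edge[cur][next] && !seen[next]) {
				seen[next] = true;
				parent[next] = cur;
				queue[tail++] = next;
			}
		}
	}
	return false;
}

/* Record "from precedes to"; return true if a circle was reported. */
static bool toy_add_dep(struct toy_tracker *t, int from, int to)
{
	int parent[MAX_CLASSES];
	bool already = false;
	bool circle;

	assert(from >= 0 && from < MAX_CLASSES);
	assert(to >= 0 && to < MAX_CLASSES);

	/* A circle appears if 'from' is already reachable from 'to'. */
	circle = toy_find_path(t, to, from, parent);
	t->edge[from][to] = true;
	if (!circle)
		return false;

	/* Walk the circle from 'from' back to 'to'. */
	for (int c = from; c != -1; c = parent[c])
		already |= t->reported[c];
	if (t->suppress && already)
		return false;	/* assumed rule: skip known classes */

	t->nr_reports++;
	if (t->suppress)
		for (int c = from; c != -1; c = parent[c])
			t->reported[c] = true;
	return true;
}

/* An unrelated circle is seen first, then the mutex <-> folio one. */
static int toy_run_scenario(bool suppress)
{
	struct toy_tracker t = { .suppress = suppress };

	toy_add_dep(&t, TOY_MUTEX, TOY_OTHER);
	toy_add_dep(&t, TOY_OTHER, TOY_MUTEX);
	toy_add_dep(&t, TOY_MUTEX, TOY_FOLIO);
	toy_add_dep(&t, TOY_FOLIO, TOY_MUTEX);
	return t.nr_reports;
}
```

In this toy, toy_run_scenario(true) counts one report because the second circle shares TOY_MUTEX with the first, while toy_run_scenario(false) counts both. That is also why turning the suppression off can be expected to bring more noise.
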
Byungchul
---
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f31cd68f2935..fd7559e663c5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1138,6 +1138,7 @@ static inline bool trylock_page(struct page *page)
 static inline void folio_lock(struct folio *folio)
 {
 	might_sleep();
+	dept_page_wait_on_bit(&folio->page, PG_locked);
 	if (!folio_trylock(folio))
 		__folio_lock(folio);
 }
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index b2fa96d984bc..4e96a6a72d02 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -931,7 +931,6 @@ static void print_circle(struct dept_class *c)
 	dept_outworld_exit();
 	do {
-		tc->reported = true;
 		tc = fc;
 		fc = fc->bfs_parent;
 	} while (tc != c);
diff --git a/kernel/dependency/dept_unit_test.c b/kernel/dependency/dept_unit_test.c
index 88e846b9f876..496149f31fb3 100644
--- a/kernel/dependency/dept_unit_test.c
+++ b/kernel/dependency/dept_unit_test.c
@@ -125,6 +125,8 @@ static int __init dept_ut_init(void)
 {
 	int i;
+	return 0;
+
 	lockdep_off();
 	dept_ut_results.ecxt_stack_valid_cnt = 0;
--