Message-ID: <20250516074336.GA42829@system.software.com>
Date: Fri, 16 May 2025 16:43:36 +0900
From: Byungchul Park <byungchul@...com>
To: Gavin Guo <gavinguo@...lia.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
osalvador@...e.de, akpm@...ux-foundation.org,
mike.kravetz@...cle.com, kernel-dev@...lia.com,
stable@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
Florent Revest <revest@...gle.com>, Gavin Shan <gshan@...hat.com>,
kernel_team@...ynix.com
Subject: Re: [PATCH] mm/hugetlb: fix a deadlock with pagecache_folio and
 hugetlb_fault_mutex_table

On Fri, May 16, 2025 at 03:32:35PM +0800, Gavin Guo wrote:
> On 5/16/25 14:03, Byungchul Park wrote:
> > On Wed, May 14, 2025 at 04:10:12PM +0800, Gavin Guo wrote:
> > > Hi Byungchul,
> > >
> > > On 5/14/25 14:47, Byungchul Park wrote:
> > > > On Tue, May 13, 2025 at 05:34:48PM +0800, Gavin Guo wrote:
> > > > > The patch fixes a deadlock which can be triggered by an internal
> > > > > syzkaller [1] reproducer and captured by bpftrace script [2] and its log
> > > >
> > > > Hi,
> > > >
> > > > > I'm trying to reproduce it using the test program [1], but I haven't
> > > > > managed to yet. I see a lot of segfaults while running [1], so I guess
> > > > > something is going wrong. Is there any prerequisite for reproducing
> > > > > it? Let me know if there is. Or can you try DEPT15 with your config and
> > > > > environment by the following steps:
> > > >
> > > > 1. Apply the patchset on v6.15-rc6.
> > > > https://lkml.kernel.org/r/20250513100730.12664-1-byungchul@sk.com
> > > > 2. Turn on CONFIG_DEPT.
> > > > 3. Run test program reproducing the deadlock.
> > > > 4. Check dmesg to see if dept reported the dependency.
> > > >
> > > > Byungchul
> > >
> > > I have enabled the patchset and successfully reproduced the bug. It
> > > seems that there is no warning or error log related to the lock. Did I
> > > miss anything? This is the console log:
> > > https://drive.google.com/file/d/1dxWNiO71qE-H-e5NMPqj7W-aW5CkGSSF/view?usp=sharing
> >
> > My bad. I think I found out why dept didn't report it. You should
> > see the report with the following patch applied on top, though it may
> > come with a lot of false positives, which might be annoying.
> >
> > Some of my efforts to suppress false positives suppressed the real one.
> >
> > Do you mind if I ask you to run the test with the following patch
> > applied? I'd appreciate it if you could do so and share the result with me.
> >
> > Byungchul
> >
> > ---
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f31cd68f2935..fd7559e663c5 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -1138,6 +1138,7 @@ static inline bool trylock_page(struct page *page)
> >  static inline void folio_lock(struct folio *folio)
> >  {
> >  	might_sleep();
> > +	dept_page_wait_on_bit(&folio->page, PG_locked);
> >  	if (!folio_trylock(folio))
> >  		__folio_lock(folio);
> >  }
> > diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
> > index b2fa96d984bc..4e96a6a72d02 100644
> > --- a/kernel/dependency/dept.c
> > +++ b/kernel/dependency/dept.c
> > @@ -931,7 +931,6 @@ static void print_circle(struct dept_class *c)
> >  	dept_outworld_exit();
> >  	do {
> > -		tc->reported = true;
> >  		tc = fc;
> >  		fc = fc->bfs_parent;
> >  	} while (tc != c);
> > diff --git a/kernel/dependency/dept_unit_test.c b/kernel/dependency/dept_unit_test.c
> > index 88e846b9f876..496149f31fb3 100644
> > --- a/kernel/dependency/dept_unit_test.c
> > +++ b/kernel/dependency/dept_unit_test.c
> > @@ -125,6 +125,8 @@ static int __init dept_ut_init(void)
> >  {
> >  	int i;
> > +	return 0;
> > +
> >  	lockdep_off();
> >  	dept_ut_results.ecxt_stack_valid_cnt = 0;
> > --
>
> Please see the test result:
> https://drive.google.com/file/d/1B20Gu3wLFbAeaXXb7aSQP5T6aeN9Mext/view?usp=sharing
>
> It seems that after the first round, the deadlock is captured:

Thank you for testing again!

Yes, dept works as I expected. I shouldn't have suppressed dept reports
so aggressively. Instead, I (or we, if anyone joins) need to deal with
the existing false positives one by one, using dept annotations.

Thanks again for confirming it.

	Byungchul

> ubuntu@...alhost:~$ ./repro_20250402_0225_154f8fb0580000
> executing program
> [ 80.425842][ T3416] ===================================================
> [ 80.426707][ T3416] DEPT: Circular dependency has been detected.
> [ 80.427497][ T3416] 6.15.0-rc6+ #31 Not tainted
> [ 80.428084][ T3416] ---------------------------------------------------
> [ 80.428964][ T3416] summary
> [ 80.429330][ T3416] ---------------------------------------------------
> [ 80.430078][ T3416] *** DEADLOCK ***
> [ 80.430078][ T3416]
> [ 80.430736][ T3416] context A
> [ 80.431076][ T3416] [S] (unknown)(pg_locked_map:0)
> [ 80.431637][ T3416] [W] lock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.432312][ T3416] [E] dept_page_clear_bit(pg_locked_map:0)
> [ 80.432977][ T3416]
> [ 80.433246][ T3416] context B
> [ 80.433595][ T3416] [S] lock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.434245][ T3416] [W] dept_page_wait_on_bit(pg_locked_map:0)
> [ 80.434880][ T3416] [E] unlock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.435592][ T3416]
> [ 80.435852][ T3416] [S]: start of the event context
> [ 80.436369][ T3416] [W]: the wait blocked
> [ 80.436789][ T3416] [E]: the event not reachable
> [ 80.437275][ T3416] ---------------------------------------------------
> [ 80.437950][ T3416] context A's detail
> [ 80.438367][ T3416] ---------------------------------------------------
> [ 80.439006][ T3416] context A
> [ 80.439337][ T3416] [S] (unknown)(pg_locked_map:0)
> [ 80.439883][ T3416] [W] lock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.440489][ T3416] [E] dept_page_clear_bit(pg_locked_map:0)
> [ 80.441075][ T3416]
> [ 80.441318][ T3416] [S] (unknown)(pg_locked_map:0):
> [ 80.441816][ T3416] (N/A)
> [ 80.442077][ T3416]
> [ 80.442309][ T3416] [W] lock(&hugetlb_fault_mutex_table[i]:0):
> [ 80.442872][ T3416] [<ffffffff82144644>] hugetlb_wp+0xfa4/0x3490
> [ 80.443502][ T3416] stacktrace:
> [ 80.443810][ T3416] hugetlb_wp+0xfa4/0x3490
> [ 80.444267][ T3416] hugetlb_fault+0x1505/0x2c70
> [ 80.444776][ T3416] handle_mm_fault+0x1845/0x1ab0
> [ 80.445275][ T3416] do_user_addr_fault+0x637/0x1450
> [ 80.445779][ T3416] exc_page_fault+0x67/0x110
> [ 80.446239][ T3416] asm_exc_page_fault+0x26/0x30
> [ 80.446722][ T3416] __put_user_4+0xd/0x20
> [ 80.447157][ T3416] copy_process+0x1f64/0x3d80
> [ 80.447621][ T3416] kernel_clone+0x216/0x940
> [ 80.448068][ T3416] __x64_sys_clone+0x18d/0x1f0
> [ 80.448548][ T3416] do_syscall_64+0x6f/0x120
> [ 80.448999][ T3416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 80.449556][ T3416]
> [ 80.449765][ T3416] [E] dept_page_clear_bit(pg_locked_map:0):
> [ 80.450272][ T3416] [<ffffffff8214263b>] hugetlb_fault+0x1ccb/0x2c70
> [ 80.450861][ T3416] stacktrace:
> [ 80.451148][ T3416] hugetlb_fault+0x1ccb/0x2c70
> [ 80.451611][ T3416] handle_mm_fault+0x1845/0x1ab0
> [ 80.452080][ T3416] do_user_addr_fault+0x637/0x1450
> [ 80.452566][ T3416] exc_page_fault+0x67/0x110
> [ 80.453014][ T3416] asm_exc_page_fault+0x26/0x30
> [ 80.453497][ T3416] __put_user_4+0xd/0x20
> [ 80.453923][ T3416] copy_process+0x1f64/0x3d80
> [ 80.454379][ T3416] kernel_clone+0x216/0x940
> [ 80.454817][ T3416] __x64_sys_clone+0x18d/0x1f0
> [ 80.455277][ T3416] do_syscall_64+0x6f/0x120
> [ 80.455722][ T3416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 80.456253][ T3416] ---------------------------------------------------
> [ 80.456842][ T3416] context B's detail
> [ 80.457198][ T3416] ---------------------------------------------------
> [ 80.457842][ T3416] context B
> [ 80.458122][ T3416] [S] lock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.458661][ T3416] [W] dept_page_wait_on_bit(pg_locked_map:0)
> [ 80.459187][ T3416] [E] unlock(&hugetlb_fault_mutex_table[i]:0)
> [ 80.459763][ T3416]
> [ 80.459988][ T3416] [S] lock(&hugetlb_fault_mutex_table[i]:0):
> [ 80.460509][ T3416] [<ffffffff82140d36>] hugetlb_fault+0x3c6/0x2c70
> [ 80.461074][ T3416] stacktrace:
> [ 80.461374][ T3416] hugetlb_fault+0x3c6/0x2c70
> [ 80.461812][ T3416] handle_mm_fault+0x1845/0x1ab0
> [ 80.462281][ T3416] do_user_addr_fault+0x637/0x1450
> [ 80.462775][ T3416] exc_page_fault+0x67/0x110
> [ 80.463220][ T3416] asm_exc_page_fault+0x26/0x30
> [ 80.463694][ T3416] __put_user_4+0xd/0x20
> [ 80.464129][ T3416] copy_process+0x1f64/0x3d80
> [ 80.464577][ T3416] kernel_clone+0x216/0x940
> [ 80.464994][ T3416] __x64_sys_clone+0x18d/0x1f0
> [ 80.465466][ T3416] do_syscall_64+0x6f/0x120
> [ 80.465909][ T3416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 80.466457][ T3416]
> [ 80.466660][ T3416] [W] dept_page_wait_on_bit(pg_locked_map:0):
> [ 80.467177][ T3416] [<ffffffff82141187>] hugetlb_fault+0x817/0x2c70
> [ 80.467740][ T3416] stacktrace:
> [ 80.468032][ T3416] hugetlb_fault+0x817/0x2c70
> [ 80.468479][ T3416] handle_mm_fault+0x1845/0x1ab0
> [ 80.468947][ T3416] do_user_addr_fault+0x637/0x1450
> [ 80.469428][ T3416] exc_page_fault+0x67/0x110
> [ 80.469865][ T3416] asm_exc_page_fault+0x26/0x30
> [ 80.470332][ T3416] __put_user_4+0xd/0x20
> [ 80.470742][ T3416] copy_process+0x1f64/0x3d80
> [ 80.471186][ T3416] kernel_clone+0x216/0x940
> [ 80.471616][ T3416] __x64_sys_clone+0x18d/0x1f0
> [ 80.472060][ T3416] do_syscall_64+0x6f/0x120
> [ 80.472492][ T3416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 80.473040][ T3416]
> [ 80.473271][ T3416] [E] unlock(&hugetlb_fault_mutex_table[i]:0):
> [ 80.473863][ T3416] (N/A)
> [ 80.474124][ T3416] ---------------------------------------------------
> [ 80.474738][ T3416] information that might be helpful
> [ 80.475210][ T3416] ---------------------------------------------------
> [ 80.475820][ T3416] CPU: 1 UID: 1000 PID: 3416 Comm: repro_20250402_ Not
> tainted 6.15.0-rc6+ #31 NONE
> [ 80.475831][ T3416] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [ 80.475837][ T3416] Call Trace:
> [ 80.475841][ T3416] <TASK>
> [ 80.475845][ T3416] dump_stack_lvl+0x1ad/0x280
> [ 80.475858][ T3416] ? __pfx_dump_stack_lvl+0x10/0x10
> [ 80.475867][ T3416] ? __pfx__printk+0x10/0x10
> [ 80.475883][ T3416] cb_check_dl+0x24a8/0x2530
> [ 80.475897][ T3416] ? bfs_extend_dep+0x271/0x290
> [ 80.475909][ T3416] bfs+0x464/0x5e0
> [ 80.475921][ T3416] ? __pfx_bfs+0x10/0x10
> [ 80.475931][ T3416] ? add_dep+0x387/0x710
> [ 80.475943][ T3416] add_dep+0x3d0/0x710
> [ 80.475953][ T3416] ? __pfx_from_pool+0x10/0x10
> [ 80.475963][ T3416] ? __pfx_bfs_init_check_dl+0x10/0x10
> [ 80.475972][ T3416] ? __pfx_bfs_extend_dep+0x10/0x10
> [ 80.475981][ T3416] ? __pfx_bfs_dequeue_dep+0x10/0x10
> [ 80.475990][ T3416] ? __pfx_cb_check_dl+0x10/0x10
> [ 80.475999][ T3416] ? __pfx_add_dep+0x10/0x10
> [ 80.476011][ T3416] ? put_ecxt+0xda/0x4b0
> [ 80.476024][ T3416] __dept_event+0xee8/0x1590
> [ 80.476038][ T3416] dept_event+0x166/0x240
> [ 80.476047][ T3416] ? hugetlb_fault+0x1ccb/0x2c70
> [ 80.476057][ T3416] folio_unlock+0xb8/0x190
> [ 80.476071][ T3416] hugetlb_fault+0x1ccb/0x2c70
> [ 80.476085][ T3416] ? __pfx_hugetlb_fault+0x10/0x10
> [ 80.476100][ T3416] ? mt_find+0x15a/0x5f0
> [ 80.476110][ T3416] handle_mm_fault+0x1845/0x1ab0
> [ 80.476125][ T3416] ? handle_mm_fault+0xdb/0x1ab0
> [ 80.476142][ T3416] ? __pfx_handle_mm_fault+0x10/0x10
> [ 80.476156][ T3416] ? find_vma+0xec/0x160
> [ 80.476164][ T3416] ? __pfx_find_vma+0x10/0x10
> [ 80.476172][ T3416] ? dept_on+0x1c/0x30
> [ 80.476179][ T3416] ? dept_exit+0x1c5/0x2c0
> [ 80.476186][ T3416] ? lockdep_hardirqs_on_prepare+0x21/0x280
> [ 80.476197][ T3416] ? lock_mm_and_find_vma+0xa1/0x300
> [ 80.476211][ T3416] do_user_addr_fault+0x637/0x1450
> [ 80.476219][ T3416] ? mntput_no_expire+0xc0/0x870
> [ 80.476235][ T3416] ? __pfx_do_user_addr_fault+0x10/0x10
> [ 80.476246][ T3416] ? trace_irq_disable+0x60/0x180
> [ 80.476258][ T3416] exc_page_fault+0x67/0x110
> [ 80.476272][ T3416] asm_exc_page_fault+0x26/0x30
> [ 80.476280][ T3416] RIP: 0010:__put_user_4+0xd/0x20
> [ 80.476293][ T3416] Code: 66 89 01 31 c9 0f 1f 00 c3 cc cc cc cc 90 90 90
> 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 cb 48 c1 fb 3f 48 09 d9 0f 1f
> 00 <89> 01 31 c9 0
> [ 80.476312][ T3416] RSP: 0018:ffffc90004dffa38 EFLAGS: 00010206
> [ 80.476322][ T3416] RAX: 000000000000000c RBX: 0000000000000000 RCX:
> 0000200000000200
> [ 80.476329][ T3416] RDX: 0000000000000000 RSI: ffff888016abe300 RDI:
> ffff888017878c20
> [ 80.476335][ T3416] RBP: ffffc90004dffc10 R08: 0000000000000000 R09:
> 0000000000000000
> [ 80.476340][ T3416] R10: 0000000000000000 R11: ffffffff82034b65 R12:
> ffff888017c0a1e8
> [ 80.476346][ T3416] R13: ffff88800d6a8200 R14: 0000000000000000 R15:
> ffff888017c08a38
> [ 80.476354][ T3416] ? __might_fault+0xb5/0x130
> [ 80.476367][ T3416] copy_process+0x1f64/0x3d80
> [ 80.476375][ T3416] ? lockdep_hardirqs_on_prepare+0x21/0x280
> [ 80.476388][ T3416] ? copy_process+0x996/0x3d80
> [ 80.476399][ T3416] ? __pfx_copy_process+0x10/0x10
> [ 80.476406][ T3416] ? from_pool+0x1e1/0x750
> [ 80.476416][ T3416] ? handle_mm_fault+0x122e/0x1ab0
> [ 80.476432][ T3416] kernel_clone+0x216/0x940
> [ 80.476440][ T3416] ? __pfx_llist_del_first+0x10/0x10
> [ 80.476448][ T3416] ? check_new_class+0x28a/0xe90
> [ 80.476458][ T3416] ? __pfx_kernel_clone+0x10/0x10
> [ 80.476468][ T3416] ? from_pool+0x1e1/0x750
> [ 80.476478][ T3416] ? __pfx_from_pool+0x10/0x10
> [ 80.476487][ T3416] ? __pfx_from_pool+0x10/0x10
> [ 80.476502][ T3416] __x64_sys_clone+0x18d/0x1f0
> [ 80.476512][ T3416] ? __pfx___x64_sys_clone+0x10/0x10
> [ 80.476520][ T3416] ? llist_add_batch+0x111/0x1f0
> [ 80.476532][ T3416] ? dept_task+0x5/0x20
> [ 80.476539][ T3416] ? dept_on+0x1c/0x30
> [ 80.476545][ T3416] ? dept_exit+0x1c5/0x2c0
> [ 80.476553][ T3416] ? lockdep_hardirqs_on_prepare+0x21/0x280
> [ 80.476565][ T3416] do_syscall_64+0x6f/0x120
> [ 80.476573][ T3416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 80.476580][ T3416] RIP: 0033:0x41b26d
> [ 80.476588][ T3416] Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e
> fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f
> 05 <48> 3d 01 f0 8
> [ 80.476595][ T3416] RSP: 002b:00007ffa1ad2d198 EFLAGS: 00000206 ORIG_RAX:
> 0000000000000038
> [ 80.476604][ T3416] RAX: ffffffffffffffda RBX: 00007ffa1ad2dcdc RCX:
> 000000000041b26d
> [ 80.476610][ T3416] RDX: 0000200000000200 RSI: 0000000000000000 RDI:
> 0000000000001200
> [ 80.476616][ T3416] RBP: 00007ffa1ad2d1e0 R08: 0000000000000000 R09:
> 0000000000000000
> [ 80.476621][ T3416] R10: 0000000000000000 R11: 0000000000000206 R12:
> 00007ffa1ad2d6c0
> [ 80.476626][ T3416] R13: ffffffffffffffb8 R14: 0000000000000002 R15:
> 00007ffd95d76940
> [ 80.476638][ T3416] </TASK>
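
For readers following the report above: it reduces to a two-edge cycle
between two wait/event classes. Context A waits for
hugetlb_fault_mutex_table[i] while the folio's PG_locked event is still
pending, and context B waits on PG_locked while holding the mutex. The
sketch below is a minimal, stand-alone illustration of that kind of cycle
check in plain C. It is not dept's implementation; struct dep_graph,
dep_graph_init(), dep_add(), dep_reachable(), dep_closes_cycle() and the
two class constants are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical class ids standing in for the two classes in the report. */
enum { CLASS_PG_LOCKED, CLASS_FAULT_MUTEX, NR_CLASSES };

/* dep[a][b] is true if some context waited on b while a was still pending. */
struct dep_graph {
	bool dep[NR_CLASSES][NR_CLASSES];
};

void dep_graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record "pending -> waited": waited was blocked on while pending was held. */
void dep_add(struct dep_graph *g, int pending, int waited)
{
	assert(pending >= 0 && pending < NR_CLASSES);
	assert(waited >= 0 && waited < NR_CLASSES);
	g->dep[pending][waited] = true;
}

/* Breadth-first search: can 'to' be reached from 'from' via recorded edges? */
bool dep_reachable(const struct dep_graph *g, int from, int to)
{
	bool seen[NR_CLASSES] = { false };
	int queue[NR_CLASSES];	/* each class is enqueued at most once */
	int head = 0, tail = 0;

	queue[tail++] = from;
	seen[from] = true;

	while (head < tail) {
		int cur = queue[head++];

		if (cur == to)
			return true;

		for (int next = 0; next < NR_CLASSES; next++) {
			if (g->dep[cur][next] && !seen[next]) {
				seen[next] = true;
				queue[tail++] = next;
			}
		}
	}
	return false;
}

/*
 * A new edge pending -> waited closes a cycle if 'pending' is already
 * reachable from 'waited' through previously recorded edges.
 */
bool dep_closes_cycle(const struct dep_graph *g, int pending, int waited)
{
	return dep_reachable(g, waited, pending);
}
```

Recording context A's edge with dep_add(&g, CLASS_PG_LOCKED,
CLASS_FAULT_MUTEX) and then asking dep_closes_cycle(&g, CLASS_FAULT_MUTEX,
CLASS_PG_LOCKED) for context B returns true, which corresponds to the
circular dependency shown in the report.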