Message-ID: <aOTG7VTk4s9WfrMN@e129823.arm.com>
Date: Tue, 7 Oct 2025 08:53:17 +0100
From: Yeoreum Yun <yeoreum.yun@....com>
To: David Hildenbrand <david@...hat.com>
Cc: Yunseong Kim <ysk@...lloc.com>, Byungchul Park <byungchul@...com>,
Hillf Danton <hdanton@...a.com>, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel_team@...ynix.com
Subject: Re: [RFC] mm/migrate: make sure folio_unlock() before
folio_wait_writeback()
Hi David,
> On 07.10.25 08:32, Yunseong Kim wrote:
> > Hi Hillf,
> >
> > Here are the syzlang and kernel log, and you can also find the gist snippet
> > in the body of the first RFC mail:
> >
> > https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
> >
> > I am reviewing this issue again on the v6.17, The issue is always reproducible,
> > usually occurring within about 10k attempts with the 8 procs.
>
> I can see a DEPT splat and I wonder what happens if DEPT is disabled.
>
> Will the machine actually deadlock or is this just DEPT complaining (and
> probably getting something wrong)?
>
As Pedro mentioned [0], I believe this DEPT splat is a false positive.
The folio targeted by __find_get_block_slow() belongs to bd_mapping,
which is not the same folio whose writeback flag gets cleared
in ext4_end_io_end().
Since DEPT currently does not distinguish regular-file data folios from
the corresponding block-device folios,
such false positives are a known issue, and we plan to fix it.
Also, looking at the log Yunseong shared (hung.log),
I can see that the migration is stuck waiting for a buffer_head lock:
...
[ 3123.713542][ T89] INFO: task syz.4.2628:42733 blocked for more than 143 seconds.
[ 3123.713550][ T89] Not tainted 6.15.11-00046-g2c223fa7bd9a-dirty #13
[ 3123.713557][ T89] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3123.713562][ T89] task:syz.4.2628 state:D stack:0 pid:42733 tgid:42732 ppid:41804 task_flags:0x400040 flags:0x00000009
[ 3123.713577][ T89] Call trace:
[ 3123.713582][ T89] __switch_to+0x19c/0x2c0 (T)
[ 3123.713598][ T89] __schedule+0x514/0x1208
[ 3123.713614][ T89] schedule+0x40/0x164
[ 3123.713629][ T89] io_schedule+0x3c/0x5c
[ 3123.713644][ T89] bit_wait_io+0x14/0x70
[ 3123.713662][ T89] __wait_on_bit_lock+0xa0/0x120
[ 3123.713678][ T89] out_of_line_wait_on_bit_lock+0x8c/0xc0
[ 3123.713695][ T89] __lock_buffer+0x74/0xb8
[ 3123.713720][ T89] __buffer_migrate_folio+0x190/0x504
[ 3123.713747][ T89] buffer_migrate_folio_norefs+0x30/0x3c
[ 3123.713764][ T89] move_to_new_folio+0xe4/0x528
[ 3123.713779][ T89] migrate_pages_batch+0xee0/0x1788
[ 3123.713795][ T89] migrate_pages+0x15c4/0x1840
[ 3123.713810][ T89] compact_zone+0x9c8/0x1d20
[ 3123.713822][ T89] compact_node+0xd4/0x27c
[ 3123.713832][ T89] sysctl_compaction_handler+0x104/0x194
[ 3123.713843][ T89] proc_sys_call_handler+0x25c/0x3f8
[ 3123.713865][ T89] proc_sys_write+0x20/0x2c
[ 3123.713878][ T89] do_iter_readv_writev+0x350/0x448
[ 3123.713897][ T89] vfs_writev+0x1ac/0x44c
[ 3123.713913][ T89] do_pwritev+0x100/0x15c
[ 3123.713929][ T89] __arm64_sys_pwritev2+0x6c/0xcc
[ 3123.713945][ T89] invoke_syscall.constprop.0+0x64/0x18c
[ 3123.713961][ T89] el0_svc_common.constprop.0+0x80/0x198
[ 3123.713978][ T89] do_el0_svc+0x28/0x3c
[ 3123.713993][ T89] el0_svc+0x50/0x220
[ 3123.714004][ T89] el0t_64_sync_handler+0x10c/0x140
[ 3123.714017][ T89] el0t_64_sync+0x1b8/0x1bc
...
which is different from the "stuck on writeback" description.
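The check above (which wait primitive the blocked task is sitting in) can
be scripted when comparing several hung-task reports. The following is only
a minimal sketch: the helper names frames() and wait_reason(), the WAITERS
table, and the SAMPLE excerpt are hypothetical and not part of any kernel
tool, and the regex assumes the "[ time][ Tnn] func+0xoff/0xsize" line
format seen in this log.

```python
import re

# One stack frame after the log prefix, e.g. "] __lock_buffer+0x74/0xb8".
FRAME_RE = re.compile(r"\]\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Hypothetical table: frame name -> what the task is waiting on.
WAITERS = {
    "__lock_buffer": "buffer_head lock",
    "folio_wait_writeback": "writeback",
}

def frames(log_text):
    """Return the frame function names in log order (innermost first)."""
    names = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            names.append(m.group(1))
    return names

def wait_reason(names):
    """Return the label of the innermost known waiter, or None."""
    for name in names:
        if name in WAITERS:
            return WAITERS[name]
    return None

# Three frames copied from the trace above.
SAMPLE = """\
[ 3123.713644][ T89] bit_wait_io+0x14/0x70
[ 3123.713695][ T89] __lock_buffer+0x74/0xb8
[ 3123.713720][ T89] __buffer_migrate_folio+0x190/0x504
"""

if __name__ == "__main__":
    print(wait_reason(frames(SAMPLE)))
```

Run against the full trace, this reports the buffer_head lock rather than
writeback, matching the reading above.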
Unfortunately, I couldn't analyse it further with the log he shared
since it was truncated.
@Yunseong, could you reproduce this without DEPT and share
the full log for further analysis?
Thanks.
[0] https://lore.kernel.org/all/dglxbwe2i5ubofefdxwo5jvyhdfjov37z5jzc5guedhe4dl6ia@pmkjkec3isb4/
--
Sincerely,
Yeoreum Yun