linux-kernel - Re: [RFC] mm/migrate: make sure folio_unlock() before folio_wait

Message-ID: <aOTG7VTk4s9WfrMN@e129823.arm.com>
Date: Tue, 7 Oct 2025 08:53:17 +0100
From: Yeoreum Yun <yeoreum.yun@....com>
To: David Hildenbrand <david@...hat.com>
Cc: Yunseong Kim <ysk@...lloc.com>, Byungchul Park <byungchul@...com>,
	Hillf Danton <hdanton@...a.com>, akpm@...ux-foundation.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	kernel_team@...ynix.com
Subject: Re: [RFC] mm/migrate: make sure folio_unlock() before
 folio_wait_writeback()

Hi David,

> On 07.10.25 08:32, Yunseong Kim wrote:
> > Hi Hillf,
> >
> > Here are the syzlang and kernel log, and you can also find the gist snippet
> > in the body of the first RFC mail:
> >
> >   https://gist.github.com/kzall0c/a6091bb2fd536865ca9aabfd017a1fc5
> >
> > I am reviewing this issue again on the v6.17, The issue is always reproducible,
> > usually occurring within about 10k attempts with the 8 procs.
>
> I can see a DEPT splat and I wonder what happens if DEPT is disabled.
>
> Will the machine actually deadlock or is this just DEPT complaining (and
> probably getting something wrong)?
>

As Pedro mentioned [0], I believe this DEPT splat is a false positive.
The folio targeted by __find_get_block_slow() belongs to bd_mapping,
which is not the same folio whose writeback flag gets cleared
in ext4_end_io_end().

Since DEPT currently does not distinguish regular-file data folios from
the corresponding block-device folios, such false positives are a known
issue, and we plan to fix them.

Also, looking at the log Yunseong shared (hung.log), I can see that
the migration is stuck waiting for a buffer_head lock:
...
[ 3123.713542][   T89] INFO: task syz.4.2628:42733 blocked for more than 143 seconds.
[ 3123.713550][   T89]       Not tainted 6.15.11-00046-g2c223fa7bd9a-dirty #13
[ 3123.713557][   T89] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3123.713562][   T89] task:syz.4.2628      state:D stack:0     pid:42733 tgid:42732 ppid:41804  task_flags:0x400040 flags:0x00000009
[ 3123.713577][   T89] Call trace:
[ 3123.713582][   T89]  __switch_to+0x19c/0x2c0 (T)
[ 3123.713598][   T89]  __schedule+0x514/0x1208
[ 3123.713614][   T89]  schedule+0x40/0x164
[ 3123.713629][   T89]  io_schedule+0x3c/0x5c
[ 3123.713644][   T89]  bit_wait_io+0x14/0x70
[ 3123.713662][   T89]  __wait_on_bit_lock+0xa0/0x120
[ 3123.713678][   T89]  out_of_line_wait_on_bit_lock+0x8c/0xc0
[ 3123.713695][   T89]  __lock_buffer+0x74/0xb8
[ 3123.713720][   T89]  __buffer_migrate_folio+0x190/0x504
[ 3123.713747][   T89]  buffer_migrate_folio_norefs+0x30/0x3c
[ 3123.713764][   T89]  move_to_new_folio+0xe4/0x528
[ 3123.713779][   T89]  migrate_pages_batch+0xee0/0x1788
[ 3123.713795][   T89]  migrate_pages+0x15c4/0x1840
[ 3123.713810][   T89]  compact_zone+0x9c8/0x1d20
[ 3123.713822][   T89]  compact_node+0xd4/0x27c
[ 3123.713832][   T89]  sysctl_compaction_handler+0x104/0x194
[ 3123.713843][   T89]  proc_sys_call_handler+0x25c/0x3f8
[ 3123.713865][   T89]  proc_sys_write+0x20/0x2c
[ 3123.713878][   T89]  do_iter_readv_writev+0x350/0x448
[ 3123.713897][   T89]  vfs_writev+0x1ac/0x44c
[ 3123.713913][   T89]  do_pwritev+0x100/0x15c
[ 3123.713929][   T89]  __arm64_sys_pwritev2+0x6c/0xcc
[ 3123.713945][   T89]  invoke_syscall.constprop.0+0x64/0x18c
[ 3123.713961][   T89]  el0_svc_common.constprop.0+0x80/0x198
[ 3123.713978][   T89]  do_el0_svc+0x28/0x3c
[ 3123.713993][   T89]  el0_svc+0x50/0x220
[ 3123.714004][   T89]  el0t_64_sync_handler+0x10c/0x140
[ 3123.714017][   T89]  el0t_64_sync+0x1b8/0x1bc
...

which is different from the description, "stuck on writeback".
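
For anyone triaging a similar log, here is a minimal sketch of how the
blocking primitive could be read out of hung-task reports automatically.
It is only an illustration: the helper name classify_hung_tasks() and the
WAIT_KINDS table are hypothetical, and the only assumption about the log
is the "symbol+0xoff/0xsize" frame format shown in the trace above.

```python
import re

# Header line printed by the hung-task detector.
BLOCKED_RE = re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds")
# Stack frame as it appears in the log, e.g. "]  __lock_buffer+0x74/0xb8".
FRAME_RE = re.compile(r"\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Hypothetical table: which frame implies which kind of wait.
WAIT_KINDS = {
    "__lock_buffer": "buffer_head lock",
    "folio_wait_writeback": "folio writeback",
}


def classify_hung_tasks(log_text):
    """Return one dict per hung-task report with the first known wait frame."""
    reports = []
    current = None
    for line in log_text.splitlines():
        blocked = BLOCKED_RE.search(line)
        if blocked:
            current = {
                "task": blocked.group(1),
                "seconds": int(blocked.group(2)),
                "wait": "unknown",
            }
            reports.append(current)
            continue
        if current is None or current["wait"] != "unknown":
            continue
        frame = FRAME_RE.search(line)
        if frame and frame.group(1) in WAIT_KINDS:
            current["wait"] = WAIT_KINDS[frame.group(1)]
    return reports


# A few lines copied from the trace above.
SAMPLE = """\
[ 3123.713542][   T89] INFO: task syz.4.2628:42733 blocked for more than 143 seconds.
[ 3123.713644][   T89]  bit_wait_io+0x14/0x70
[ 3123.713695][   T89]  __lock_buffer+0x74/0xb8
[ 3123.713720][   T89]  __buffer_migrate_folio+0x190/0x504
"""

if __name__ == "__main__":
    for report in classify_hung_tasks(SAMPLE):
        print(report["task"], "->", report["wait"])
```

On the excerpt above, this reports the task as waiting on the buffer_head
lock rather than on folio writeback.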

Unfortunately, I couldn't analyse it further with the log he shared,
since it was truncated.

@Yunseong, could you reproduce this without DEPT and share the
full log for further analysis?

Thanks.

[0] https://lore.kernel.org/all/dglxbwe2i5ubofefdxwo5jvyhdfjov37z5jzc5guedhe4dl6ia@pmkjkec3isb4/

--
Sincerely,
Yeoreum Yun