Message-ID: <3a5b428754b6e006025c462f37e610b5a5e361a5.camel@ibm.com>
Date: Tue, 20 Jan 2026 20:51:06 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "wangjinchao600@...il.com" <wangjinchao600@...il.com>
CC: "glaubitz@...sik.fu-berlin.de" <glaubitz@...sik.fu-berlin.de>,
        "frank.li@...o.com" <frank.li@...o.com>,
        "slava@...eyko.com"
	<slava@...eyko.com>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org"
	<linux-fsdevel@...r.kernel.org>,
        "syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com"
	<syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com>
Subject: RE: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit

On Tue, 2026-01-20 at 09:09 +0800, Jinchao Wang wrote:
> 

<skipped>

> > 
> > First of all, I've tried to check the syzbot report that you are mentioning
> > in the patch, and I was confused because it was a report for FAT. So I don't
> > see how I can reproduce the issue on my side.
> > 
> > Secondly, I need to see the real call trace of the issue. This discussion
> > doesn't make sense without the reproduction path and the call trace(s) of the
> > issue.
> > 
> > Thanks,
> > Slava.
> There are many crashes on the syzbot report page; please follow the specified time and version.
> 
> Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2  
> 
> For this version:
> > Time             | Kernel     | Commit       | Syzkaller |
> > 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |
> 
> The full call trace can be found in the crash log for "2025/12/20 17:03", at this URL:
> 
> Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000  

As far as I can see, this call trace is about flushing the inode's dirty pages
in the page cache:

[  504.401993][   T31] INFO: task kworker/u8:1:13 blocked for more than 143 seconds.
[  504.434587][   T31]       Not tainted syzkaller #0
[  504.441437][   T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  504.451145][   T31] task:kworker/u8:1    state:D stack:22792 pid:13    tgid:13    ppid:2      task_flags:0x4208060 flags:0x00080000
[  504.463591][   T31] Workqueue: writeback wb_workfn (flush-7:4)
[  504.471997][   T31] Call Trace:
[  504.475502][   T31]  <TASK>
[  504.479684][   T31]  __schedule+0x150e/0x5070
[  504.484307][   T31]  ? __pfx___schedule+0x10/0x10
[  504.491526][   T31]  ? __blk_flush_plug+0x3fc/0x4b0
[  504.496683][   T31]  ? schedule+0x91/0x360
[  504.501085][   T31]  schedule+0x165/0x360
[  504.505366][   T31]  io_schedule+0x80/0xd0
[  504.510102][   T31]  folio_wait_bit_common+0x6b0/0xb80
[  504.532721][   T31]  ? __pfx_folio_wait_bit_common+0x10/0x10
[  504.538760][   T31]  ? __pfx_wake_page_function+0x10/0x10
[  504.544344][   T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[  504.551446][   T31]  ? writeback_iter+0x853/0x1280
[  504.556492][   T31]  writeback_iter+0x8d8/0x1280
[  504.564484][   T31]  blkdev_writepages+0xb7/0x170
[  504.569517][   T31]  ? __pfx_blkdev_writepages+0x10/0x10
[  504.575043][   T31]  ? __pfx_blkdev_writepages+0x10/0x10
[  504.580705][   T31]  do_writepages+0x32e/0x550
[  504.585344][   T31]  ? reacquire_held_locks+0x121/0x1c0
[  504.591296][   T31]  ? writeback_sb_inodes+0x3bd/0x1870
[  504.596806][   T31]  __writeback_single_inode+0x133/0x1240
[  504.603290][   T31]  ? do_raw_spin_unlock+0x122/0x240
[  504.608620][   T31]  writeback_sb_inodes+0x93a/0x1870
[  504.613878][   T31]  ? __pfx_writeback_sb_inodes+0x10/0x10
[  504.637194][   T31]  ? __pfx_down_read_trylock+0x10/0x10
[  504.642838][   T31]  ? __pfx_move_expired_inodes+0x10/0x10
[  504.648717][   T31]  __writeback_inodes_wb+0x111/0x240
[  504.654048][   T31]  wb_writeback+0x43f/0xaa0
[  504.658709][   T31]  ? queue_io+0x281/0x450
[  504.663179][   T31]  ? __pfx_wb_writeback+0x10/0x10
[  504.668641][   T31]  wb_workfn+0x8ee/0xed0
[  504.673021][   T31]  ? __pfx_wb_workfn+0x10/0x10
[  504.677989][   T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[  504.683916][   T31]  ? preempt_schedule+0xae/0xc0
[  504.688852][   T31]  ? preempt_schedule_common+0x83/0xd0
[  504.694389][   T31]  ? process_one_work+0x868/0x15a0
[  504.699698][   T31]  process_one_work+0x93a/0x15a0
[  504.704752][   T31]  ? __pfx_process_one_work+0x10/0x10
[  504.717115][   T31]  ? assign_work+0x3c7/0x5b0
[  504.739767][   T31]  worker_thread+0x9b0/0xee0
[  504.744502][   T31]  kthread+0x711/0x8a0
[  504.748698][   T31]  ? __pfx_worker_thread+0x10/0x10
[  504.753855][   T31]  ? __pfx_kthread+0x10/0x10
[  504.758645][   T31]  ? _raw_spin_unlock_irq+0x23/0x50
[  504.763888][   T31]  ? lockdep_hardirqs_on+0x98/0x140
[  504.769331][   T31]  ? __pfx_kthread+0x10/0x10
[  504.773958][   T31]  ret_from_fork+0x599/0xb30
[  504.779253][   T31]  ? __pfx_ret_from_fork+0x10/0x10
[  504.784718][   T31]  ? __switch_to_asm+0x39/0x70
[  504.791355][   T31]  ? __switch_to_asm+0x33/0x70
[  504.796167][   T31]  ? __pfx_kthread+0x10/0x10
[  504.800882][   T31]  ret_from_fork_asm+0x1a/0x30
[  504.805695][   T31]  </TASK>
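
For orientation: the sleep in this trace happens inside writeback_iter(),
which takes each dirty folio's lock before handing it to the writepages
path; the flusher is parked in folio_wait_bit_common() until whoever holds
that folio lock releases it. A rough sketch of the loop's shape follows
(an illustration under that assumption, not kernel source;
write_one_folio() is a hypothetical helper):

/* Illustrative sketch of the iteration in the trace above;
 * not copied from the kernel. */
struct folio *folio = NULL;
int error = 0;

while ((folio = writeback_iter(mapping, wbc, folio, &error))) {
	/* writeback_iter() locks the folio; the trace above is
	 * sleeping in folio_wait_bit_common() inside it. */
	error = write_one_folio(folio);	/* hypothetical helper */
}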

And this call trace is about the superblock commit:

[  505.186758][   T31] INFO: task kworker/1:4:5971 blocked for more than 144 seconds.
[  505.194752][ T8014] Bluetooth: hci37: command tx timeout
[  505.210267][   T31]       Not tainted syzkaller #0
[  505.215260][   T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  505.273687][   T31] task:kworker/1:4     state:D stack:24152 pid:5971  tgid:5971  ppid:2      task_flags:0x4208060 flags:0x00080000
[  505.287569][   T31] Workqueue: events_long flush_mdb
[  505.293762][   T31] Call Trace:
[  505.297607][   T31]  <TASK>
[  505.307307][   T31]  __schedule+0x150e/0x5070
[  505.314414][   T31]  ? __pfx___schedule+0x10/0x10
[  505.325453][   T31]  ? _raw_spin_unlock_irqrestore+0xad/0x110
[  505.331535][   T31]  ? __pfx__raw_spin_unlock_irqrestore+0x10/0x10
[  505.354296][   T31]  ? preempt_schedule+0xae/0xc0
[  505.359482][   T31]  ? preempt_schedule+0xae/0xc0
[  505.364399][   T31]  ? __pfx___schedule+0x10/0x10
[  505.369493][   T31]  ? schedule+0x91/0x360
[  505.373819][   T31]  schedule+0x165/0x360
[  505.378340][   T31]  io_schedule+0x80/0xd0
[  505.382626][   T31]  bit_wait_io+0x11/0xd0
[  505.387219][   T31]  __wait_on_bit_lock+0xec/0x4f0
[  505.392201][   T31]  ? __pfx_bit_wait_io+0x10/0x10
[  505.397441][   T31]  ? __pfx_bit_wait_io+0x10/0x10
[  505.402435][   T31]  out_of_line_wait_on_bit_lock+0x123/0x170
[  505.408661][   T31]  ? __pfx___might_resched+0x10/0x10
[  505.414026][   T31]  ? __pfx_out_of_line_wait_on_bit_lock+0x10/0x10
[  505.420693][   T31]  ? __pfx_wake_bit_function+0x10/0x10
[  505.426212][   T31]  ? __lock_buffer+0xe/0x80
[  505.431646][   T31]  hfs_mdb_commit+0x115/0x12e0
[  505.451949][   T31]  ? do_raw_spin_unlock+0x122/0x240
[  505.457642][   T31]  ? _raw_spin_unlock+0x28/0x50
[  505.462552][   T31]  ? process_one_work+0x868/0x15a0
[  505.467897][   T31]  process_one_work+0x93a/0x15a0
[  505.472917][   T31]  ? __pfx_process_one_work+0x10/0x10
[  505.478463][   T31]  ? assign_work+0x3c7/0x5b0
[  505.483113][   T31]  worker_thread+0x9b0/0xee0
[  505.487894][   T31]  kthread+0x711/0x8a0
[  505.492015][   T31]  ? __pfx_worker_thread+0x10/0x10
[  505.497303][   T31]  ? __pfx_kthread+0x10/0x10
[  505.502429][   T31]  ? _raw_spin_unlock_irq+0x23/0x50
[  505.510913][   T31]  ? lockdep_hardirqs_on+0x98/0x140
[  505.516183][   T31]  ? __pfx_kthread+0x10/0x10
[  505.521290][   T31]  ret_from_fork+0x599/0xb30
[  505.525991][   T31]  ? __pfx_ret_from_fork+0x10/0x10
[  505.531301][   T31]  ? __switch_to_asm+0x39/0x70
[  505.535600][ T8874] chnl_net:caif_netlink_parms(): no params data found
[  505.536284][   T31]  ? __switch_to_asm+0x33/0x70
[  505.560487][   T31]  ? __pfx_kthread+0x10/0x10
[  505.565188][   T31]  ret_from_fork_asm+0x1a/0x30
[  505.570372][   T31]  </TASK>
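
For readers following the trace: the worker runs flush_mdb() from the
events_long workqueue and is parked in lock_buffer() via hfs_mdb_commit().
A paraphrased sketch of that entry point (the names follow the discussion
above; the body is an assumption, not a copy of fs/hfs/mdb.c):

/* Paraphrased sketch, not the actual fs/hfs/mdb.c source. */
void hfs_mdb_commit(struct super_block *sb)
{
	/* The trace above sleeps here: __lock_buffer() ->
	 * out_of_line_wait_on_bit_lock() until the buffer's
	 * lock bit is released. */
	lock_buffer(HFS_SB(sb)->mdb_bh);
	/* ... update the Master Directory Block, mark the
	 * buffer dirty, and write it out ... */
	unlock_buffer(HFS_SB(sb)->mdb_bh);
}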

I don't see any relation between the folios in the inode's page cache and
HFS_SB(sb)->mdb_bh, because they cannot share the same folio. I still don't
see from your explanation how the issue could happen, or how
lock_buffer(HFS_SB(sb)->mdb_bh) could be responsible for it. On the contrary,
if we follow your logic, then we would never be able to mount any HFS volume
at all. But xfstests works for HFS file systems (of course, multiple tests
fail), and I cannot see the deadlock in the common case. So you need to
explain which particular use case reproduces the issue and what the mechanism
of the deadlock is.
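
For reference, the interleaving such a report would have to demonstrate is
the classic ABBA pattern: one side takes the folio lock and then wants the
buffer lock, while the other takes the buffer lock and then wants the folio
lock. A minimal userspace model of that pattern (pthread mutexes as
hypothetical stand-ins; nothing here is taken from fs/hfs):

/* abba_demo.c - minimal userspace model of an ABBA deadlock.
 * "folio_lock" and "buffer_lock" are hypothetical stand-ins for the
 * kernel locks discussed above.  Build: cc -pthread abba_demo.c
 * Both threads end up holding one lock and waiting forever for the
 * other - the interleaving a real report would need to show. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t folio_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;

static void *flusher(void *arg)		/* models the writeback side */
{
	pthread_mutex_lock(&folio_lock);
	printf("flusher: holds folio_lock, wants buffer_lock\n");
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&buffer_lock);	/* never succeeds */
	return NULL;
}

static void *committer(void *arg)	/* models the MDB commit side */
{
	pthread_mutex_lock(&buffer_lock);
	printf("committer: holds buffer_lock, wants folio_lock\n");
	sleep(1);
	pthread_mutex_lock(&folio_lock);	/* never succeeds */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, flusher, NULL);
	pthread_create(&b, NULL, committer, NULL);
	pthread_join(a, NULL);		/* never returns: deadlock */
	pthread_join(b, NULL);
	return 0;
}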

Thanks,
Slava.
