Message-ID: <aXA9j9oQLHAHPP46@ndev>
Date: Wed, 21 Jan 2026 10:44:31 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: "glaubitz@...sik.fu-berlin.de" <glaubitz@...sik.fu-berlin.de>,
"frank.li@...o.com" <frank.li@...o.com>,
"slava@...eyko.com" <slava@...eyko.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com" <syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com>
Subject: Re: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit
On Tue, Jan 20, 2026 at 08:51:06PM +0000, Viacheslav Dubeyko wrote:
> On Tue, 2026-01-20 at 09:09 +0800, Jinchao Wang wrote:
> >
>
> <skipped>
>
> > >
> > > First of all, I've tried to check the syzbot report that you are mentioning in
> > > the patch. And I was confused because it was report for FAT. So, I don't see the
> > > way how I can reproduce the issue on my side.
> > >
> > > Secondly, I need to see the real call trace of the issue. This discussion
> > > doesn't make sense without the reproduction path and the call trace(s) of the
> > > issue.
> > >
> > > Thanks,
> > > Slava.
> > There are many crashes on the syzbot report page; please follow the specified time and version.
> >
> > Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2
> >
> > For this version:
> > > time | kernel | Commit | Syzkaller |
> > > 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3 |
> >
> > The full call trace can be found in the crash log for "2025/12/20 17:03", at the following URL:
> >
> > Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000
>
> This call trace is dedicated to flushing inode's dirty pages in page cache, as
> far as I can see:
>
> [ 504.401993][ T31] INFO: task kworker/u8:1:13 blocked for more than 143
> seconds.
> [ 504.434587][ T31] Not tainted syzkaller #0
> [ 504.441437][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 504.451145][ T31] task:kworker/u8:1 state:D stack:22792 pid:13
> tgid:13 ppid:2 task_flags:0x4208060 flags:0x00080000
> [ 504.463591][ T31] Workqueue: writeback wb_workfn (flush-7:4)
> [ 504.471997][ T31] Call Trace:
> [ 504.475502][ T31] <TASK>
> ...
> [ 504.805695][ T31] </TASK>
>
> And this call trace is dedicated to superblock commit:
>
> [ 505.186758][ T31] INFO: task kworker/1:4:5971 blocked for more than 144
> seconds.
> [ 505.194752][ T8014] Bluetooth: hci37: command tx timeout
> [ 505.210267][ T31] Not tainted syzkaller #0
> [ 505.215260][ T31] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 505.273687][ T31] task:kworker/1:4 state:D stack:24152 pid:5971
> tgid:5971 ppid:2 task_flags:0x4208060 flags:0x00080000
> [ 505.287569][ T31] Workqueue: events_long flush_mdb
> [ 505.293762][ T31] Call Trace:
> [ 505.297607][ T31] <TASK>
> ...
> [ 505.570372][ T31] </TASK>
>
> I don't see any relation between folios in inode's page cache and HFS_SB(sb)-
> >mdb_bh because they cannot share the same folio.
What you pasted are not the relevant tasks. Please see the analysis below, which I sent before,
and focus on task IDs 8009 and 8010.
Analysis
========
In the crash log, the lockdep information needs to be adjusted based on the call stacks.
After adjustment, the following deadlock is identified:
** Task syz.1.1902:8009 **
- holds &disk->open_mutex
- holds the folio lock
- waits on lock_buffer(bh)
Partial call trace:
blkdev_writepages()
->writeback_iter()
->writeback_get_folio()
->folio_lock(folio)
->block_write_full_folio()
->__block_write_full_folio()
->lock_buffer(bh)
** Task syz.0.1904:8010 **
- holds &type->s_umount_key#66 (down_read)
- holds lock_buffer(HFS_SB(sb)->mdb_bh)
- waits on the folio lock
Partial call trace:
hfs_mdb_commit()
->lock_buffer(HFS_SB(sb)->mdb_bh)
->bh = sb_bread(sb, block)
...->folio_lock(folio)
Other hung tasks are secondary effects of this deadlock. The issue
is reproducible in my local environment using the syzkaller reproducer.
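
To make the lock ordering explicit, here is a minimal userspace sketch of
the ABBA pattern (illustration only, not kernel code): two pthread mutexes
stand in for the folio lock and the buffer lock, and the names
writeback_path/mdb_commit_path are just labels mirroring the two tasks
above. One thread takes "folio" then "buffer", the other takes "buffer"
then "folio", and they deadlock against each other:

/* Build with: gcc -pthread abba_demo.c -o abba_demo */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t folio_lock  = PTHREAD_MUTEX_INITIALIZER; /* "A" */
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER; /* "B" */

static void *writeback_path(void *arg)    /* mirrors task 8009 */
{
	pthread_mutex_lock(&folio_lock);  /* like folio_lock(folio) */
	sleep(1);                         /* widen the race window  */
	pthread_mutex_lock(&buffer_lock); /* like lock_buffer(bh)   */
	pthread_mutex_unlock(&buffer_lock);
	pthread_mutex_unlock(&folio_lock);
	return NULL;
}

static void *mdb_commit_path(void *arg)   /* mirrors task 8010 */
{
	pthread_mutex_lock(&buffer_lock); /* like lock_buffer(mdb_bh)      */
	sleep(1);                         /* widen the race window         */
	pthread_mutex_lock(&folio_lock);  /* like sb_bread() locking folio */
	pthread_mutex_unlock(&folio_lock);
	pthread_mutex_unlock(&buffer_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, writeback_path, NULL);
	pthread_create(&t2, NULL, mdb_commit_path, NULL);
	pthread_join(t1, NULL); /* never returns: ABBA deadlock */
	pthread_join(t2, NULL);
	printf("not reached\n");
	return 0;
}
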
> I still don't see from your
> explanation how the issue could happen. I don't see how lock_buffer(HFS_SB(sb)-
> >mdb_bh) can be responsible for the issue.
> On the contrary, if we follow your
> logic, then we would never be able to mount any HFS volume. But xfstests works for
> HFS file systems (of course, multiple tests fail) and I cannot see the deadlock
> in the common case. So, you need to explain which particular use-case can
> reproduce the issue and what the mechanism of the deadlock is.
>
Please follow the steps I sent and reproduce the issue.
Have you tried the specific time and version listed on the syzbot report page?
| time | kernel | Commit | Syzkaller |
| 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3 |
--
Thanks,
Jinchao