Message-ID: <aW7Vy_RpxseBC4UQ@ndev>
Date: Tue, 20 Jan 2026 09:09:31 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: "glaubitz@...sik.fu-berlin.de" <glaubitz@...sik.fu-berlin.de>,
	"frank.li@...o.com" <frank.li@...o.com>,
	"slava@...eyko.com" <slava@...eyko.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com" <syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com>
Subject: Re: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit

On Mon, Jan 19, 2026 at 06:09:16PM +0000, Viacheslav Dubeyko wrote:
> On Fri, 2026-01-16 at 16:10 +0800, Jinchao Wang wrote:
> > On Thu, Jan 15, 2026 at 09:12:49PM +0000, Viacheslav Dubeyko wrote:
> > > On Thu, 2026-01-15 at 11:34 +0800, Jinchao Wang wrote:
> > > > On Wed, Jan 14, 2026 at 07:29:45PM +0000, Viacheslav Dubeyko wrote:
> > > > > On Wed, 2026-01-14 at 11:03 +0800, Jinchao Wang wrote:
> > > > > > On Tue, Jan 13, 2026 at 08:52:45PM +0000, Viacheslav Dubeyko wrote:
> > > > > > > On Tue, 2026-01-13 at 16:19 +0800, Jinchao Wang wrote:
> > > > > > > > syzbot reported a hung task in hfs_mdb_commit where a deadlock occurs
> > > > > > > > between the MDB buffer lock and the folio lock.
> > > > > > > > 
> > > > > > > > The deadlock happens because hfs_mdb_commit() holds the mdb_bh
> > > > > > > > lock while calling sb_bread(), which attempts to acquire the lock
> > > > > > > > on the same folio.
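> > > > > > > > 
> > > > > > > > In lock-cycle form, the inversion looks like this (illustrative
> > > > > > > > summary, not lockdep output):
> > > > > > > > 
> > > > > > > >   hfs_mdb_commit():  lock_buffer(mdb_bh) -> sb_bread() -> folio_lock(folio)
> > > > > > > >   bdev writeback:    folio_lock(folio) -> lock_buffer(mdb_bh)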
> > > > > > > 
> > > > > > > I don't quite follow your logic. We have only one sb_bread() [1] in
> > > > > > > hfs_mdb_commit(). This read is trying to extract the volume bitmap. How is it
> > > > > > > possible that the superblock and the volume bitmap are located in the same
> > > > > > > folio? Are you sure? Which folio size do you imply here?
> > > > > > > 
> > > > > > > Also, if your logic is correct, then we would never be able to mount/unmount
> > > > > > > or run any operations on HFS volumes because of this deadlock. However, I can
> > > > > > > run xfstests on an HFS volume.
> > > > > > > 
> > > > > > > [1] https://elixir.bootlin.com/linux/v6.19-rc5/source/fs/hfs/mdb.c#L324      
> > > > > > 
> > > > > > Hi Viacheslav,
> > > > > > 
> > > > > > After reviewing your feedback, I realized that my previous RFC was not in
> > > > > > the correct format. It was not intended to be a final, merge-ready patch,
> > > > > > but rather a record of the analysis and trial fixes conducted so far.
> > > > > > I apologize for the confusion caused by my previous email.
> > > > > > 
> > > > > > The details are reorganized as follows:
> > > > > > 
> > > > > > - Observation
> > > > > > - Analysis
> > > > > > - Verification
> > > > > > - Conclusion
> > > > > > 
> > > > > > Observation
> > > > > > ============
> > > > > > 
> > > > > > Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2      
> > > > > > 
> > > > > > For this version:
> > > > > > | time             |  kernel    | Commit       | Syzkaller |
> > > > > > | 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |
> > > > > > 
> > > > > > Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000      
> > > > > > 
> > > > > > The report indicates hung tasks within the hfs context.
> > > > > > 
> > > > > > Analysis
> > > > > > ========
> > > > > > In the crash log, the lock information reported for each task needs to be
> > > > > > correlated with its call stack. After doing so, a deadlock is identified:
> > > > > > 
> > > > > > task syz.1.1902:8009
> > > > > > - held &disk->open_mutex
> > > > > > - held folio lock
> > > > > > - waits on lock_buffer(bh)
> > > > > > Partial call trace:
> > > > > > ->blkdev_writepages()
> > > > > >         ->writeback_iter()
> > > > > >                 ->writeback_get_folio()
> > > > > >                         ->folio_lock(folio)
> > > > > >         ->block_write_full_folio()
> > > > > >                 ->__block_write_full_folio()
> > > > > >                         ->lock_buffer(bh)
> > > > > > 
> > > > > > task syz.0.1904:8010
> > > > > > - held &type->s_umount_key#66 down_read
> > > > > > - held lock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > > - waits on folio lock
> > > > > > Partial call trace:
> > > > > > hfs_mdb_commit
> > > > > >         ->lock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > >         ->bh = sb_bread(sb, block);
> > > > > >                 ...->folio_lock(folio)
> > > > > > 
> > > > > > 
> > > > > > Other hung tasks are secondary effects of this deadlock. The issue
> > > > > > is reproducible in my local environment using the syz reproducer.
> > > > > > 
> > > > > > Verification
> > > > > > ==============
> > > > > > 
> > > > > > Two patches were verified against the syz reproducer.
> > > > > > With either patch applied, the deadlock no longer reproduces.
> > > > > > 
> > > > > > Option 1: Removing `un/lock_buffer(HFS_SB(sb)->mdb_bh)`
> > > > > > ------------------------------------------------------
> > > > > > 
> > > > > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > > > > index 53f3fae60217..c641adb94e6f 100644
> > > > > > --- a/fs/hfs/mdb.c
> > > > > > +++ b/fs/hfs/mdb.c
> > > > > > @@ -268,7 +268,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > > > > >         if (sb_rdonly(sb))
> > > > > >                 return;
> > > > > > 
> > > > > > -       lock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > >         if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
> > > > > >                 /* These parameters may have been modified, so write them back */
> > > > > >                 mdb->drLsMod = hfs_mtime();
> > > > > > @@ -340,7 +339,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > > > > >                         size -= len;
> > > > > >                 }
> > > > > >         }
> > > > > > -       unlock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > >  }
> > > > > > 
> > > > > > 
> > > > > > Option 2: Moving `unlock_buffer(HFS_SB(sb)->mdb_bh)` earlier
> > > > > > --------------------------------------------------------
> > > > > > 
> > > > > > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > > > > > index 53f3fae60217..ec534c630c7e 100644
> > > > > > --- a/fs/hfs/mdb.c
> > > > > > +++ b/fs/hfs/mdb.c
> > > > > > @@ -309,6 +309,7 @@ void hfs_mdb_commit(struct super_block *sb)
> > > > > >                 sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
> > > > > >         }
> > > > > >  
> > > > > > +       unlock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > >         if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)->flags)) {
> > > > > >                 struct buffer_head *bh;
> > > > > >                 sector_t block;
> > > > > > @@ -340,7 +341,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > > > > >                         size -= len;
> > > > > >                 }
> > > > > >         }
> > > > > > -       unlock_buffer(HFS_SB(sb)->mdb_bh);
> > > > > >  }
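> > > > > > 
> > > > > > Of the two, Option 2 looks safer: it still holds the buffer lock across
> > > > > > the MDB/alternate-MDB update, but drops it before the bitmap read, so no
> > > > > > path takes the buffer lock before the folio lock. Option 1 removes the
> > > > > > buffer locking entirely, which may let concurrent writeback observe a
> > > > > > partially updated MDB.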
> > > > > > 
> > > > > > Conclusion
> > > > > > ==========
> > > > > > 
> > > > > > The analysis and verification confirm that the hung tasks are caused by
> > > > > > the deadlock between `lock_buffer(HFS_SB(sb)->mdb_bh)` and `sb_bread(sb, block)`.
> > > > > 
> > > > > First of all, we need to answer this question: How is it possible that the
> > > > > superblock and the volume bitmap are located in the same folio or logical
> > > > > block? In the normal case, the superblock and the volume bitmap should not be
> > > > > located in the same logical block. It sounds to me that you have a corrupted
> > > > > volume, and this is why this logic [1] ends up overlapping with the superblock
> > > > > location:
> > > > > 
> > > > > block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) + HFS_SB(sb)->part_start;
> > > > > off = (block << HFS_SECTOR_SIZE_BITS) & (sb->s_blocksize - 1);
> > > > > block >>= sb->s_blocksize_bits - HFS_SECTOR_SIZE_BITS;
> > > > > 
> > > > > I assume that the superblock is corrupted and mdb->drVBMSt [2] holds incorrect
> > > > > metadata. As a result, we have this deadlock situation. The fix should not be
> > > > > here; instead, we need to add a sanity check of mdb->drVBMSt somewhere in the
> > > > > hfs_fill_super() workflow.
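> > > > > 
> > > > > A minimal sketch of such a check (illustrative only: the constant
> > > > > HFS_MDB_BLK, the exact bound, and the error label are assumptions, not
> > > > > the actual hfs code):
> > > > > 
> > > > > 	/* in the hfs_fill_super() workflow, after the MDB has been read */
> > > > > 	if (be16_to_cpu(mdb->drVBMSt) <= HFS_MDB_BLK) {
> > > > > 		pr_err("hfs: invalid volume bitmap start %u\n",
> > > > > 		       be16_to_cpu(mdb->drVBMSt));
> > > > > 		goto out_bh;	/* hypothetical label: refuse the mount */
> > > > > 	}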
> > > > > 
> > > > > Could you please check my reasoning?
> > > > > 
> > > > > Thanks,
> > > > > Slava.
> > > > > 
> > > > > [1] https://elixir.bootlin.com/linux/v6.19-rc5/source/fs/hfs/mdb.c#L318    
> > > > > [2]
> > > > > https://elixir.bootlin.com/linux/v6.19-rc5/source/include/linux/hfs_common.h#L196    
> > > > 
> > > > Hi Slava,
> > > > 
> > > > I have traced the values during the hang. Here are the values observed:
> > > > 
> > > > - MDB: blocknr=2
> > > > - Volume Bitmap (drVBMSt): 3
> > > > - s_blocksize: 512 bytes
> > > > 
> > > > This confirms a circular dependency between the folio lock and
> > > > the buffer lock. The writeback thread holds the 4KB folio lock and 
> > > > waits for the MDB buffer lock (block 2). Simultaneously, the HFS sync 
> > > > thread holds the MDB buffer lock and waits for the same folio lock 
> > > > to read the bitmap (block 3).
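> > > > 
> > > > Working that out (assuming 4 KiB folios on the block device):
> > > > 
> > > >   MDB:    block 2 * 512 B = byte offset 1024, folio index 1024 / 4096 = 0
> > > >   bitmap: block 3 * 512 B = byte offset 1536, folio index 1536 / 4096 = 0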
> > > > 
> > > > 
> > > > Since block 2 and block 3 share the same folio, this locking 
> > > > inversion occurs. I would appreciate your thoughts on whether 
> > > > hfs_fill_super() should validate drVBMSt to ensure the bitmap 
> > > > does not reside in the same folio as the MDB.
> > > 
> > > 
> > > As far as I can see, I can run xfstests on an HFS volume (for example,
> > > generic/001 finished successfully):
> > > 
> > > sudo ./check -g auto -E ./my_exclude.txt 
> > > FSTYP         -- hfs
> > > PLATFORM      -- Linux/x86_64 hfsplus-testing-0001 6.19.0-rc1+ #56 SMP
> > > PREEMPT_DYNAMIC Thu Jan 15 12:55:22 PST 2026
> > > MKFS_OPTIONS  -- /dev/loop51
> > > MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch
> > > 
> > > generic/001 36s ...  36s
> > > 
> > > 2026-01-15T13:00:07.589868-08:00 hfsplus-testing-0001 kernel: run fstests
> > > generic/001 at 2026-01-15 13:00:07
> > > 2026-01-15T13:00:07.661605-08:00 hfsplus-testing-0001 systemd[1]: Started
> > > fstests-generic-001.scope - /usr/bin/bash -c "test -w /proc/self/oom_score_adj
> > > && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/001".
> > > 2026-01-15T13:00:13.355795-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():296 HFS_SB(sb)->mdb_bh buffer has been locked
> > > 2026-01-15T13:00:13.355809-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():348 drVBMSt 3, part_start 0, off 0, block 3, size 8167
> > > 2026-01-15T13:00:13.355810-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():356 start read volume bitmap block
> > > 2026-01-15T13:00:13.355810-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():370 volume bitmap block has been read and copied
> > > [the hfs_mdb_commit():356/370 message pair above repeats for each
> > > remaining bitmap block; identical log lines elided]
> > > 2026-01-15T13:00:13.355822-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():383 HFS_SB(sb)->mdb_bh buffer has been unlocked
> > > 2026-01-15T13:00:13.681527-08:00 hfsplus-testing-0001 systemd[1]: fstests-
> > > generic-001.scope: Deactivated successfully.
> > > 2026-01-15T13:00:13.681597-08:00 hfsplus-testing-0001 systemd[1]: fstests-
> > > generic-001.scope: Consumed 5.928s CPU time.
> > > 2026-01-15T13:00:13.714928-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():296 HFS_SB(sb)->mdb_bh buffer has been locked
> > > 2026-01-15T13:00:13.714942-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():348 drVBMSt 3, part_start 0, off 0, block 3, size 8167
> > > 2026-01-15T13:00:13.714943-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():356 start read volume bitmap block
> > > 2026-01-15T13:00:13.714944-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():370 volume bitmap block has been read and copied
> > > [the hfs_mdb_commit():356/370 message pair above repeats for each
> > > remaining bitmap block; identical log lines elided]
> > > 2026-01-15T13:00:13.714956-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():383 HFS_SB(sb)->mdb_bh buffer has been unlocked
> > > 2026-01-15T13:00:13.716742-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():296 HFS_SB(sb)->mdb_bh buffer has been locked
> > > 2026-01-15T13:00:13.716754-08:00 hfsplus-testing-0001 kernel: hfs:
> > > hfs_mdb_commit():383 HFS_SB(sb)->mdb_bh buffer has been unlocked
> > > 2026-01-15T13:00:13.722184-08:00 hfsplus-testing-0001 systemd[1]: mnt-
> > > test.mount: Deactivated successfully.
> > > 
> > > And I don't see any locking issues in the added debug output. I don't see the
> > > reported deadlock reproducing. And the logic of hfs_mdb_commit() looks correct
> > > enough.
> > > 
> > > The main question is: how can blkdev_writepages() collide with
> > > hfs_mdb_commit()? I assume that blkdev_writepages() is trying to flush the
> > > user data. So, what is the problem here? Is it an allocation issue? Does it
> > > mean that some file was not properly allocated? Or does it mean that the
> > > superblock commit somehow collided with a user-data flush? But how is that
> > > possible? Which particular workload could have such an issue?
> > > 
> > > Currently, your analysis doesn't show what the problem is or how it happened.
> > > 
> > > Thanks,
> > > Slava.
> > 
> > Hi Slava,
> > 
> > Thank you very much for your feedback and for taking the time to 
> > review this. I apologize if my previous analysis was not clear 
> > enough. As I am relatively new to this area, I truly appreciate 
> > your patience.
> > 
> > After further tracing, I would like to share more details on how the 
> > collision between blkdev_writepages() and hfs_mdb_commit() occurs. 
> > It appears to be a timing-specific race condition.
> > 
> > 1. Physical Overlap (The "How"):
> > In my environment, the HFS block size is 512B and the MDB is located 
> > at block 2 (offset 1024). Since 1024 < 4096, the MDB resides 
> > within the block device's first folio (index 0). 
> > Consequently, both the filesystem layer (via mdb_bh) and the block 
> > layer (via bdev mapping) operate on the exact same folio at index 0.
> > 
> > 2. The Race Window (The "Why"):
> > The collision is triggered by the global nature of ksys_sync(). In 
> > a system with multiple mounted devices, there is a significant time 
> > gap between Stage 1 (iterate_supers) and Stage 2 (sync_bdevs). This 
> > window allows a concurrent task to dirty the MDB folio after one 
> > sync task has already passed its FS-sync stage.
> > 
> > 3. Proposed Reproduction Timeline:
> > - Task A: Starts ksys_sync() and finishes iterate_supers() 
> >   for the HFS device. It then moves on to sync other devices.
> > - Task B: Creates a new file on HFS, then starts its 
> >   own ksys_sync().
> > - Task B: Enters hfs_mdb_commit(), calls lock_buffer(mdb_bh) and 
> >   mark_buffer_dirty(mdb_bh). This makes folio 0 dirty.
> > - Task A: Finally reaches sync_bdevs() for the HFS device. It sees 
> >   folio 0 is dirty, calls folio_lock(folio), and then attempts 
> >   to lock_buffer(mdb_bh) for I/O.
> > - Task A: Blocks waiting for mdb_bh lock (held by Task B).
> > - Task B: Continues hfs_mdb_commit() -> sb_bread(), which attempts 
> >   to lock folio 0 (held by Task A).
> > 
> > This results in an AB-BA deadlock between the folio lock and the
> > buffer lock; a minimal sketch of the interleaving follows below.
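> > 
> > A minimal userspace sketch of that interleaving (hypothetical: the mount
> > point /mnt/hfs is an assumption, and hitting the race window in practice
> > likely needs many iterations, plus a second mounted device to widen the
> > gap between the two sync stages):
> > 
> > #include <fcntl.h>
> > #include <pthread.h>
> > #include <unistd.h>
> > 
> > static void *task_a(void *arg)
> > {
> > 	sync();		/* global sync: FS-level pass first, bdev pass later */
> > 	return NULL;
> > }
> > 
> > static void *task_b(void *arg)
> > {
> > 	/* dirty the MDB by creating a file on the HFS mount */
> > 	int fd = open("/mnt/hfs/newfile", O_CREAT | O_WRONLY, 0644);
> > 
> > 	if (fd >= 0)
> > 		close(fd);
> > 	sync();		/* second sync racing the first one */
> > 	return NULL;
> > }
> > 
> > int main(void)
> > {
> > 	pthread_t a, b;
> > 
> > 	pthread_create(&a, NULL, task_a, NULL);
> > 	pthread_create(&b, NULL, task_b, NULL);
> > 	pthread_join(a, NULL);
> > 	pthread_join(b, NULL);
> > 	return 0;
> > }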
> > 
> > I hope this clarifies why the collision is possible even though 
> > hfs_mdb_commit() seems correct in isolation. It is the concurrent 
> > interleaving of FS-level and BDEV-level syncs that triggers the 
> > violation of the Folio -> Buffer locking order.
> > 
> > I would be very grateful for your thoughts on this updated analysis.
> > 
> > 
> 
> First of all, I've tried to check the syzbot report that you mention in the
> patch. And I was confused because it was a report for FAT. So, I don't see
> how I can reproduce the issue on my side.
> 
> Secondly, I need to see the real call trace of the issue. This discussion
> doesn't make sense without the reproduction path and the call trace(s) of the
> issue.
> 
> Thanks,
> Slava.

There are many crashes on the syzbot report page; please follow the specified
time and version.

Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2

For this version:
| time             |  kernel    | Commit       | Syzkaller |
| 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3  |

The full call trace can be found in the crash log for "2025/12/20 17:03", whose URL is:

Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000

-- 
Thanks,
Jinchao
