Message-ID: <aWhgNujuXujxSg3E@ndev>
Date: Thu, 15 Jan 2026 11:34:29 +0800
From: Jinchao Wang <wangjinchao600@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: "glaubitz@...sik.fu-berlin.de" <glaubitz@...sik.fu-berlin.de>,
"frank.li@...o.com" <frank.li@...o.com>,
"slava@...eyko.com" <slava@...eyko.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com" <syzbot+1e3ff4b07c16ca0f6fe2@...kaller.appspotmail.com>
Subject: Re: [RFC PATCH] fs/hfs: fix ABBA deadlock in hfs_mdb_commit
On Wed, Jan 14, 2026 at 07:29:45PM +0000, Viacheslav Dubeyko wrote:
> On Wed, 2026-01-14 at 11:03 +0800, Jinchao Wang wrote:
> > On Tue, Jan 13, 2026 at 08:52:45PM +0000, Viacheslav Dubeyko wrote:
> > > On Tue, 2026-01-13 at 16:19 +0800, Jinchao Wang wrote:
> > > > syzbot reported a hung task in hfs_mdb_commit where a deadlock occurs
> > > > between the MDB buffer lock and the folio lock.
> > > >
> > > > The deadlock happens because hfs_mdb_commit() holds the mdb_bh
> > > > lock while calling sb_bread(), which attempts to acquire the lock
> > > > on the same folio.
> > >
> > > I don't quite follow your logic. We have only one sb_bread() [1] in
> > > hfs_mdb_commit(). This read is trying to extract the volume bitmap. How is it
> > > possible that the superblock and the volume bitmap are located in the same folio? Are you
> > > sure? Which folio size do you imply here?
> > >
> > > Also, if your logic is correct, then we would never be able to mount/unmount or
> > > run any operations on HFS volumes because of a similar deadlock. However, I can
> > > run xfstests on HFS volumes.
> > >
> > > [1] https://elixir.bootlin.com/linux/v6.19-rc5/source/fs/hfs/mdb.c#L324
> >
> > Hi Viacheslav,
> >
> > After reviewing your feedback, I realized that my previous RFC was not in
> > the correct format. It was not intended to be a final, merge-ready patch,
> > but rather a record of the analysis and trial fixes conducted so far.
> > I apologize for the confusion caused by my previous email.
> >
> > The details are reorganized as follows:
> >
> > - Observation
> > - Analysis
> > - Verification
> > - Conclusion
> >
> > Observation
> > ============
> >
> > Syzbot report: https://syzkaller.appspot.com/bug?extid=1e3ff4b07c16ca0f6fe2
> >
> > For this version:
> > > time | kernel | Commit | Syzkaller |
> > > 2025/12/20 17:03 | linux-next | cc3aa43b44bd | d6526ea3 |
> >
> > Crash log: https://syzkaller.appspot.com/text?tag=CrashLog&x=12909b1a580000
> >
> > The report indicates hung tasks within the hfs context.
> >
> > Analysis
> > ========
> > In the crash log, the lockdep information requires adjustment based on the call stack.
> > After adjustment, a deadlock is identified:
> >
> > task syz.1.1902:8009
> > - held &disk->open_mutex
> > - held folio lock
> > - wait lock_buffer(bh)
> > Partial call trace:
> > ->blkdev_writepages()
> > ->writeback_iter()
> > ->writeback_get_folio()
> > ->folio_lock(folio)
> > ->block_write_full_folio()
> > __block_write_full_folio()
> > ->lock_buffer(bh)
> >
> > task syz.0.1904:8010
> > - held &type->s_umount_key#66 down_read
> > - held lock_buffer(HFS_SB(sb)->mdb_bh);
> > - wait folio
> > Partial call trace:
> > hfs_mdb_commit
> > ->lock_buffer(HFS_SB(sb)->mdb_bh);
> > ->bh = sb_bread(sb, block);
> > ...->folio_lock(folio)
> >
> >
> > Other hung tasks are secondary effects of this deadlock. The issue
> > is reproducible in my local environment using the syz-reproducer.
> >
> > Verification
> > ==============
> >
> > Two patches were verified against the syz-reproducer.
> > With either one applied, the deadlock no longer reproduces.
> >
> > Option 1: Removing `un/lock_buffer(HFS_SB(sb)->mdb_bh)`
> > ------------------------------------------------------
> >
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index 53f3fae60217..c641adb94e6f 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -268,7 +268,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > if (sb_rdonly(sb))
> > return;
> >
> > - lock_buffer(HFS_SB(sb)->mdb_bh);
> > if (test_and_clear_bit(HFS_FLG_MDB_DIRTY, &HFS_SB(sb)->flags)) {
> > /* These parameters may have been modified, so write them back */
> > mdb->drLsMod = hfs_mtime();
> > @@ -340,7 +339,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > size -= len;
> > }
> > }
> > - unlock_buffer(HFS_SB(sb)->mdb_bh);
> > }
> >
> >
> > Option 2: Moving `unlock_buffer(HFS_SB(sb)->mdb_bh)`
> > --------------------------------------------------------
> >
> > diff --git a/fs/hfs/mdb.c b/fs/hfs/mdb.c
> > index 53f3fae60217..ec534c630c7e 100644
> > --- a/fs/hfs/mdb.c
> > +++ b/fs/hfs/mdb.c
> > @@ -309,6 +309,7 @@ void hfs_mdb_commit(struct super_block *sb)
> > sync_dirty_buffer(HFS_SB(sb)->alt_mdb_bh);
> > }
> >
> > + unlock_buffer(HFS_SB(sb)->mdb_bh);
> > if (test_and_clear_bit(HFS_FLG_BITMAP_DIRTY, &HFS_SB(sb)->flags)) {
> > struct buffer_head *bh;
> > sector_t block;
> > @@ -340,7 +341,6 @@ void hfs_mdb_commit(struct super_block *sb)
> > size -= len;
> > }
> > }
> > - unlock_buffer(HFS_SB(sb)->mdb_bh);
> > }
> >
> > Conclusion
> > ==========
> >
> > The analysis and verification confirm that the hung tasks are caused by
> > the deadlock between `lock_buffer(HFS_SB(sb)->mdb_bh)` and `sb_bread(sb, block)`.
>
> First of all, we need to answer this question: How is it
> possible that the superblock and the volume bitmap are located in the same folio or
> logical block? In the normal case, the superblock and the volume bitmap should not be
> located in the same logical block. It sounds to me that you have a corrupted
> volume, and this is why this logic [1] finally overlaps with the superblock location:
>
> block = be16_to_cpu(HFS_SB(sb)->mdb->drVBMSt) + HFS_SB(sb)->part_start;
> off = (block << HFS_SECTOR_SIZE_BITS) & (sb->s_blocksize - 1);
> block >>= sb->s_blocksize_bits - HFS_SECTOR_SIZE_BITS;
>
> I assume that the superblock is corrupted and that mdb->drVBMSt [2] contains incorrect
> metadata. As a result, we have this deadlock situation. The fix should not be
> here; instead, we need to add some sanity check of mdb->drVBMSt somewhere in the
> hfs_fill_super() workflow.
>
> Could you please check my vision?
>
> Thanks,
> Slava.
>
> [1] https://elixir.bootlin.com/linux/v6.19-rc5/source/fs/hfs/mdb.c#L318
> [2]
> https://elixir.bootlin.com/linux/v6.19-rc5/source/include/linux/hfs_common.h#L196
Hi Slava,

I have traced the values during the hang. Here are the observed values:

- MDB: blocknr=2
- Volume Bitmap (drVBMSt): 3
- s_blocksize: 512 bytes

This confirms a circular dependency between the folio lock and
the buffer lock. The writeback thread holds the 4 KB folio lock and
waits for the MDB buffer lock (block 2). Simultaneously, the HFS sync
thread holds the MDB buffer lock and waits for the same folio lock
to read the bitmap (block 3).

Since block 2 and block 3 share the same folio, this lock
inversion occurs. I would appreciate your thoughts on whether
hfs_fill_super() should validate drVBMSt to ensure the bitmap
does not reside in the same folio as the MDB.