Message-ID: <e2b42d21-97b5-4e25-a3b0-1d3ec23bd104@vivo.com>
Date: Mon, 1 Sep 2025 14:21:12 +0000
From: 杨晨志 <yang.chenzhi@...o.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>, "glaubitz@...sik.fu-berlin.de"
<glaubitz@...sik.fu-berlin.de>, "slava@...eyko.com" <slava@...eyko.com>,
李扬韬 <frank.li@...o.com>
CC: "syzbot+005d2a9ecd9fbf525f6a@...kaller.appspotmail.com"
<syzbot+005d2a9ecd9fbf525f6a@...kaller.appspotmail.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] hfsplus: fix missing hfs_bnode_get() in
__hfs_bnode_create
Thanks for reviewing.
The syzbot C reproducer creates five processes, each of which repeatedly
performs the following sequence of operations on a loop device: mount,
hard link creation, sync, unmount, and loop device reset [1]. In certain
interleavings this results in situations where, for example, Thread A is
executing sync() while Thread B is concurrently performing a link().
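For reference, a rough sketch of what each reproducer process does
(paths, image setup, and error handling are simplified placeholders; the
real reproducer in [1] builds the hfsplus image in memory):
```
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>

/* /dev/loop0, /mnt and hfsplus.img are placeholder names. */
static void reproduce_once(void)
{
	int loop_fd = open("/dev/loop0", O_RDWR);
	int img_fd = open("hfsplus.img", O_RDWR);

	ioctl(loop_fd, LOOP_SET_FD, img_fd);			/* attach image */
	mount("/dev/loop0", "/mnt", "hfsplus", 0, NULL);	/* mount */
	link("/mnt/file0", "/mnt/file1");			/* hard link creation */
	sync();							/* sync */
	umount2("/mnt", MNT_DETACH);				/* unmount */
	ioctl(loop_fd, LOOP_CLR_FD, 0);				/* loop device reset */

	close(img_fd);
	close(loop_fd);
}

int main(void)
{
	for (;;)
		reproduce_once();	/* five such processes run concurrently */
}
```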
> It's clear why link() could try to create a b-tree node. But it's not completely
> clear why sync() should try to create a b-tree node? Usually, sync() should try
> to flush already existed inodes. Especially, if we imply any system inodes that
> are created mostly during mount operation. How is it possible that sync() could
> try to create the node?
```
/* cnid 0 is the header node; tree->cnid == 4 is the catalog tree
 * (HFSPLUS_CAT_CNID) */
if (!cnid && tree->cnid == 4)
	dump_stack();
```
I added the code above to __hfs_bnode_create() and called sync() immediately
after hfsplus was mounted successfully. The resulting call trace shows that
sync() can create a bnode (the header node, id 0).
> Moreover, node_id 0 (root node) should be already
> created if we created the b-tree.
>
node_id 0 is the header node, which contains a bitmap, and node_id 1 is
the root node (the root node can change if the tree height increases,
but the initial root node ID of the catalog tree is always 1).
In fact, the root node (ID 1) is the bnode we create when initializing
the catalog btree. However, during the creation of the catalog tree, we
do not create the header node (ID 0).
hfsplus_fill_super()
-> hfsplus_iget(ROOT_CNID)
-> hfs_find_init(catalog_tree)
-> hfsplus_find_cat
-> hfs_brec_read
-> hfs_brec_find(tree->root)
-> hfs_bnode_find(tree->root)
-> __hfs_bnode_create(tree->root)
From my perspective, the header node is somewhat special. There are two
cases where we need to use it. First, when searching the bitmap to
allocate or free a node. Second, when reading or writing metadata in the
header record. In hfs_btree_open(), we do read data from the header
record; however, it reads this data directly from the page rather than
creating a header node.
```
	mapping = tree->inode->i_mapping;
	page = read_mapping_page(mapping, 0, NULL);
	if (IS_ERR(page))
		goto free_inode;

	/* Load the header */
	head = (struct hfs_btree_header_rec *)(kmap_local_page(page) +
					       sizeof(struct hfs_bnode_desc));
	tree->root = be32_to_cpu(head->root);
```
So we lose the opportunity to initialize the header node during the
mounting process; we can only create it the next time it is needed. In
this scenario, that can be sync() -> hfs_btree_write() writing metadata
back to disk, or link() -> hfs_bnode_split() using the bitmap to
allocate a new node.
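For example, roughly (paraphrasing the start of hfs_btree_write(), not
quoting it verbatim), writing the header record back requires looking up
bnode 0 first, which is what ends up creating the header node on the
sync() path:
```
int hfs_btree_write(struct hfs_btree *tree)
{
	struct hfs_bnode *node;

	/* Looking up bnode 0 goes through hfs_bnode_find(), which falls
	 * back to __hfs_bnode_create() if the header node has never been
	 * instantiated in memory. */
	node = hfs_bnode_find(tree, 0);
	if (IS_ERR(node))
		return -EIO;

	/* ... copy tree->root, leaf_head, free_nodes, etc. back into the
	 * on-disk header record and mark the node's page dirty ... */

	hfs_bnode_put(node);
	return 0;
}
```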
Besides this, I found hfs_bnode_find() to be somewhat strange.
hfs_bnode_find() does not record the node it creates in the bitmap
(e.g. the header node). The recording is handled by hfs_bmap_alloc(),
meaning that recording and creation are separate operations, and the two
functions do not have a one-to-one relationship. The caller must ensure
that the node it wants to find is already in the bitmap. However, we do
not currently have such a check in place, or perhaps there is some
mechanism that guarantees every node passed through it is already in the
bitmap?
Interestingly, since hfsplus_fill_super() needs to initialize the root
directory “/”, this operation must rely on the catalog_tree. As a
result, the first byte of the catalog_tree bitmap is 0xC0 (i.e.,
0b11000000), meaning that both the header node and the root node are
pre-marked in the bitmap so that hfs_bnode_find can later be used to
create the header node (although in practice, the root node is the first
one created).
```
	/* Load the root directory */
	root = hfsplus_iget(sb, HFSPLUS_ROOT_CNID);	/* <- uses the catalog tree */
	if (IS_ERR(root)) {
		pr_err("failed to load root directory\n");
		err = PTR_ERR(root);
		goto out_put_alloc_file;
	}
```
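To illustrate the bit layout (my own sketch, not hfsplus code): the
b-tree map is MSB-first, so 0xC0 in the first byte means nodes 0 and 1
are marked. A check of the kind I was wondering about above could, in
principle, look like this, assuming the caller already has the raw map
bytes:
```
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: test whether node `nid` is marked in a raw
 * b-tree map, where bit 7 of byte 0 corresponds to node 0. */
static bool bnode_marked(const uint8_t *map, uint32_t nid)
{
	return map[nid >> 3] & (0x80u >> (nid & 7));
}

/* With map[0] == 0xC0 (0b11000000):
 *   bnode_marked(map, 0) -> true   (header node)
 *   bnode_marked(map, 1) -> true   (root node)
 *   bnode_marked(map, 2) -> false
 * With map[0] == 0x80, only node 0 (the header node) is marked. */
```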
In contrast, the extents tree is not used at all during the mounting
process. As a result, the first byte of the extents tree bitmap is 0x80
(i.e., 0b10000000), so its root node can be created later, when
hfs_brec_insert() performs the first insertion. At that point, the root
node is marked in the bitmap through hfs_bmap_alloc().
In short, during the mounting process, the extents tree initializes
neither the header node nor the root node.
```
	if (!fd->bnode) {
		if (!tree->root)
			hfs_btree_inc_height(tree);	/* <- calls hfs_bmap_alloc */
		node = hfs_bnode_find(tree, tree->leaf_head);
		if (IS_ERR(node))
			return PTR_ERR(node);
		fd->bnode = node;
		fd->record = -1;
	}
```
> So, if you are fixing the syzbot reported issue, then it could be good to have
> call trace in the comment to understand the fix environment. And it could be
> more clear in what environment sync() is trying to create the b-tree node.
OK, I will include the call trace in the commit message in future
revisions. The syzbot reports contain two call traces [2][3]:
Call Trace:
<TASK>
hfsplus_btree_write+0x379/0x7b0 fs/hfsplus/btree.c:309
hfsplus_system_write_inode fs/hfsplus/super.c:137 [inline]
hfsplus_write_inode+0x4c9/0x5f0 fs/hfsplus/super.c:163
write_inode fs/fs-writeback.c:1525 [inline]
__writeback_single_inode+0x6f4/0x1000 fs/fs-writeback.c:1745
writeback_sb_inodes+0x6b7/0xf60 fs/fs-writeback.c:1976
wb_writeback+0x43b/0xaf0 fs/fs-writeback.c:2156
wb_do_writeback fs/fs-writeback.c:2303 [inline]
wb_workfn+0x40e/0xf00 fs/fs-writeback.c:2343
process_one_work kernel/workqueue.c:3236 [inline]
process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3319
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x3fc/0x770 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
Call Trace:
<TASK>
hfsplus_bmap_alloc+0x5a5/0x640 fs/hfsplus/btree.c:414
hfs_bnode_split+0xcc/0xef0 fs/hfsplus/brec.c:245
hfsplus_brec_insert+0x38f/0xcc0 fs/hfsplus/brec.c:100
hfsplus_rename_cat+0x51c/0xde0 fs/hfsplus/catalog.c:491
hfsplus_link+0x359/0x6a0 fs/hfsplus/dir.c:323
vfs_link+0x4ea/0x6e0 fs/namei.c:4854
do_linkat+0x272/0x560 fs/namei.c:4924
__do_sys_link fs/namei.c:4958 [inline]
__se_sys_link fs/namei.c:4956 [inline]
__x64_sys_link+0x82/0x90 fs/namei.c:4956
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
There are two different call traces, depending on which path acquires
the node first.
Finally, I believe it is necessary to add the missing hfs_bnode_get()
in __hfs_bnode_create(). This ensures that each find call holds a proper
reference, aligning the reference-counting design with that of VFS
inodes and preventing refcount mismatches in concurrent scenarios.
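For reference, the change I have in mind is roughly the following
(paraphrasing the racy branch of __hfs_bnode_create(), not a verbatim
diff):
```
	/* Inside __hfs_bnode_create(): another thread inserted a node with
	 * the same cnid into tree->node_hash while we were allocating, so
	 * we free our allocation and return the existing node.  Without
	 * hfs_bnode_get(), the caller (e.g. hfs_bnode_find()) would later
	 * put a reference that was never taken. */
	} else {
		hfs_bnode_get(node2);		/* <- the missing reference */
		spin_unlock(&tree->hash_lock);
		kfree(node);
		wait_event(node2->lock_wq,
			   !test_bit(HFS_BNODE_NEW, &node2->flags));
		return node2;
	}
```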
If you would like, I can rewrite the commit message and resend the
patch. Please let me know.
Thanks,
Chenzhi Yang
[1] https://syzkaller.appspot.com/text?tag=ReproC&x=12211dbc580000
[2] https://syzkaller.appspot.com/text?tag=CrashReport&x=1076eda2580000
[3] https://syzkaller.appspot.com/text?tag=CrashReport&x=13cd7682580000