Message-ID: <bc738580-4e1f-411f-af7b-f76a4ce7b7ea@linux.alibaba.com>
Date: Fri, 31 Oct 2025 20:25:46 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Jan Kara <jack@...e.cz>, Christian Brauner <brauner@...nel.org>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Luis Chamberlain <mcgrof@...nel.org>, linux-block@...r.kernel.org,
Joseph Qi <joseph.qi@...ux.alibaba.com>, guanghuifeng@...ux.alibaba.com,
zongyong.wzy@...baba-inc.com, zyfjeff@...ux.alibaba.com,
"Rafael J. Wysocki" <rafael@...nel.org>, Danilo Krummrich <dakr@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: question about bd_inode hashing against device_add() // Re:
[PATCH 03/11] block: call bdev_add later in device_add_disk
On 2025/10/31 20:23, Gao Xiang wrote:
>
>
> On 2025/10/31 18:12, Gao Xiang wrote:
>> Hi Greg,
>>
>> On 2025/10/31 17:58, Greg Kroah-Hartman wrote:
>>> On Fri, Oct 31, 2025 at 05:54:10PM +0800, Gao Xiang wrote:
>>>>
>>>>
>>>> On 2025/10/31 17:45, Christoph Hellwig wrote:
>
> ...
>
>>>>> But why does the device node
>>>>> get created earlier? My assumption was that it would only be
>>>>> created by the KOBJ_ADD uevent. Adding the device model maintainers
>>>>> as my little dig through the core drivers/base/ code doesn't find
>>>>> anything to the contrary, but maybe I don't fully understand it.
>>>>
>>>> AFAIK, device_add() is what triggers devtmpfs node
>>>> creation, and the race can be observed by frequently
>>>> hotplugging devices in a VM and mounting them. Currently
>>>> I don't have time to build an easy reproducer,
>>>> but I think it's a real issue anyway.
>>>
>>> As I say above, that's not normal, and you have to be root to do this,
> I just spent some time reproducing this with dynamic loop devices, and
> it's actually easy if an msleep() is inserted artificially;
> the diff is below:
>
> diff --git a/block/bdev.c b/block/bdev.c
> index 810707cca970..a4273b5ad456 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -821,7 +821,7 @@ struct block_device *blkdev_get_no_open(dev_t dev, bool autoload)
> struct inode *inode;
>
> inode = ilookup(blockdev_superblock, dev);
> - if (!inode && autoload && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) {
> + if (0) {
> blk_request_module(dev);
> inode = ilookup(blockdev_superblock, dev);
> if (inode)
> diff --git a/block/genhd.c b/block/genhd.c
> index 9bbc38d12792..3c9116fdc1ce 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -428,6 +428,8 @@ static void add_disk_final(struct gendisk *disk)
> set_bit(GD_ADDED, &disk->state);
> }
>
> +#include <linux/delay.h>
> +
> static int __add_disk(struct device *parent, struct gendisk *disk,
> const struct attribute_group **groups,
> struct fwnode_handle *fwnode)
> @@ -497,6 +499,9 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
> if (ret)
> goto out_free_ext_minor;
>
> + if (disk->major == LOOP_MAJOR)
> + msleep(2500); // delay 2.5s for all loops
> +
> ret = disk_alloc_events(disk);
> if (ret)
> goto out_device_del;
>
>
> (Note that I masked off CONFIG_BLOCK_LEGACY_AUTOLOAD
> for cleaner ftrace below.)
>
> and then
>
> # uname -a (patched 6.18-rc1 kernel)
>
> ```
> Linux 7e5b4b5f5181 6.18.0-rc1-dirty #25 SMP PREEMPT_DYNAMIC Fri Oct 31 19:52:10 CST 2025 x86_64 GNU/Linux
> ```
>
> # truncate -s 1g test.img; mkfs.ext4 -F test.img;
> # losetup /dev/loop999 test.img & sleep 1; ls -l /dev/loop999; strace mount -t ext4 /dev/loop999 mnt 2>&1 | grep fsconfig
>
> It shows
>
> ```
> brw------- 1 root root 7, 999 Oct 31 20:06 /dev/loop999
> fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/loop999", 0) = 0
> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 ENXIO (No such device or address) // unexpected
> ```
>
> then
>
> # losetup /dev/loop996 test.img & sleep 1; stat /dev/loop996; trace-cmd record -p function_graph mount -t ext4 /dev/loop996 mnt &> /dev/null
>
> It shows
> ```
> File: /dev/loop996
> Size: 0 Blocks: 0 IO Block: 4096 block special file
> Device: 0,6 Inode: 429 Links: 1 Device type: 7,996
> Access: (0600/brw-------) Uid: ( 0/ root) Gid: ( 0/ root)
> Access: 2025-10-31 20:07:54.938474868 +0800
> Modify: 2025-10-31 20:07:54.938474868 +0800
> Change: 2025-10-31 20:07:54.938474868 +0800
> Birth: 2025-10-31 20:07:54.938474868 +0800
> ```
>
> but
>
> # trace-cmd report | grep mount | less
> mount-561 [007] ...1. 240.180513: funcgraph_entry: | bdev_file_open_by_dev() {
> mount-561 [007] ...1. 240.180513: funcgraph_entry: | bdev_permission() {
> mount-561 [007] ...1. 240.180513: funcgraph_entry: | devcgroup_check_permission() {
> mount-561 [007] ...1. 240.180513: funcgraph_entry: | __rcu_read_lock() {
> mount-561 [007] ...1. 240.180514: funcgraph_exit: 0.193 us | } (ret=0x1)
> mount-561 [007] ...1. 240.180514: funcgraph_entry: | match_exception_partial() {
> mount-561 [007] ...1. 240.180514: funcgraph_exit: 0.199 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180514: funcgraph_entry: | __rcu_read_unlock() {
> mount-561 [007] ...1. 240.180515: funcgraph_exit: 0.202 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180515: funcgraph_exit: 1.602 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180515: funcgraph_exit: 2.100 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180515: funcgraph_entry: | ilookup() {
> mount-561 [007] ...1. 240.180516: funcgraph_entry: | __cond_resched() {
> mount-561 [007] ...1. 240.180516: funcgraph_exit: 0.194 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180516: funcgraph_entry: | find_inode_fast() {
> mount-561 [007] ...1. 240.180516: funcgraph_entry: | __rcu_read_lock() {
> mount-561 [007] ...1. 240.180516: funcgraph_exit: 0.195 us | } (ret=0x1)
> mount-561 [007] ...1. 240.180517: funcgraph_entry: | __rcu_read_unlock() {
> mount-561 [007] ...1. 240.180517: funcgraph_exit: 0.193 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180517: funcgraph_exit: 1.060 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180517: funcgraph_exit: 1.970 us | } (ret=0x0)
> mount-561 [007] ...1. 240.180518: funcgraph_exit: 4.818 us | } (ret=-6)
>
> here -6 (-ENXIO) is unexpected.
>
> Actually, the problematic code path I mentioned is device_add():
>
> upstream code:
>
> loop_control_ioctl
>   loop_add
>     add_disk_fwnode
>       __add_disk
>         devtmpfs_create_node // here create devtmpfs blkdev file, but racy
>       add_disk_final
>         bdev_add
>           insert_inode_hash // just seen by bdev_file_open_by_dev()
>         disk_uevent(disk, KOBJ_ADD)
Minor revision (the call chain quoted above omitted device_add):

loop_control_ioctl
  loop_add
    add_disk_fwnode
      __add_disk
        device_add
          devtmpfs_create_node // the devtmpfs blkdev node is created here, but racy
      add_disk_final
        bdev_add
          insert_inode_hash // only now visible to bdev_file_open_by_dev()
        disk_uevent(disk, KOBJ_ADD)
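
To make the window in the call chain easier to see, here is a minimal user-space sketch of the ordering. It is only a toy model, not kernel code: struct toy_disk and the toy_* helpers are hypothetical names made up for illustration, and each one merely stands in for the kernel step named in its comment. The -ENOENT case for a missing node is an assumption about ordinary path lookup and is not taken from the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one disk's registration state (hypothetical, not kernel code). */
struct toy_disk {
	bool node_visible;	/* stands in for devtmpfs_create_node() having run */
	bool inode_hashed;	/* stands in for insert_inode_hash() via bdev_add() */
};

/* Stands in for __add_disk() -> device_add(): the /dev node appears here. */
void toy_device_add(struct toy_disk *d)
{
	d->node_visible = true;
}

/* Stands in for add_disk_final() -> bdev_add(): the bdev inode becomes findable. */
void toy_bdev_add(struct toy_disk *d)
{
	assert(d->node_visible);	/* in the chain above, device_add() runs first */
	d->inode_hashed = true;
}

/* Stands in for mount: path lookup, then an ilookup()-style search by dev_t. */
int toy_open_by_path(const struct toy_disk *d)
{
	if (!d->node_visible)
		return -ENOENT;	/* assumption: no /dev node yet, path lookup fails */
	if (!d->inode_hashed)
		return -ENXIO;	/* node exists, but the bdev inode is not hashed yet */
	return 0;
}
```

Calling toy_open_by_path() between toy_device_add() and toy_bdev_add() returns -ENXIO in this model, which corresponds to the window that the msleep() in the diff widens.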
>
> I actually think that's enough to explain the root cause.
>
> Thanks,
> Gao Xiang