linux-kernel - Re: question about bd_inode hashing against device_add() // Re: [PATCH 03/11] block: call bdev_add later in device_add_disk

Message-ID: <cda14c54-4c26-4932-99ea-35212a1479e4@linux.alibaba.com>
Date: Wed, 5 Nov 2025 22:13:02 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
 Jan Kara <jack@...e.cz>, "Michael S. Tsirkin" <mst@...hat.com>,
 Jason Wang <jasowang@...hat.com>,
 "Martin K. Petersen" <martin.petersen@...cle.com>,
 Luis Chamberlain <mcgrof@...nel.org>, linux-block@...r.kernel.org,
 Joseph Qi <joseph.qi@...ux.alibaba.com>, guanghuifeng@...ux.alibaba.com,
 zongyong.wzy@...baba-inc.com, zyfjeff@...ux.alibaba.com,
 "Rafael J. Wysocki" <rafael@...nel.org>, Danilo Krummrich <dakr@...nel.org>,
 linux-kernel@...r.kernel.org
Subject: Re: question about bd_inode hashing against device_add() // Re:
 [PATCH 03/11] block: call bdev_add later in device_add_disk

Hi Christian,

On 2025/11/5 20:30, Christian Brauner wrote:
> On Fri, Oct 31, 2025 at 10:44:53PM +0800, Gao Xiang wrote:
>>
>>
>> On 2025/10/31 22:34, Greg Kroah-Hartman wrote:
>>> On Fri, Oct 31, 2025 at 08:23:32PM +0800, Gao Xiang wrote:
>>>>
>>>>
>>>> On 2025/10/31 18:12, Gao Xiang wrote:
>>>>> Hi Greg,
>>>>>
>>>>> On 2025/10/31 17:58, Greg Kroah-Hartman wrote:
>>>>>> On Fri, Oct 31, 2025 at 05:54:10PM +0800, Gao Xiang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2025/10/31 17:45, Christoph Hellwig wrote:
>>>>
>>>> ...
>>>>
>>>>>>>> But why does the device node
>>>>>>>> get created earlier?  My assumption was that it would only be
>>>>>>>> created by the KOBJ_ADD uevent.  Adding the device model maintainers
>>>>>>>> as my little dig through the core drivers/base/ code doesn't find
>>>>>>>> anything to the contrary, but maybe I don't fully understand it.
>>>>>>>
>>>>>>> AFAIK, device_add() triggers devtmpfs file
>>>>>>> creation, and the issue can be observed by frequently
>>>>>>> hotplugging a device in the VM and mounting it.  Currently
>>>>>>> I don't have a time slot to build an easy reproducer,
>>>>>>> but I think it's a real issue anyway.
>>>>>>
>>>>>> As I say above, that's not normal, and you have to be root to do this,
>>>> I just spent time reproducing it with dynamic loop devices, and
>>>> it's actually easy if an msleep() is inserted artificially;
>>>> the diff is below:
>>>>
>>>> diff --git a/block/bdev.c b/block/bdev.c
>>>> index 810707cca970..a4273b5ad456 100644
>>>> --- a/block/bdev.c
>>>> +++ b/block/bdev.c
>>>> @@ -821,7 +821,7 @@ struct block_device *blkdev_get_no_open(dev_t dev, bool autoload)
>>>>    	struct inode *inode;
>>>>
>>>>    	inode = ilookup(blockdev_superblock, dev);
>>>> -	if (!inode && autoload && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) {
>>>> +	if (0) {
>>>>    		blk_request_module(dev);
>>>>    		inode = ilookup(blockdev_superblock, dev);
>>>>    		if (inode)
>>>> diff --git a/block/genhd.c b/block/genhd.c
>>>> index 9bbc38d12792..3c9116fdc1ce 100644
>>>> --- a/block/genhd.c
>>>> +++ b/block/genhd.c
>>>> @@ -428,6 +428,8 @@ static void add_disk_final(struct gendisk *disk)
>>>>    	set_bit(GD_ADDED, &disk->state);
>>>>    }
>>>>
>>>> +#include <linux/delay.h>
>>>> +
>>>>    static int __add_disk(struct device *parent, struct gendisk *disk,
>>>>    		      const struct attribute_group **groups,
>>>>    		      struct fwnode_handle *fwnode)
>>>> @@ -497,6 +499,9 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
>>>>    	if (ret)
>>>>    		goto out_free_ext_minor;
>>>>
>>>> +	if (disk->major == LOOP_MAJOR)
>>>> +		msleep(2500);           // delay 2.5s for all loops
>>>> +
>>>
>>> Yes, so you need to watch for the uevent to happen, THEN it is safe to
>>> access the block device.  Doing it before then isn't a good idea :)
>>>
>>> But, if you think this is an issue, do you have a patch that passes your
>>> testing to fix it?
>>
>> I'm just raising it to get some ideas; this change is
>> buried in a code refactor, and honestly I need to
>> look into the codebase and related patchsets first.
>>
>> Currently I have dozens of other development tasks
>> on hand; if it's really a regression, I do hope
>> Christoph or other folks who are familiar with the code
>> could try to address this.
> 
> If it's easy to do without much of a regression or performance risk then
> the device node should only show up once the device is actually ready.

Yeah, agreed.

> It's certainly best practice to wait for the uevent though.

Our internal applications will also be adapted to wait for the
uevent, but as a public cloud provider we may still need a
fallback, since users may otherwise blame us for this
behavior.

Thanks,
Gao Xiang
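
As an illustration of the "wait for the uevent" advice above, here is a minimal userspace sketch in C. It assumes the caller already holds one raw kernel uevent message (a header of the form "action@devpath" followed by NUL-separated KEY=VALUE strings, as delivered on a NETLINK_KOBJECT_UEVENT socket) and only decides whether that message announces the block device of interest. The helper names uevent_get() and uevent_is_block_add() are hypothetical and are not taken from this thread or from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up `key` in a raw kernel uevent payload: a header "action@devpath"
 * followed by NUL-separated "KEY=VALUE" strings.  Returns a pointer to the
 * NUL-terminated value inside `buf`, or NULL if the key is absent.
 * (Hypothetical helper, for illustration only.)
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
	size_t klen = strlen(key);
	size_t off = 0;

	assert(buf != NULL);
	while (off < len) {
		const char *field = buf + off;
		const char *end = memchr(field, '\0', len - off);
		size_t flen;

		if (!end)
			break;	/* unterminated trailing field: ignore it */
		flen = (size_t)(end - field);
		if (flen > klen && field[klen] == '=' &&
		    memcmp(field, key, klen) == 0)
			return field + klen + 1;
		off += flen + 1;
	}
	return NULL;
}

/*
 * True if the message says that block device `devname` (e.g. "loop0")
 * has been added, i.e. ACTION=add, SUBSYSTEM=block and a matching DEVNAME.
 */
static bool uevent_is_block_add(const char *buf, size_t len,
				const char *devname)
{
	const char *action = uevent_get(buf, len, "ACTION");
	const char *subsys = uevent_get(buf, len, "SUBSYSTEM");
	const char *name = uevent_get(buf, len, "DEVNAME");

	return action && subsys && name &&
	       strcmp(action, "add") == 0 &&
	       strcmp(subsys, "block") == 0 &&
	       strcmp(name, devname) == 0;
}
```

A caller would typically recv() messages from an AF_NETLINK socket bound to the kernel uevent multicast group and only open or mount the device node once uevent_is_block_add() returns true for it. Socket setup, timeouts and any fallback path are left out of this sketch.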