Message-ID: <893184b5-57a8-4ba8-b923-614978a4c1be@linux.alibaba.com>
Date: Wed, 5 Nov 2025 11:04:06 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jan Kara <jack@...e.cz>,
 Christian Brauner <brauner@...nel.org>
Cc: "Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
 "Martin K. Petersen" <martin.petersen@...cle.com>,
 Luis Chamberlain <mcgrof@...nel.org>, linux-block@...r.kernel.org,
 Joseph Qi <joseph.qi@...ux.alibaba.com>, guanghuifeng@...ux.alibaba.com,
 zongyong.wzy@...baba-inc.com, zyfjeff@...ux.alibaba.com,
 "Rafael J. Wysocki" <rafael@...nel.org>, Danilo Krummrich <dakr@...nel.org>,
 linux-kernel@...r.kernel.org
Subject: Re: question about bd_inode hashing against device_add() // Re:
 [PATCH 03/11] block: call bdev_add later in device_add_disk

Hi Christoph,

On 2025/10/31 22:44, Gao Xiang wrote:
> 

..

>>> I spent some time reproducing this with dynamic loop devices, and
>>> it is actually easy to trigger if an msleep() is inserted artificially;
>>> the diff is below:
>>>
>>> diff --git a/block/bdev.c b/block/bdev.c
>>> index 810707cca970..a4273b5ad456 100644
>>> --- a/block/bdev.c
>>> +++ b/block/bdev.c
>>> @@ -821,7 +821,7 @@ struct block_device *blkdev_get_no_open(dev_t dev, bool autoload)
>>>       struct inode *inode;
>>>
>>>       inode = ilookup(blockdev_superblock, dev);
>>> -    if (!inode && autoload && IS_ENABLED(CONFIG_BLOCK_LEGACY_AUTOLOAD)) {
>>> +    if (0) {
>>>           blk_request_module(dev);
>>>           inode = ilookup(blockdev_superblock, dev);
>>>           if (inode)
>>> diff --git a/block/genhd.c b/block/genhd.c
>>> index 9bbc38d12792..3c9116fdc1ce 100644
>>> --- a/block/genhd.c
>>> +++ b/block/genhd.c
>>> @@ -428,6 +428,8 @@ static void add_disk_final(struct gendisk *disk)
>>>       set_bit(GD_ADDED, &disk->state);
>>>   }
>>>
>>> +#include <linux/delay.h>
>>> +
>>>   static int __add_disk(struct device *parent, struct gendisk *disk,
>>>                 const struct attribute_group **groups,
>>>                 struct fwnode_handle *fwnode)
>>> @@ -497,6 +499,9 @@ static int __add_disk(struct device *parent, struct gendisk *disk,
>>>       if (ret)
>>>           goto out_free_ext_minor;
>>>
>>> +    if (disk->major == LOOP_MAJOR)
>>> +        msleep(2500);           // delay 2.5s for all loops
>>> +
>>
>> Yes, so you need to watch for the uevent to happen, THEN it is safe to
>> access the block device.  Doing it before then isn't a good idea :)
>>
>> But, if you think this is an issue, do you have a patch that passes your
>> testing to fix it?
> 
> I'm only raising this to gather ideas; the change is
> buried in a code refactor, and honestly I would need to
> look into the codebase and the related patchsets first.
> 
> I currently have dozens of other development tasks
> on hand, so if it's really a regression, I do hope
> Christoph or other folks who are familiar with the code
> could try to address this.
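
For reference, the "watch for the uevent" approach suggested above could
look roughly like the following in user space. This is only a sketch:
uevent_is_block_add() and wait_for_block_add() are names made up for
illustration, and it assumes the raw kernel uevent layout (NUL-separated
KEY=value fields, with DEVNAME carrying the bare device name such as loop0).

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Return 1 if the field of length flen equals the string literal lit. */
static int field_eq(const char *field, size_t flen, const char *lit)
{
	return flen == strlen(lit) && memcmp(field, lit, flen) == 0;
}

/*
 * Return 1 if the NUL-separated payload buf[0..len) carries ACTION=add,
 * SUBSYSTEM=block and DEVNAME=<devname>; otherwise return 0.
 * Assumption: DEVNAME has no "/dev/" prefix in raw kernel events.
 */
static int uevent_is_block_add(const char *buf, size_t len, const char *devname)
{
	int is_add = 0, is_block = 0, name_ok = 0;
	size_t off = 0;

	while (off < len) {
		const char *field = buf + off;
		const char *end = memchr(field, '\0', len - off);
		size_t flen = end ? (size_t)(end - field) : len - off;

		if (field_eq(field, flen, "ACTION=add"))
			is_add = 1;
		else if (field_eq(field, flen, "SUBSYSTEM=block"))
			is_block = 1;
		else if (flen > 8 && memcmp(field, "DEVNAME=", 8) == 0 &&
			 field_eq(field + 8, flen - 8, devname))
			name_ok = 1;
		off += flen + 1;	/* skip the field and its terminator */
	}
	return is_add && is_block && name_ok;
}

/*
 * Block until the kernel announces "add" for block device devname.
 * Returns 0 once the event is seen, -1 on a socket error.
 */
int wait_for_block_add(const char *devname)
{
	struct sockaddr_nl addr;
	char buf[8192];
	int fd, ret = -1;

	fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = 1;	/* group 1: events sent by the kernel itself */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
		for (;;) {
			ssize_t n = recv(fd, buf, sizeof(buf), 0);

			if (n < 0)
				break;
			if (uevent_is_block_add(buf, (size_t)n, devname)) {
				ret = 0;
				break;
			}
		}
	}
	close(fd);
	return ret;
}
```

Note that the socket would have to be opened before the device is
created, otherwise the add event could already have passed by.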

I've provided a way to reproduce this:
https://lore.kernel.org/linux-block/ec8b1c76-c211-49a5-a056-6a147faddd3b@linux.alibaba.com

As the author of these gendisk/bdev enhancement commits, what's
your opinion on this?

In other words, do you consider it a regression, or merely a behavior
change? Also, to confirm: if you do consider it a regression, would
you like to address it yourself?

Due to further code changes, I've put together a temporary workaround
for our 6.6 kernels, shown below (I don't think it's clean, and we will
test it more). Given limited time, I currently can't come up with a
cleaner solution or track this until an upstream fix lands.
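
To make the ordering in the workaround below easier to follow, here is a
toy user-space model of it. It is not kernel code, and all names in it are
hypothetical. It assumes that a lookup which finds an inode still marked
I_NEW waits for it instead of failing; LOOKUP_WOULD_WAIT stands in for
that sleep.

```c
#include <assert.h>

/* Toy stand-ins for the state of one bd_inode in the inode hash. */
enum slot_state { SLOT_ABSENT, SLOT_NEW, SLOT_READY };

/* What a racing lookup observes; WOULD_WAIT models sleeping on I_NEW. */
enum lookup_result { LOOKUP_MISSING, LOOKUP_WOULD_WAIT, LOOKUP_FOUND };

struct slot {
	enum slot_state state;
};

static enum lookup_result toy_lookup(const struct slot *s)
{
	switch (s->state) {
	case SLOT_NEW:
		return LOOKUP_WOULD_WAIT;
	case SLOT_READY:
		return LOOKUP_FOUND;
	default:
		return LOOKUP_MISSING;
	}
}

/* Models "i_state |= I_NEW; bdev_add()": hashed, but not yet usable. */
static void toy_hash_locked(struct slot *s)
{
	assert(s->state == SLOT_ABSENT);
	s->state = SLOT_NEW;
}

/* Models unlock_new_inode(): waiters may now proceed. */
static void toy_unlock(struct slot *s)
{
	assert(s->state == SLOT_NEW);
	s->state = SLOT_READY;
}

/* Models bdev_inode_failed(): the entry disappears again. */
static void toy_fail(struct slot *s)
{
	assert(s->state == SLOT_NEW);
	s->state = SLOT_ABSENT;
}

/* Late hashing: a lookup races in after the node is visible. */
static enum lookup_result probe_late_hashing(void)
{
	struct slot s = { SLOT_ABSENT };
	enum lookup_result seen;

	/* device_add() has made the node visible at this point */
	seen = toy_lookup(&s);
	s.state = SLOT_READY;		/* bdev_add() only happens now */
	return seen;
}

/* Early hashing: same race window, entry already hashed but locked. */
static enum lookup_result probe_early_hashing(int setup_fails,
					       enum lookup_result *final)
{
	struct slot s = { SLOT_ABSENT };
	enum lookup_result seen;

	toy_hash_locked(&s);
	seen = toy_lookup(&s);		/* the racing lookup */
	if (setup_fails)
		toy_fail(&s);
	else
		toy_unlock(&s);
	*final = toy_lookup(&s);	/* what the lookup sees afterwards */
	return seen;
}
```

In this model the late ordering lets the racing lookup see nothing at
all, while the early ordering makes it wait and then see either a usable
entry or, on the error path, no entry.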

Thanks,
Gao Xiang

  block/blk.h             |  1 +
  block/genhd.c           | 18 ++++++++++++++++--
  block/partitions/core.c |  6 +++++-
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index 475bbb40bb83..4410ae9da378 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -419,6 +419,7 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev,
  #endif /* CONFIG_BLK_DEV_ZONED */
  
  struct block_device *bdev_alloc(struct gendisk *disk, u8 partno);
+void bdev_inode_failed(struct block_device *bdev);
  void bdev_add(struct block_device *bdev, dev_t dev);
  
  int blk_alloc_ext_minor(void);
diff --git a/block/genhd.c b/block/genhd.c
index 039e7c17523b..cb4313a7c618 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -383,6 +383,14 @@ int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
  	return ret;
  }
  
+void bdev_inode_failed(struct block_device *bdev)
+{
+	struct inode *inode = bdev->bd_inode;
+
+	make_bad_inode(inode);
+	unlock_new_inode(inode);
+}
+
  /**
   * device_add_disk - add disk information to kernel list
   * @parent: parent device for the disk
@@ -452,8 +460,12 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
  	ddev->parent = parent;
  	ddev->groups = groups;
  	dev_set_name(ddev, "%s", disk->disk_name);
-	if (!(disk->flags & GENHD_FL_HIDDEN))
+	if (!(disk->flags & GENHD_FL_HIDDEN)) {
  		ddev->devt = MKDEV(disk->major, disk->first_minor);
+		disk->part0->bd_inode->i_state |= I_NEW;
+		bdev_add(disk->part0, ddev->devt);
+	}
+
  	ret = device_add(ddev);
  	if (ret)
  		goto out_free_ext_minor;
@@ -505,7 +517,7 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
  		if (get_capacity(disk) && disk_has_partscan(disk))
  			set_bit(GD_NEED_PART_SCAN, &disk->state);
  
-		bdev_add(disk->part0, ddev->devt);
+		unlock_new_inode(disk->part0->bd_inode);
  		if (get_capacity(disk))
  			disk_scan_partitions(disk, BLK_OPEN_READ);
  
@@ -546,6 +558,8 @@ int __must_check device_add_disk(struct device *parent, struct gendisk *disk,
  out_device_del:
  	device_del(ddev);
  out_free_ext_minor:
+	if (!(disk->flags & GENHD_FL_HIDDEN))
+		bdev_inode_failed(disk->part0);
  	if (disk->major == BLOCK_EXT_MAJOR)
  		blk_free_ext_minor(disk->first_minor);
  out_exit_elevator:
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 549ce89a657b..c69e369955b9 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -376,6 +376,9 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
  			goto out_put;
  	}
  
+	bdev->bd_inode->i_state |= I_NEW;
+	bdev_add(bdev, devt);
+
  	/* delay uevent until 'holders' subdir is created */
  	dev_set_uevent_suppress(pdev, 1);
  	err = device_add(pdev);
@@ -398,7 +401,7 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
  	err = xa_insert(&disk->part_tbl, partno, bdev, GFP_KERNEL);
  	if (err)
  		goto out_del;
-	bdev_add(bdev, devt);
+	unlock_new_inode(bdev->bd_inode);
  
  	/* suppress uevent if the disk suppresses it */
  	if (!dev_get_uevent_suppress(ddev))
@@ -409,6 +412,7 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
  	kobject_put(bdev->bd_holder_dir);
  	device_del(pdev);
  out_put:
+	bdev_inode_failed(bdev);
  	put_device(pdev);
  	return ERR_PTR(err);
  out_put_disk:
-- 
