linux-kernel - Re: [PATCH] block: fix kobject double initialization in add


Message-ID: <1aa629f8-88d3-4e1b-9e96-003959809fa1@huaweicloud.com>
Date: Thu, 7 Aug 2025 21:44:19 +0800
From: Zheng Qixing <zhengqixing@...weicloud.com>
To: Nilay Shroff <nilay@...ux.ibm.com>, axboe@...nel.dk
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
 yukuai3@...wei.com, yi.zhang@...wei.com, yangerkun@...wei.com,
 houtao1@...wei.com, zhengqixing@...wei.com
Subject: Re: [PATCH] block: fix kobject double initialization in add_disk

Hi,


On 2025/8/7 19:47, Nilay Shroff wrote:
>
> On 8/7/25 12:50 PM, Zheng Qixing wrote:
>> From: Zheng Qixing <zhengqixing@...wei.com>
>>
>> Device-mapper can call add_disk() multiple times for the same gendisk
>> due to its two-phase creation process (dm create + dm load). This leads
>> to kobject double initialization errors when the underlying iSCSI devices
>> become temporarily unavailable and then reappear.
>>
>> However, if the first add_disk() call fails and is retried, the queue_kobj
>> gets initialized twice, causing:
>>
>> kobject: kobject (ffff88810c27bb90): tried to init an initialized object,
>> something is seriously wrong.
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0x5b/0x80
>>    kobject_init.cold+0x43/0x51
>>    blk_register_queue+0x46/0x280
>>    add_disk_fwnode+0xb5/0x280
>>    dm_setup_md_queue+0x194/0x1c0
>>    table_load+0x297/0x2d0
>>    ctl_ioctl+0x2a2/0x480
>>    dm_ctl_ioctl+0xe/0x20
>>    __x64_sys_ioctl+0xc7/0x110
>>    do_syscall_64+0x72/0x390
>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Fix this by separating kobject initialization from sysfs registration:
>>   - Initialize queue_kobj early during gendisk allocation
>>   - add_disk() only adds the already-initialized kobject to sysfs
>>   - del_gendisk() removes from sysfs but doesn't destroy the kobject
>>   - Final cleanup happens when the disk is released
>>
>> Fixes: 2bd85221a625 ("block: untangle request_queue refcounting from sysfs")
>> Reported-by: Li Lingfeng <lilingfeng3@...wei.com>
>> Closes: https://lore.kernel.org/all/83591d0b-2467-433c-bce0-5581298eb161@huawei.com/
>> Signed-off-by: Zheng Qixing <zhengqixing@...wei.com>
>> ---
>>   block/blk-sysfs.c | 4 +---
>>   block/blk.h       | 1 +
>>   block/genhd.c     | 2 ++
>>   3 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>> index 396cded255ea..37d8654faff9 100644
>> --- a/block/blk-sysfs.c
>> +++ b/block/blk-sysfs.c
>> @@ -847,7 +847,7 @@ static void blk_queue_release(struct kobject *kobj)
>>   	/* nothing to do here, all data is associated with the parent gendisk */
>>   }
>>   
>> -static const struct kobj_type blk_queue_ktype = {
>> +const struct kobj_type blk_queue_ktype = {
>>   	.default_groups = blk_queue_attr_groups,
>>   	.sysfs_ops	= &queue_sysfs_ops,
>>   	.release	= blk_queue_release,
>> @@ -875,7 +875,6 @@ int blk_register_queue(struct gendisk *disk)
>>   	struct request_queue *q = disk->queue;
>>   	int ret;
>>   
>> -	kobject_init(&disk->queue_kobj, &blk_queue_ktype);
>>   	ret = kobject_add(&disk->queue_kobj, &disk_to_dev(disk)->kobj, "queue");
>>   	if (ret < 0)
>>   		goto out_put_queue_kobj;
> If the kobject_add() fails here, then we jump to the label out_put_queue_kobj,
> where we release/put disk->queue_kobj. That would decrement the kref of
> disk->queue_kobj and possibly bring it to zero.


Since kobject_init() is now moved into disk allocation, when
kobject_add() fails here it should return without calling
kobject_del() or kobject_put().

If kobject_add() succeeds but a later step fails, we should call
kobject_del() to roll back.

The current error handling with kobject_put() in blk_register_queue() is
indeed problematic.
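
To make the intended error handling concrete, here is a small userspace model of it. It is only a sketch and not kernel code: the toy_* names, the struct fields, and the two failure-injection flags are hypothetical, and real kobjects carry far more state. It shows the two rules above: a failed add returns without del or put, and a failure after a successful add is rolled back with del only, so the reference taken at init time survives for a retry.

```c
/* Userspace model only; all toy_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct toy_kobj {
	bool initialized;	/* set once, at allocation time */
	bool in_sysfs;		/* toggled by add/del */
	int refcount;		/* dropped only at final release */
};

/* Models kobject_init(): must run exactly once per object. */
static int toy_kobj_init(struct toy_kobj *k)
{
	if (k->initialized)
		return -1;	/* the "tried to init an initialized object" case */
	k->initialized = true;
	k->refcount = 1;
	return 0;
}

/* Models kobject_add(); 'fail' injects an error. */
static int toy_kobj_add(struct toy_kobj *k, bool fail)
{
	if (!k->initialized || k->in_sysfs || fail)
		return -1;
	k->in_sysfs = true;
	return 0;
}

/* Models kobject_del(): removes from sysfs, keeps the reference. */
static void toy_kobj_del(struct toy_kobj *k)
{
	assert(k->initialized);
	k->in_sysfs = false;
}

/*
 * Models the register path described above:
 *  - add fails:        return, no del, no put
 *  - later step fails: del to roll back, still no put
 */
static int toy_register_queue(struct toy_kobj *k, bool add_fails,
			      bool later_fails)
{
	int ret = toy_kobj_add(k, add_fails);

	if (ret < 0)
		return ret;
	if (later_fails) {
		toy_kobj_del(k);
		return -1;
	}
	return 0;
}
```

In this model, calling toy_register_queue() three times on one initialized object (add failure, later failure, success) leaves the refcount at 1 throughout, and a second toy_kobj_init() is rejected.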


> Next time, when we call add_disk() again without invoking kobject_init()
> (because the initialization is now moved outside add_disk()), the refcount
> of disk->queue_kobj — which was previously released — would now go for a
> toss. Wouldn't that lead to use-after-free or inconsistent state?
>
>> @@ -986,5 +985,4 @@ void blk_unregister_queue(struct gendisk *disk)
>>   		elevator_set_none(q);
>>   
>>   	blk_debugfs_remove(disk);
>> -	kobject_put(&disk->queue_kobj);
>>   }
> I'm thinking a case where add_disk() fails after the queue is registered.
> In that case, we call blk_unregister_queue() — which would ideally put()
> the disk->queue_kobj.
> But if we skip that put() in blk_unregister_queue() (and that's what we do
> above), and then later retry add_disk(), wouldn’t kobject_add() from
> blk_register_queue() complain loudly — since we’re trying to add a kobject
> that was already added previously?


blk_unregister_queue() calls kobject_del(), so the sysfs state is
properly cleaned up and a retry should work fine.
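
The sequence I have in mind can be modelled in userspace roughly as follows. This is an illustration under simplifying assumptions, not the kernel implementation: the sim_* names and fields are hypothetical, and the model only tracks whether the kobject is initialized, whether it is in sysfs, and its reference count. It does not prove anything about the real kobject code; it only spells out the lifecycle the patch aims for (init at allocation, add on register, del on unregister, put at release).

```c
/* Userspace model only; all sim_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct sim_disk {
	bool kobj_initialized;
	bool kobj_in_sysfs;
	int kobj_refs;
};

/* Allocation: the only place the queue kobject is initialized. */
static void sim_alloc_disk(struct sim_disk *d)
{
	d->kobj_initialized = true;
	d->kobj_in_sysfs = false;
	d->kobj_refs = 1;
}

/* Register: add only. Returns an error if the kobject is already added. */
static int sim_register_queue(struct sim_disk *d)
{
	assert(d->kobj_initialized);
	if (d->kobj_in_sysfs)
		return -1;
	d->kobj_in_sysfs = true;
	return 0;
}

/* Unregister: del only; the reference taken at allocation is kept. */
static void sim_unregister_queue(struct sim_disk *d)
{
	d->kobj_in_sysfs = false;
}

/* Release: the final put happens here, not in unregister. */
static void sim_release_disk(struct sim_disk *d)
{
	d->kobj_refs--;
}
```

In the model, a second register without an intervening unregister fails (the case you describe), while register, unregister, register succeeds with the reference count unchanged until release.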


>
> Thanks,
> --Nilay


Thanks,

Qixing