linux-kernel - Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block device


Message-ID: <19ad8dd7-482e-dad0-8465-f78f7f9c154d@huaweicloud.com>
Date:   Mon, 30 Jan 2023 11:46:20 +0800
From:   Yu Kuai <yukuai1@...weicloud.com>
To:     jejb@...ux.ibm.com, Yu Kuai <yukuai1@...weicloud.com>,
        Zhong Jinghua <zhongjinghua@...wei.com>,
        gregkh@...uxfoundation.org, martin.petersen@...cle.com,
        hare@...e.de, bvanassche@....org, emilne@...hat.com
Cc:     linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        yi.zhang@...wei.com, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: [PATCH-next v2 2/2] scsi: fix iscsi rescan fails to create block
 device

Hi,

On 2023/01/30 11:29, James Bottomley wrote:
> On Mon, 2023-01-30 at 11:07 +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/01/30 1:30, James Bottomley wrote:
>>> On Sat, 2023-01-28 at 17:41 +0800, Zhong Jinghua wrote:
>>>> This error will cause a warning:
>>>> kobject_add_internal failed for block (error: -2 parent:
>>>> 1:0:0:1). In lower versions (such as 5.10) there is no
>>>> corresponding error handling, and continuing past the failure
>>>> will trigger a kernel panic, so cc stable.
>>>
>>> Is this the important point, and what you're saying is that this only
>>> panics on kernels before 5.10 or so because after that it's
>>> correctly failed by block device error handling so there's nothing
>>> to fix in later kernels?
>>>
>>> In that case, isn't the correct fix to look at backporting the
>>> block device error handling:
>>
>> This is the last commit that supports error handling; it depends on
>> many other patches, and the block layer has been refactored heavily.
>> It's not a good idea to backport error handling to lower versions.
>>
>> Although error handling can prevent the kernel crash in this case, I
>> still think it makes sense to make sure kobjects are deleted in order:
>> a parent should not be deleted before its child.
> 
> Well, look, you've created a very artificial situation where a create
> closely followed by a delete of the underlying sdev races with the
> create of the block gendisk devices of sd that bind asynchronously to
> the created sdev.  The asynchronous nature of the bind gives the
> elongated race window so the only real fix is some sort of check that
> the sdev is still viable by the time the bind occurs ... probably in
> sd_probe(), say a scsi_device_get of sdp at the top which would ensure
> viability of the sdev for the entire bind or fail the probe if the sdev
> can't be got.

Sorry, I don't follow here. 😟

I agree this is a very artificial situation; however, I can't tell our
testers not to test this way...

The problem is that the session kobject is deleted and then sd_probe()
tries to create a new kobject under hostx/sessionx/x:x:x:x/. I don't see
how scsi_device_get() can prevent that: it only takes a kobject
reference, which prevents the kobject from being released, but
kobject_del() can still be called.
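
To illustrate the distinction, here is a minimal userspace sketch. It is
a toy model and not kernel code: every toy_* name is made up for
illustration, and the rule that an add fails when any ancestor is already
unlinked is a simplifying assumption. It only shows that holding a
reference pins the object's memory without keeping it linked in the
hierarchy, so a later add under it can still fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_kobj {
	int refcount;             /* pins the memory only */
	bool linked;              /* true while visible in the hierarchy */
	struct toy_kobj *parent;
};

void toy_init(struct toy_kobj *k)
{
	k->refcount = 1;
	k->linked = false;
	k->parent = NULL;
}

/* Taking a reference prevents release, nothing more. */
struct toy_kobj *toy_get(struct toy_kobj *k)
{
	k->refcount++;
	return k;
}

void toy_put(struct toy_kobj *k)
{
	k->refcount--;
}

/* Unlinking ignores the reference count entirely. */
void toy_del(struct toy_kobj *k)
{
	k->linked = false;
}

/* Assumption: adding fails with -ENOENT if any ancestor is unlinked. */
int toy_add(struct toy_kobj *k, struct toy_kobj *parent)
{
	for (struct toy_kobj *p = parent; p; p = p->parent)
		if (!p->linked)
			return -ENOENT;
	k->parent = parent;
	k->linked = true;
	return 0;
}

/* Replays the ordering described above; returns the result of the late add. */
int toy_race_demo(void)
{
	struct toy_kobj session, sdev, block;
	int ret;

	toy_init(&session);
	toy_init(&sdev);
	toy_init(&block);

	ret = toy_add(&session, NULL);
	assert(ret == 0);
	ret = toy_add(&sdev, &session);
	assert(ret == 0);

	toy_get(&sdev);               /* probe side pins the sdev */
	toy_del(&session);            /* session removal proceeds anyway */
	assert(sdev.refcount == 2);   /* the pin is still held ... */

	ret = toy_add(&block, &sdev); /* ... but the add under it fails */
	toy_put(&sdev);
	return ret;
}
```

In this model toy_race_demo() returns -ENOENT (-2 on Linux), the same
error code as in the kobject_add_internal warning quoted above, even
though the reference on the sdev is held the whole time.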

In this patch, we make sure that session removal and sd_probe() can't
run concurrently: session removal waits for all child kobjects to be
deleted. What do you think?

Thanks,
Kuai
> 
> James