linux-kernel - Re: [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices of host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c1ecd21b-7b97-4ef6-94a1-86b2cf520a67@gmail.com>
Date: Wed, 12 Jun 2024 23:06:31 +0800
From: Wenchao Hao <haowenchao22@...il.com>
To: Hannes Reinecke <hare@...e.de>,
 "James E . J . Bottomley" <James.Bottomley@...senPartnership.com>,
 "Martin K . Petersen" <martin.petersen@...cle.com>,
 linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 1/3] scsi: core: Add new helper to iterate all devices
 of host

On 6/12/24 4:33 PM, Hannes Reinecke wrote:
> On 6/5/24 11:17, Wenchao Hao wrote:
>> shost_for_each_device() would skip devices which is in SDEV_CANCEL or
>> SDEV_DEL state, for some scenarios, we donot want to skip these devices,
>> so add a new macro shost_for_each_device_include_deleted() to handle it.
>>
>> Following changes are introduced:
>>
>> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which
>>     determine if skip deleted scsi_device by parameter "skip_deleted".
>> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
>>     is used when calling __scsi_device_get()
>> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with
>>     "skip_deleted" true
>> 4. Add new macro shost_for_each_device_include_deleted() which call
>>     __scsi_iterate_devices() with "skip_deleted" false
>>
>> Signed-off-by: Wenchao Hao <haowenchao22@...il.com>
>> ---
>>   drivers/scsi/scsi.c        | 46 ++++++++++++++++++++++++++------------
>>   include/scsi/scsi_device.h | 25 ++++++++++++++++++---
>>   2 files changed, 54 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 3e0c0381277a..5913de543d93 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable)
>>       return 0;
>>   }
>>   -/**
>> - * scsi_device_get  -  get an additional reference to a scsi_device
>> +/*
>> + * __scsi_device_get  -  get an additional reference to a scsi_device
>>    * @sdev:    device to get a reference to
>> - *
>> - * Description: Gets a reference to the scsi_device and increments the use count
>> - * of the underlying LLDD module.  You must hold host_lock of the
>> - * parent Scsi_Host or already have a reference when calling this.
>> - *
>> - * This will fail if a device is deleted or cancelled, or when the LLD module
>> - * is in the process of being unloaded.
>> + * @skip_deleted: when true, would return failed if device is deleted
>>    */
>> -int scsi_device_get(struct scsi_device *sdev)
>> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted)
>>   {
>> -    if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
>> +    /*
>> +     * if skip_deleted is true and device is in removing, return failed
>> +     */
>> +    if (skip_deleted &&
>> +        (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL))
>>           goto fail;
> 
> Nack.
> SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all.
> 

Sorry I added SDEV_DEL here at hand without understanding what it means.
Actually, just include scsi_device which is in SDEV_CANCEL would fix the
issues I described.

The issues are because device removing concurrent with error handle.
Normally, error handle would not be triggered when scsi_device is in
SDEV_DEL. Below is my analysis, if it is wrong, please correct me.

If there are scsi_cmnd remain unfinished when removing scsi_device,
the removing process would waiting for all commands to be finished.
If commands error happened and trigger error handle, the removing
process would be blocked until error handle finished, because
__scsi_remove_device called  del_gendisk() which would wait all
requests to be finished. So now scsi_device is in SDEV_CANCEL.

If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
dispatched to this scsi_device, then error handle would never triggered.

I want to change the new function __scsi_device_get() as following,
please help to review.

/*
 * __scsi_device_get  -  get an additional reference to a scsi_device
 * @sdev:	device to get a reference to
 * @skip_canceled: when true, would return failed if device is deleted
 */
static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled)
{
	/*
	 * if skip_canceled is true and device is in removing, return failed
	 */
	if (sdev->sdev_state == SDEV_DEL ||
	    (sdev->sdev_state == SDEV_CANCEL && skip_canceled))
		goto fail;
	if (!try_module_get(sdev->host->hostt->module))
		goto fail;
	if (!get_device(&sdev->sdev_gendev))
		goto fail_put_module;
	return 0;

fail_put_module:
	module_put(sdev->host->hostt->module);
fail:
	return -ENXIO;
}

> Cheers,
> 
> Hannes