lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6531a3b0-915d-8078-b265-95231405ac4d@huawei.com>
Date:   Mon, 25 Sep 2023 23:02:12 +0800
From:   Wenchao Hao <haowenchao2@...wei.com>
To:     "James E . J . Bottomley" <jejb@...ux.ibm.com>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        <linux-scsi@...r.kernel.org>
CC:     <linux-kernel@...r.kernel.org>, <louhongxiang@...wei.com>
Subject: Re: [PATCH 1/2] scsi: core: scsi_device_online() return false if
 state is SDEV_CANCEL

On 2023/9/22 17:36, Wenchao Hao wrote:
> SDEV_CANCEL is set when removing device and scsi_device_online() should
> return false if sdev_state is SDEV_CANCEL.
> 
> IO hang would be caused if return true when state is SDEV_CANCEL with
> following order:
> 
> T1:					    T2:scsi_error_handler
> __scsi_remove_device()
>    scsi_device_set_state(sdev, SDEV_CANCEL)
>    					    scsi_eh_flush_done_q()
> 					    if (scsi_device_online(sdev))
> 					      scsi_queue_insert(scmd,...)
> 
> The command added by scsi_queue_insert() would never be handled any
> more.
> 
> Signed-off-by: Wenchao Hao <haowenchao2@...wei.com>
> ---
>   include/scsi/scsi_device.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 75b2235b99e2..c498a12f7715 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -517,7 +517,8 @@ static inline int scsi_device_online(struct scsi_device *sdev)
>   {
>   	return (sdev->sdev_state != SDEV_OFFLINE &&
>   		sdev->sdev_state != SDEV_TRANSPORT_OFFLINE &&
> -		sdev->sdev_state != SDEV_DEL);
> +		sdev->sdev_state != SDEV_DEL &&
> +		sdev->sdev_state != SDEV_CANCEL);
>   }
>   static inline int scsi_device_blocked(struct scsi_device *sdev)
>   {

Return false when if sdev_state is SDEV_CANCEL seems change some flow in
error handle, but I don't know if we should introduce these changes.
I think it's both ok to finish the failed command or try more recovery steps.

For example, in scsi_eh_bus_device_reset(), when scsi_try_bus_device_reset()
returned SUCCEED but the sdev_state is SDEV_CANCEL, should skip TUR and just
call scsi_eh_finish_cmd() to add this LUN's error command to done_q?

We can address the issue of IO hang described in this patch by running
scsi_device's queue regardless of the scsi_device's state and it seems
a better solution because the main reason of IO hang is as following:

scsi_restart_operations()
	-> scsi_run_host_queues()
		-> shost_for_each_device() // skip scsi_device with SDEV_DEL
					   // or SDEV_CANCEL state

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ