Message-ID: <2fa67edb-7cf2-e6bb-a2ab-425911226fbb@huawei.com>
Date: Tue, 15 Aug 2023 22:08:31 +0800
From: "haowenchao (C)" <haowenchao2@...wei.com>
To: "James E . J . Bottomley" <jejb@...ux.ibm.com>,
"Martin K . Petersen" <martin.petersen@...cle.com>,
Hannes Reinecke <hare@...e.de>, <linux-scsi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: Dan Carpenter <error27@...il.com>, <louhongxiang@...wei.com>
Subject: Re: [PATCH 00/13] scsi: Support LUN/target based error handle
On 2023/7/24 7:44, Wenchao Hao wrote:
> The original error handler sets the host to the recovery state while it
> performs error recovery, so no LUN that shares the host can handle I/O.
> This is unacceptable for systems that deploy many LUNs on one HBA.
>
> This patchset introduces support for LUN/target based error handling;
> drivers can choose whether to implement it. They can implement LUN
> based, target based, or both kinds of error handling, according to
> their own error handling strategy. The first patch defines the
> framework, which abstracts three key operations: add an error command,
> wake up the error handler, and block I/O while an error command is
> queued or being recovered. Drivers should implement these three
> callbacks and register them with the SCSI mid-layer.
>
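As an illustration of the three callbacks described above, here is a minimal userspace sketch in C. It is not code from the patch set: `demo_eh_ops`, `demo_lun`, `demo_cmd` and all `demo_*` functions are hypothetical names made up for this example, and the fixed-size array is only a stand-in for whatever list a real handler would keep. It shows how a per-LUN handler could let the mid-layer refuse new I/O for one LUN while other LUNs keep running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI command; not the kernel's scsi_cmnd. */
struct demo_cmd { int tag; };

struct demo_lun;

/* The three operations the cover letter asks a driver to provide. */
struct demo_eh_ops {
    void (*add_cmd)(struct demo_lun *lun, struct demo_cmd *cmd); /* add error command */
    void (*wakeup)(struct demo_lun *lun);                        /* wake up error handling */
    bool (*is_busy)(const struct demo_lun *lun);                 /* block I/O while recovering */
};

#define DEMO_MAX_ERR 8

/* Hypothetical per-LUN state; each LUN tracks its own failed commands. */
struct demo_lun {
    const struct demo_eh_ops *ops;
    struct demo_cmd *failed[DEMO_MAX_ERR];
    size_t nr_failed;
    bool recovering;
};

static void demo_add_cmd(struct demo_lun *lun, struct demo_cmd *cmd)
{
    assert(lun->nr_failed < DEMO_MAX_ERR);
    lun->failed[lun->nr_failed++] = cmd;
}

static void demo_wakeup(struct demo_lun *lun)
{
    /* A real handler would wake a thread or queue work; here we only flag it. */
    if (lun->nr_failed > 0)
        lun->recovering = true;
}

static bool demo_is_busy(const struct demo_lun *lun)
{
    return lun->nr_failed > 0 || lun->recovering;
}

static const struct demo_eh_ops demo_ops = {
    .add_cmd = demo_add_cmd,
    .wakeup  = demo_wakeup,
    .is_busy = demo_is_busy,
};

/* Mid-layer side: true if a new command may be dispatched to this LUN. */
static bool demo_may_queue(const struct demo_lun *lun)
{
    return !(lun->ops && lun->ops->is_busy(lun));
}

/* Mid-layer side: a command failed, hand it to the per-LUN handler. */
static void demo_cmd_failed(struct demo_lun *lun, struct demo_cmd *cmd)
{
    lun->ops->add_cmd(lun, cmd);
    lun->ops->wakeup(lun);
}
```

With two `demo_lun` instances sharing `demo_ops`, calling `demo_cmd_failed()` on one makes `demo_may_queue()` return false for that LUN only. The other LUN still accepts commands, which is the behaviour the cover letter is after.
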
> Besides the basic framework, this patchset also adds a basic LUN/target
> based error handling strategy.
>
> For LUN based eh, it tries check sense, start unit and LUN reset. If
> these steps cannot recover all error commands, it falls back to further
> recovery: target based (if implemented) or host based error handling.
>
> Target based eh works the same way: it tries check sense, start unit,
> LUN reset and target reset. If these steps cannot recover all error
> commands, it falls back to host based error handling.
>
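The escalation order described in the two paragraphs above can be sketched as a loop over ordered steps. This is a simplified userspace model, not the patch's implementation: the `demo_*` names are hypothetical, and `recovered_by[]` is an invented stand-in for how many failed commands each step happens to recover.

```c
#include <assert.h>
#include <stddef.h>

/* Recovery steps in the order the cover letter lists them. */
enum demo_step {
    DEMO_CHECK_SENSE,
    DEMO_START_UNIT,
    DEMO_RESET_LUN,
    DEMO_RESET_TARGET,
    DEMO_NR_STEPS
};

enum demo_scope { DEMO_SCOPE_LUN, DEMO_SCOPE_TARGET };

/* Assumption: each step recovers a fixed number of failed commands. */
struct demo_sim { unsigned int recovered_by[DEMO_NR_STEPS]; };

/*
 * Run the steps for the given scope until no failed command is left.
 * Returns the number of commands still failed; the caller would pass
 * those on to the next level (target based or host based handling).
 * *last_step is set to the last step that ran, or -1 if none ran.
 */
static unsigned int demo_run_ladder(enum demo_scope scope,
                                    const struct demo_sim *sim,
                                    unsigned int nr_failed,
                                    int *last_step)
{
    int final_step = (scope == DEMO_SCOPE_LUN) ? DEMO_RESET_LUN
                                               : DEMO_RESET_TARGET;
    int step;

    assert(sim != NULL && last_step != NULL);
    *last_step = -1;

    for (step = DEMO_CHECK_SENSE; step <= final_step && nr_failed > 0; step++) {
        unsigned int fixed = sim->recovered_by[step];

        nr_failed -= (fixed < nr_failed) ? fixed : nr_failed;
        *last_step = step;
    }
    return nr_failed;
}
```

A nonzero return value means some commands are still failed after the last step for that scope, so the caller would hand them to the next level.
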
> This patchset was tested with a scsi_debug that supports single LUN
> error injection; the scsi_debug patches are here:
>
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
>
I tested this patch set with scsi_debug in the following scenarios. See
the attachments for my test script and the result logs.

+-----------+---------+-------------------------------------------------------+
| lun reset | TUR     | Desired result                                        |
+-----------+---------+-------------------------------------------------------+
| success   | success | retry or finish with EIO (may offline disk)           |
+-----------+---------+-------------------------------------------------------+
| success   | fail    | fall back to host recovery, retry or finish with      |
|           |         | EIO (may offline disk)                                |
+-----------+---------+-------------------------------------------------------+
| fail      | NA      | fall back to host recovery, retry or finish with      |
|           |         | EIO (may offline disk)                                |
+-----------+---------+-------------------------------------------------------+

+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO (may offline disk)       |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | retry or finish with         |
|           |         |              |         | EIO (may offline disk)       |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | retry or finish with         |
|           |         |              |         | EIO (may offline disk)       |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | fall back to host recovery,  |
|           |         |              |         | retry or finish with EIO     |
|           |         |              |         | (may offline disk)           |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | fall back to host recovery,  |
|           |         |              |         | retry or finish with EIO     |
|           |         |              |         | (may offline disk)           |
+-----------+---------+--------------+---------+------------------------------+

+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO (may offline disk)       |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | lun recovery falls back to   |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO (may offline |
|           |         |              |         | disk)                        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | lun recovery falls back to   |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO (may offline |
|           |         |              |         | disk)                        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | lun recovery falls back to   |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or finish with EIO (may      |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | lun recovery falls back to   |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or finish with EIO (may      |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
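
The desired results in the tables above can be condensed into one decision function. The sketch below is an interpretation, not part of the attached test script: it assumes the three tables correspond, in order, to LUN based handling only, target based handling only, and both enabled (inferred from their columns, not stated in this message), and all `demo_*` names are hypothetical. It returns the level at which recovery ends; in every case the commands are then retried or finished with EIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Level at which error recovery is expected to end. */
enum demo_level { DEMO_LUN_EH, DEMO_TARGET_EH, DEMO_HOST_EH };

/* One row of a test table, plus which handlers are enabled (assumed). */
struct demo_case {
    bool lun_eh;          /* LUN based handler enabled */
    bool target_eh;       /* target based handler enabled */
    bool lun_reset_ok;
    bool lun_tur_ok;      /* ignored when lun_reset_ok is false ("NA") */
    bool target_reset_ok;
    bool target_tur_ok;   /* ignored when target_reset_ok is false ("NA") */
};

static enum demo_level demo_expected_level(const struct demo_case *c)
{
    assert(c != NULL);
    assert(c->lun_eh || c->target_eh);

    /* LUN reset followed by a successful TUR ends recovery at once. */
    if (c->lun_reset_ok && c->lun_tur_ok)
        return c->lun_eh ? DEMO_LUN_EH : DEMO_TARGET_EH;

    /* Otherwise a target reset plus TUR can still recover the commands. */
    if (c->target_eh && c->target_reset_ok && c->target_tur_ok)
        return DEMO_TARGET_EH;

    /* Everything else falls back to host based recovery. */
    return DEMO_HOST_EH;
}
```

For example, with both handlers enabled, a failed LUN reset followed by a successful target reset and TUR yields `DEMO_TARGET_EH`, matching the third row of the last table.
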
> Wenchao Hao (13):
> scsi: Define basic framework for driver LUN/target based error handle
> scsi:scsi_error: Move complete variable eh_action from shost to sdevice
> scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
> scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
> scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
> scsi:scsi_error: Add flags to mark error handle steps has done
> scsi:scsi_error: Define helper to perform LUN based error handle
> scsi:scsi_error: Add LUN based error handler based previous helper
> scsi:core: increase/decrease target_busy without check can_queue
> scsi:scsi_error: Define helper to perform target based error handle
> scsi:scsi_error: Add target based error handler based previous helper
> scsi:scsi_debug: Add param to control if setup LUN based error handle
> scsi:scsi_debug: Add param to control if setup target based error handle
>
> drivers/scsi/scsi_debug.c | 19 +
> drivers/scsi/scsi_error.c | 705 ++++++++++++++++++++++++++++++++++---
> drivers/scsi/scsi_lib.c | 23 +-
> drivers/scsi/scsi_priv.h | 20 ++
> include/scsi/scsi_device.h | 97 +++++
> include/scsi/scsi_eh.h | 4 +
> include/scsi/scsi_host.h | 2 -
> 7 files changed, 813 insertions(+), 57 deletions(-)
>
Download attachment "logs.tar.gz" of type "application/x-gzip" (7681 bytes)
View attachment "test.sh" of type "text/plain" (6362 bytes)