lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2fa67edb-7cf2-e6bb-a2ab-425911226fbb@huawei.com>
Date:   Tue, 15 Aug 2023 22:08:31 +0800
From:   "haowenchao (C)" <haowenchao2@...wei.com>
To:     "James E . J . Bottomley" <jejb@...ux.ibm.com>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        Hannes Reinecke <hare@...e.de>, <linux-scsi@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
CC:     Dan Carpenter <error27@...il.com>, <louhongxiang@...wei.com>
Subject: Re: [PATCH 00/13] scsi: Support LUN/target based error handle

On 2023/7/24 7:44, Wenchao Hao wrote:
> The origin error handle would set host to recovery state and perform
> error recovery operations, and makes all LUNs which share a same host
> can not handle IOs. This phenomenon is unbearable for systems which
> deploy many LUNs in one HBA.
> 
> This patchset introduce support for LUN/target based error handle,
> drivers can chose if to implement it. They can implement LUN, target or
> both of LUN and target based error handle by their own error handle
> strategy. The first patch defined this framework, it abstract three
> key operations which are: add error command, wake up error handle, block
> ios when error command is added and recoverying. Drivers should
> implement these three function callbacks and setup to SCSI middle level.
> 
> Besides the basic framework, this patchset also add a basic LUN/target
> based error handle strategy.
> 
> For LUN based eh, it would try check sense, start unit and reset LUN,
> if all above steps can not recovery all error commands, fallback to
> further recovery like tartget based (if implemented) or host based error
> handle.
> 
> It's same for tartget based eh, it would try check sense, start unit,
> reset LUN and reset target. If all above steps can not recovery all error
> commands, fallback to further recovery which is host based error handle.
> 
> This patchset is tested by scsi_debug which support single LUN error
> injection, the scsi_debug patches is here:
> 
> https://lore.kernel.org/linux-scsi/20230723234105.1628982-1-haowenchao2@huawei.com/T/#t
> 

I tested this patch set with scsi_debug with following scenarios, check
attachments to get my test script and result logs.

+-----------+---------+-------------------------------------------------------+
| lun reset | TUR     | Desired result                                        |
+ --------- + ------- + ------------------------------------------------------+
| success   | success | retry or finish with  EIO(may offline disk)           |
+ --------- + ------- + ------------------------------------------------------+
| success   | fail    | fallback to host  recovery, retry or finish with      |
|           |         | EIO(may offline disk)                                 |
+ --------- + ------- + ------------------------------------------------------+
| fail      | NA      | fallback to host  recovery, retry or finish with      |
|           |         | EIO(may offline disk)                                 |
+ --------- + ------- + ------------------------------------------------------+

+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | fallback to host recovery,   |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | fallback to host  recovery,  |
|           |         |              |         | retry or finish with EIO(may |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+

+-----------+---------+--------------+---------+------------------------------+
| lun reset | TUR     | target reset | TUR     | Desired result               |
+-----------+---------+--------------+---------+------------------------------+
| success   | success | NA           | NA      | retry or finish with         |
|           |         |              |         | EIO(may offline disk)        |
+-----------+---------+--------------+---------+------------------------------+
| success   | fail    | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk                         |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | success | lun recovery fallback to     |
|           |         |              |         | target recovery, retry or    |
|           |         |              |         | finish with EIO(may offline  |
|           |         |              |         | disk                         |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | success      | fail    | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or fhinsi with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+
| fail      | NA      | fail         | NA      | lun recovery fallback to     |
|           |         |              |         | target recovery, then fall   |
|           |         |              |         | back to host recovery, retry |
|           |         |              |         | or fhinsi with EIO(may       |
|           |         |              |         | offline disk)                |
+-----------+---------+--------------+---------+------------------------------+


> Wenchao Hao (13):
>    scsi: Define basic framework for driver LUN/target based error handle
>    scsi:scsi_error: Move complete variable eh_action from shost to sdevice
>    scsi:scsi_error: Check if to do reset in scsi_try_xxx_reset
>    scsi:scsi_error: Add helper scsi_eh_sdev_stu to do START_UNIT
>    scsi:scsi_error: Add helper scsi_eh_sdev_reset to do lun reset
>    scsi:scsi_error: Add flags to mark error handle steps has done
>    scsi:scsi_error: Define helper to perform LUN based error handle
>    scsi:scsi_error: Add LUN based error handler based previous helper
>    scsi:core: increase/decrease target_busy without check can_queue
>    scsi:scsi_error: Define helper to perform target based error handle
>    scsi:scsi_error: Add target based error handler based previous helper
>    scsi:scsi_debug: Add param to control if setup LUN based error handle
>    scsi:scsi_debug: Add param to control if setup target based error handle
> 
>   drivers/scsi/scsi_debug.c  |  19 +
>   drivers/scsi/scsi_error.c  | 705 ++++++++++++++++++++++++++++++++++---
>   drivers/scsi/scsi_lib.c    |  23 +-
>   drivers/scsi/scsi_priv.h   |  20 ++
>   include/scsi/scsi_device.h |  97 +++++
>   include/scsi/scsi_eh.h     |   4 +
>   include/scsi/scsi_host.h   |   2 -
>   7 files changed, 813 insertions(+), 57 deletions(-)
> 

Download attachment "logs.tar.gz" of type "application/x-gzip" (7681 bytes)

View attachment "test.sh" of type "text/plain" (6362 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ