linux-kernel - [REQUEST DISCUSS]: speed up SCSI error handle for host with massive devices

Message-ID: <71e09bb4-ff0a-23fe-38b4-fe6425670efa@huawei.com>
Date:   Tue, 29 Mar 2022 17:06:30 +0800
From:   Wenchao Hao <haowenchao@...wei.com>
To:     <linux-scsi@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Mike Christie <michael.christie@...cle.com>,
        Lee Duncan <lduncan@...e.com>
CC:     Wu Bo <wubo40@...wei.com>, Feilong Lin <linfeilong@...wei.com>,
        <zhangjian013@...wei.com>
Subject: [REQUEST DISCUSS]: speed up SCSI error handle for host with massive
 devices

When a SCSI command times out, scsi_eh_scmd_add() may be called under some
conditions, which puts the host into the SHOST_RECOVERY state. Once the host
enters SHOST_RECOVERY, I/O submitted to any device on that host cannot
succeed until scsi_error_handler() has finished. scsi_error_handler() can
take a long time to complete, which is unacceptable when the host has a
large number of devices.

Is anyone working on an alternative error handling flow to address this
problem?

I think we could move some operations (such as requesting sense data,
sending START UNIT, and resetting the device) out of scsi_unjam_host(), so
that they are performed without putting the host into SHOST_RECOVERY. That
would reduce the time during which the whole host is blocked.
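
To make the difference concrete, here is a minimal user-space sketch. It is
not kernel code: every toy_* name is hypothetical, and representing the
proposal as a per-device "recovering" flag is only an assumption about how
device-scoped recovery might be tracked. It models nothing except which
devices can still accept I/O after a timeout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all toy_* names are hypothetical, not kernel APIs. */
enum toy_host_state { TOY_HOST_RUNNING, TOY_HOST_RECOVERY };

struct toy_device {
    bool recovering;            /* assumed per-device recovery flag */
};

struct toy_host {
    enum toy_host_state state;  /* stands in for SHOST_RECOVERY */
    struct toy_device *devs;
    size_t ndevs;
};

/* Behaviour described above: a timeout puts the whole host into recovery. */
void toy_timeout_host_wide(struct toy_host *h, size_t idx)
{
    (void)idx;                  /* which device timed out does not matter */
    h->state = TOY_HOST_RECOVERY;
}

/* Assumed alternative: only the device that timed out is marked. */
void toy_timeout_per_device(struct toy_host *h, size_t idx)
{
    h->devs[idx].recovering = true;
}

/* I/O is accepted only if neither the host nor the device is recovering. */
bool toy_can_submit(const struct toy_host *h, size_t idx)
{
    return h->state == TOY_HOST_RUNNING && !h->devs[idx].recovering;
}

/* Number of devices on the host that can still accept I/O. */
size_t toy_count_submittable(const struct toy_host *h)
{
    size_t n = 0;

    for (size_t i = 0; i < h->ndevs; i++)
        if (toy_can_submit(h, i))
            n++;
    return n;
}
```

On a host with four devices, toy_timeout_per_device() on one of them leaves
three devices able to submit I/O, while toy_timeout_host_wide() leaves none.
The real cost of host-wide recovery grows with the number of devices in the
same way.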

I look forward to your feedback.