Message-ID: <cd7bda98-2160-9271-9520-e98d1fe00ea5@linux.ibm.com>
Date:   Tue, 29 Mar 2022 12:56:53 +0200
From:   Steffen Maier <maier@...ux.ibm.com>
To:     Wenchao Hao <haowenchao@...wei.com>, linux-scsi@...r.kernel.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Mike Christie <michael.christie@...cle.com>,
        Lee Duncan <lduncan@...e.com>
Cc:     Wu Bo <wubo40@...wei.com>, Feilong Lin <linfeilong@...wei.com>,
        zhangjian013@...wei.com
Subject: Re: [REQUEST DISCUSS]: speed up SCSI error handle for host with
 massive devices

On 3/29/22 11:06, Wenchao Hao wrote:
> A SCSI timeout calls scsi_eh_scmd_add() under some conditions, which sets the
> host to the SHOST_RECOVERY state. Once the host enters SHOST_RECOVERY, I/Os
> submitted to all devices on this host do not succeed until
> scsi_error_handler() has finished. scsi_error_handler() might take a long time
> to complete, which is unbearable when the host has a massive number of devices.
> 
> I want to ask whether anyone is applying another error handling flow to
> address this problem?
> 
> I think we could move some operations (like SCSI get sense, SCSI start unit,
> and SCSI device reset) out of scsi_unjam_host(), so that they are performed
> without setting the host to SHOST_RECOVERY. That would reduce the time during
> which the whole host is blocked.
> 
> Waiting for your discussion.

We already have "async" aborts before even entering scsi_eh. So your use case 
seems to imply that those aborts fail and we enter scsi_eh?

There's eh_deadline for limiting the time spent in the escalation of scsi_eh; 
once it expires, scsi_eh goes directly to host reset instead. Would this help?
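
For reference, a minimal sketch of how eh_deadline could be inspected and set 
per host from user space through the eh_deadline attribute under 
/sys/class/scsi_host/hostN/. The attribute reads "off" or a number of seconds. 
The helper names and the `root` parameter are hypothetical and only there for 
illustration; this is not a recommendation for a particular value.

```python
from pathlib import Path

# Standard sysfs location of the SCSI host class directory.
SYSFS_SCSI_HOST = Path("/sys/class/scsi_host")


def eh_deadline_path(host_no, root=SYSFS_SCSI_HOST):
    """Build the path of the eh_deadline attribute for host<host_no>.

    `root` is a hypothetical override so the helpers can be tried
    against a scratch directory instead of the real sysfs.
    """
    return Path(root) / f"host{host_no}" / "eh_deadline"


def read_eh_deadline(host_no, root=SYSFS_SCSI_HOST):
    """Return the deadline in seconds, or None if it is "off"."""
    text = eh_deadline_path(host_no, root).read_text().strip()
    return None if text == "off" else int(text)


def write_eh_deadline(host_no, seconds, root=SYSFS_SCSI_HOST):
    """Set the deadline in seconds; pass None to switch it off."""
    if seconds is not None and int(seconds) < 0:
        raise ValueError("eh_deadline must be a non-negative number of seconds")
    value = "off" if seconds is None else str(int(seconds))
    eh_deadline_path(host_no, root).write_text(value + "\n")
```

Writing to the real attribute needs root privileges, e.g. 
write_eh_deadline(0, 10) to give host0 ten seconds before scsi_eh escalates 
to host reset.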


-- 
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z and LinuxONE

https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschaeftsfuehrung: David Faller
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
