Message-ID: <380af884-94f2-231b-040b-2d89a544b8ed@huawei.com>
Date:   Mon, 25 Apr 2022 09:43:08 +0100
From:   John Garry <john.garry@...wei.com>
To:     Hannes Reinecke <hare@...e.de>, <jejb@...ux.ibm.com>,
        <martin.petersen@...cle.com>, <jinpu.wang@...ud.ionos.com>,
        <damien.lemoal@...nsource.wdc.com>
CC:     <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <Ajish.Koshy@...rochip.com>, <linuxarm@...wei.com>,
        <Viswas.G@...rochip.com>, <hch@....de>, <liuqi115@...wei.com>,
        <chenxiang66@...ilicon.com>
Subject: Re: [PATCH 4/4] scsi: hisi_sas: Use libsas internal abort support

On 20/04/2022 13:29, Hannes Reinecke wrote:
> On 3/3/22 13:18, John Garry wrote:
>> Use the common libsas internal abort functionality.
>>
>> In addition, this driver has special handling for internal abort
>> timeouts - specifically whether to reset the controller in that
>> instance, so extend the API for that.
>>
> Huh? Is there a reason _not_ to reset the controller once abort times out?

There's a bug in v2 HW whereby the internal abort may time out, but it
is not fatal, i.e. the HW state is not totally buggered, so we can
continue without a reset.

> And why isn't that delegated to SCSI EH?

For sure, SCSI EH will reset the host if all else fails. However, it may
take some time to get to the point of deciding to reset, including lots
of timeouts. To accelerate this, we set a host flag to say that we have
a HW fault once the initial task abort fails due to that fault. With the
flag set, we don't bother with nexus reset, LU reset, etc., and fail
those straight away. Maybe the core code could do something similar, but
it seems messy/hard to generalise.
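
To illustrate the shortcut described above, here is a minimal standalone
sketch in plain C. It is not the hisi_sas or libsas code: every name in
it (demo_host, demo_abort_task, demo_lu_reset, and so on) is
hypothetical, and the "timeout" is just a counter. It only shows the
idea of a host-level fault flag that makes the intermediate recovery
steps fail immediately so that escalation reaches the host reset sooner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is NOT the hisi_sas/libsas API. */
enum eh_result { EH_SUCCESS, EH_FAILED };

struct demo_host {
	bool hw_fault;        /* set when an abort fails due to a HW fault */
	int  timeouts_waited; /* number of recovery steps that had to time out */
};

/* Stand-in for a recovery step that only fails after waiting for a timeout. */
static enum eh_result demo_slow_step(struct demo_host *h)
{
	h->timeouts_waited++;
	return EH_FAILED;
}

/* Initial task abort: on a HW fault it times out once and sets the flag. */
static enum eh_result demo_abort_task(struct demo_host *h, bool hw_broken)
{
	if (!hw_broken)
		return EH_SUCCESS;
	h->timeouts_waited++;
	h->hw_fault = true;
	return EH_FAILED;
}

/* Intermediate steps: fail straight away if the fault flag is set. */
static enum eh_result demo_lu_reset(struct demo_host *h)
{
	return h->hw_fault ? EH_FAILED : demo_slow_step(h);
}

static enum eh_result demo_nexus_reset(struct demo_host *h)
{
	return h->hw_fault ? EH_FAILED : demo_slow_step(h);
}

/* Last resort: resetting the host clears the fault condition. */
static enum eh_result demo_host_reset(struct demo_host *h)
{
	h->hw_fault = false;
	return EH_SUCCESS;
}

/* Simplified escalation ladder, loosely modelled on what an EH would do. */
static enum eh_result demo_recover(struct demo_host *h, bool hw_broken)
{
	assert(h != NULL);
	if (demo_abort_task(h, hw_broken) == EH_SUCCESS)
		return EH_SUCCESS;
	if (demo_lu_reset(h) == EH_SUCCESS)
		return EH_SUCCESS;
	if (demo_nexus_reset(h) == EH_SUCCESS)
		return EH_SUCCESS;
	return demo_host_reset(h);
}
```

In this sketch, calling demo_recover() with hw_broken set waits for a
single timeout (the failed abort) before reaching the host reset. Without
the flag check, the LU reset and nexus reset steps would each add their
own timeout first.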

Thanks,
John
