linux-kernel - Re: [PATCH 4/4] scsi: hisi_sas: Use libsas internal abort support

Message-ID: <380af884-94f2-231b-040b-2d89a544b8ed@huawei.com>
Date:   Mon, 25 Apr 2022 09:43:08 +0100
From:   John Garry <john.garry@...wei.com>
To:     Hannes Reinecke <hare@...e.de>, <jejb@...ux.ibm.com>,
        <martin.petersen@...cle.com>, <jinpu.wang@...ud.ionos.com>,
        <damien.lemoal@...nsource.wdc.com>
CC:     <linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <Ajish.Koshy@...rochip.com>, <linuxarm@...wei.com>,
        <Viswas.G@...rochip.com>, <hch@....de>, <liuqi115@...wei.com>,
        <chenxiang66@...ilicon.com>
Subject: Re: [PATCH 4/4] scsi: hisi_sas: Use libsas internal abort support

On 20/04/2022 13:29, Hannes Reinecke wrote:
> On 3/3/22 13:18, John Garry wrote:
>> Use the common libsas internal abort functionality.
>>
>> In addition, this driver has special handling for internal abort
>> timeouts - specifically whether to reset the controller in that
>> instance, so extend the API for that.
>>
> Huh? Is there a reason _not_ to reset the controller once abort times out?

There's a bug in v2 HW where an internal abort may time out, but the
timeout is not fatal, i.e. the HW state is not totally buggered, so we
can continue without a reset.
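
To illustrate the idea, here is a rough userspace sketch of a per-driver
timeout policy. It is not the actual libsas or hisi_sas code: every name
in it (demo_host, abort_timeout_fn, v2_abort_timeout and so on) is made
up for this example, and it only assumes that the common code asks the
driver whether an abort timeout warrants a controller reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host state; not a real kernel structure. */
struct demo_host {
	bool reset_requested;	/* set when a controller reset is needed */
};

/*
 * Hypothetical driver hook: returns true if an internal abort timeout
 * should trigger a controller reset.
 */
typedef bool (*abort_timeout_fn)(struct demo_host *host);

/* Assumed default policy: a timed-out abort always means reset. */
bool default_abort_timeout(struct demo_host *host)
{
	(void)host;
	return true;
}

/* Assumed v2 policy: the timeout is a known HW quirk, so carry on. */
bool v2_abort_timeout(struct demo_host *host)
{
	(void)host;
	return false;
}

/* Common code consults the driver's policy, if it supplied one. */
void handle_internal_abort_timeout(struct demo_host *host,
				   abort_timeout_fn policy)
{
	assert(host != NULL);

	if (policy == NULL)
		policy = default_abort_timeout;

	if (policy(host))
		host->reset_requested = true;
}
```

With v2_abort_timeout passed in, a timeout leaves reset_requested clear;
with no hook at all, the default policy requests a reset.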

> And why isn't that delegated to SCSI EH?

For sure, SCSI EH will reset the host if all else fails. However, it may
take some time to get to the point of deciding to reset, including lots
of timeouts. To accelerate this, we set a host flag to say that we have
a HW fault. Once the initial task abort fails due to the HW fault, we
don't bother with nexus reset, LU reset, etc., and instead fail those
steps straight away. Maybe the core code could do something similar, but
it seems messy/hard to generalise.
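
As a rough illustration of that short-circuit, the following userspace
sketch models an escalation ladder in which the intermediate steps bail
out immediately once a HW fault flag is set. It is only a model under
assumed names (demo_shost, demo_run_eh, the demo_* steps and the
slow_steps counter are all hypothetical), not the driver's or the SCSI
core's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum eh_result { EH_SUCCESS, EH_FAILED };

/* Hypothetical host state for this sketch. */
struct demo_shost {
	bool hw_fault;		/* set when the abort fails due to a HW fault */
	int slow_steps;		/* steps that actually touched the hardware */
	bool host_reset_done;
};

typedef enum eh_result (*eh_step_fn)(struct demo_shost *host);

/* The initial abort fails because of a HW fault and records that. */
enum eh_result demo_abort_task(struct demo_shost *host)
{
	host->slow_steps++;
	host->hw_fault = true;
	return EH_FAILED;
}

/* Intermediate steps fail straight away when the HW fault flag is set. */
enum eh_result demo_lu_reset(struct demo_shost *host)
{
	if (host->hw_fault)
		return EH_FAILED;
	host->slow_steps++;
	return EH_FAILED;
}

enum eh_result demo_nexus_reset(struct demo_shost *host)
{
	if (host->hw_fault)
		return EH_FAILED;
	host->slow_steps++;
	return EH_FAILED;
}

/* The host reset clears the fault. */
enum eh_result demo_host_reset(struct demo_shost *host)
{
	host->slow_steps++;
	host->hw_fault = false;
	host->host_reset_done = true;
	return EH_SUCCESS;
}

/* Walk the ladder in order until one step succeeds. */
enum eh_result demo_run_eh(struct demo_shost *host)
{
	static const eh_step_fn ladder[] = {
		demo_abort_task, demo_lu_reset,
		demo_nexus_reset, demo_host_reset,
	};
	size_t i;

	for (i = 0; i < sizeof(ladder) / sizeof(ladder[0]); i++) {
		if (ladder[i](host) == EH_SUCCESS)
			return EH_SUCCESS;
	}
	return EH_FAILED;
}
```

In this model only the abort and the host reset touch the hardware; the
LU and nexus resets return immediately, which is the time saving
described above.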

Thanks,
John