Message-ID: <a9d09c59-5e5c-c271-345e-b4349968bb0b@suse.de>
Date:   Wed, 24 Aug 2016 16:21:38 +0200
From:   Hannes Reinecke <hare@...e.de>
To:     John Garry <john.garry@...wei.com>, jejb@...ux.vnet.ibm.com,
        martin.petersen@...cle.com
Cc:     linuxarm@...wei.com, zhangfei.gao@...aro.org, xuwei5@...ilicon.com,
        john.garry2@...l.dcu.ie, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/8] hisi_sas: add v2 hw slot complete internal abort support

On 08/24/2016 04:07 PM, John Garry wrote:
> On 24/08/2016 13:59, Hannes Reinecke wrote:
>> On 08/24/2016 01:05 PM, John Garry wrote:
>>> Add code in slot_complete_v2_hw() to deal with the
>>> slots which have completed due to internal abort.
>>>
>>> The status codes have the following meaning:
>>> - STAT_IO_ABORTED: the IO has been aborted due to
>>> internal abort, whether by device or individual
>>> abort command
>>> - STAT_IO_COMPLETE: internal abort command has
>>> completed successfully for device or individual
>>> abort command
>>> - STAT_IO_NO_DEVICE: internal abort command has
>>> completed for device but cannot find any IO
>>> - STAT_IO_NOT_VALID: internal abort command has
>>> completed for single command but could not
>>> find the command
>>>
>>> Signed-off-by: John Garry <john.garry@...wei.com>
>>> ---
>>>  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 31 +++++++++++++++++++++++++++++++
>>>  1 file changed, 31 insertions(+)
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>>> index fec1675..bf9b693 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>>> @@ -227,6 +227,13 @@
>>>  #define CMPLT_HDR_RSPNS_XFRD_MSK    (0x1 << CMPLT_HDR_RSPNS_XFRD_OFF)
>>>  #define CMPLT_HDR_ERX_OFF        12
>>>  #define CMPLT_HDR_ERX_MSK        (0x1 << CMPLT_HDR_ERX_OFF)
>>> +#define CMPLT_HDR_ABORT_STAT_OFF    13
>>> +#define CMPLT_HDR_ABORT_STAT_MSK    (0x7 << CMPLT_HDR_ABORT_STAT_OFF)
>>> +/* abort_stat */
>>> +#define STAT_IO_NOT_VALID        0x1
>>> +#define STAT_IO_NO_DEVICE        0x2
>>> +#define STAT_IO_COMPLETE        0x3
>>> +#define STAT_IO_ABORTED            0x4
>>>  /* dw1 */
>>>  #define CMPLT_HDR_IPTT_OFF        0
>>>  #define CMPLT_HDR_IPTT_MSK        (0xffff << CMPLT_HDR_IPTT_OFF)
>>> @@ -1569,6 +1576,30 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
>>>          goto out;
>>>      }
>>>
>>> +    /* Use SAS+TMF status codes */
>>> +    switch ((complete_hdr->dw0 & CMPLT_HDR_ABORT_STAT_MSK)
>>> +            >> CMPLT_HDR_ABORT_STAT_OFF) {
>>> +    case STAT_IO_ABORTED:
>>> +        /* this io has been aborted by abort command */
>>> +        ts->stat = SAS_ABORTED_TASK;
>>> +        goto out;
>>> +    case STAT_IO_COMPLETE:
>>> +        /* internal abort command complete */
>>> +        ts->stat = TMF_RESP_FUNC_COMPLETE;
>>> +        goto out;
>>> +    case STAT_IO_NO_DEVICE:
>>> +        ts->stat = TMF_RESP_FUNC_COMPLETE;
>>> +        goto out;
>>> +    case STAT_IO_NOT_VALID:
>>> +        /* abort single io, controller don't find
>>> +         * the io need to abort
>>> +         */
>>> +        ts->stat = TMF_RESP_FUNC_FAILED;
>>> +        goto out;
>> Hmm. This will cause the SCSI EH to kick in.
>> And then, according to the description abort has succeeded, it's just
>> that for some reason the associated command couldn't be found.
>> So couldn't this be due to a race condition, and the command has in fact
>> been aborted correctly (and the code is just too slow acknowledging it)?
>>
> 
> Hi Hannes,
> 
> I'm not sure I fully get your question.
> 
> The internal abort is issued from SCSI error handling. An example is a
> disk that was not safely removed while some IO is still in flight. In
> that case the IO times out, SCSI EH starts, and we try to abort the
> command in the LLDD, by TMF (which would fail) and by internal abort.
> 
> For internal abort, if the abort command succeeds then 2 things happen:
> - abort task completes with status STAT_IO_COMPLETE
> - task which was aborted completes with status STAT_IO_ABORTED
> 
> If the command does not abort successfully then:
> - abort task completes with status STAT_IO_NOT_VALID
> - task which we wanted to be aborted does not complete and is probably
> still in the slave device
> 
> I hope that this makes it clear.
> 
Right, that answers it.
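
For reference, the decode that the patch adds and the outcomes John describes can be sketched in user-space C. This is only an illustration, not the driver code: the field offset, mask and STAT_IO_* values are copied from the patch, while the `sketch_stat` enum, the `abort_stat()` and `map_abort_stat()` helpers, and the default case are assumptions made here (the real code stores libsas values such as SAS_ABORTED_TASK in ts->stat, and the rest of the switch is not shown in the quoted hunk).

```c
#include <stdint.h>

/* Field layout and status codes, copied from the patch. */
#define CMPLT_HDR_ABORT_STAT_OFF 13
#define CMPLT_HDR_ABORT_STAT_MSK (0x7 << CMPLT_HDR_ABORT_STAT_OFF)
#define STAT_IO_NOT_VALID 0x1
#define STAT_IO_NO_DEVICE 0x2
#define STAT_IO_COMPLETE  0x3
#define STAT_IO_ABORTED   0x4

/* Hypothetical stand-ins for the libsas status codes used in the patch.
 * The numeric values here are illustrative only. */
enum sketch_stat {
	SKETCH_NOT_ABORT_RELATED,	/* assumed: field holds no abort status */
	SKETCH_SAS_ABORTED_TASK,
	SKETCH_TMF_RESP_FUNC_COMPLETE,
	SKETCH_TMF_RESP_FUNC_FAILED,
};

/* Extract the 3-bit abort_stat field from completion header dw0. */
static uint32_t abort_stat(uint32_t dw0)
{
	return (dw0 & CMPLT_HDR_ABORT_STAT_MSK) >> CMPLT_HDR_ABORT_STAT_OFF;
}

/* Map abort_stat to a task status, mirroring the switch in the patch. */
static enum sketch_stat map_abort_stat(uint32_t dw0)
{
	switch (abort_stat(dw0)) {
	case STAT_IO_ABORTED:
		/* the IO that was the target of the abort */
		return SKETCH_SAS_ABORTED_TASK;
	case STAT_IO_COMPLETE:
		/* the internal abort command itself succeeded */
		return SKETCH_TMF_RESP_FUNC_COMPLETE;
	case STAT_IO_NO_DEVICE:
		/* device-wide abort completed, no IO was found */
		return SKETCH_TMF_RESP_FUNC_COMPLETE;
	case STAT_IO_NOT_VALID:
		/* single-command abort could not find the command */
		return SKETCH_TMF_RESP_FUNC_FAILED;
	default:
		return SKETCH_NOT_ABORT_RELATED;
	}
}
```

With this mapping, a successful internal abort produces two completions: one decoding to the "complete" status for the abort command and one decoding to the "aborted" status for the target IO. A failed single-command abort produces only the "failed" completion, and the target IO stays outstanding.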

Reviewed-by: Hannes Reinecke <hare@...e.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@...e.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
