lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c3cb7228-254e-9584-182b-007ac5e6fe0a@huawei.com>
Date:   Fri, 4 Feb 2022 11:50:43 +0000
From:   John Garry <john.garry@...wei.com>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        <jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
        <artur.paszkiewicz@...el.com>, <jinpu.wang@...ud.ionos.com>,
        <chenxiang66@...ilicon.com>, <Ajish.Koshy@...rochip.com>
CC:     <yanaijie@...wei.com>, <linux-doc@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linux-scsi@...r.kernel.org>,
        <linuxarm@...wei.com>, <liuqi115@...wei.com>,
        <Viswas.G@...rochip.com>
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code

On 04/02/2022 11:27, Damien Le Moal wrote:
>> As I mentioned, having this fail is a red flag. If I was pushed to guess
>> what has happened, I'd say the FW is faulting due to some erroneous
>> driver behaviour.
> I am still thinking that there is something wrong with the handling of
> NCQ NON DATA command. There are several places where the code determines
> non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
> the fpdma bucket, which is wrong.
> 

Ok, I will have a look at this. We made some libsas changes related to 
this not so long ago to "fix" something, see:

53de092f47f ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc 
as NODATA")

176ddd89171d ("scsi: libsas: Reset num_scatter if libata marks qc as 
NODATA")

>>> This is the submission path, not completion. The code is:
>>>
>>> (gdb) list *(pm8001_queue_command+0x842)
>>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>>> 486				atomic_dec(&pm8001_dev->running_req);
>>> 487				goto err_out_tag;
>>> 488			}
>>> 489			/* TODO: select normal or high priority */
>>> 490			spin_lock(&t->task_state_lock);
>>> 491			t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>>> 492			spin_unlock(&t->task_state_lock);
>>> 493		} while (0);
>>> 494		rc = 0;
>>> 495		goto out_done;
>>>
>>> So the task is already completed when the submission path tries to set
>>> the state flag ? Debugging...
>> Yeah, that's how it looks.
>>
>> I already mentioned this problem here:
>>
>> https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/
>>
>> Maybe we should just fix it now to rule it out of possibly causing other
>> issues... I was reluctant to fix it as many places seems to need to be
>> touched. Let me check it.
> Here is my current fix:
> 
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c
> b/drivers/scsi/pm8001/pm8001_sas.c
> index 1b95c73d12d1..16c101577dd3 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>                  ccb->ccb_tag = tag;
>                  ccb->task = t;
>                  ccb->device = pm8001_dev;
> +
> +               /* TODO: select normal or high priority */
> +               atomic_inc(&pm8001_dev->running_req);
> +               spin_lock(&t->task_state_lock);
> +               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> +               spin_unlock(&t->task_state_lock);
> +
>                  switch (task_proto) {
>                  case SAS_PROTOCOL_SMP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          rc = pm8001_task_prep_smp(pm8001_ha, ccb);
>                          break;
>                  case SAS_PROTOCOL_SSP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          if (is_tmf)
>                                  rc = pm8001_task_prep_ssp_tm(pm8001_ha,
>                                          ccb, tmf);
> @@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>                          break;
>                  case SAS_PROTOCOL_SATA:
>                  case SAS_PROTOCOL_STP:
> -                       atomic_inc(&pm8001_dev->running_req);
>                          rc = pm8001_task_prep_ata(pm8001_ha, ccb);
>                          break;
>                  default:
> @@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> 
>                  if (rc) {
>                          pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
> +                       spin_lock(&t->task_state_lock);
> +                       t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
> +                       spin_unlock(&t->task_state_lock);
>                          atomic_dec(&pm8001_dev->running_req);
>                          goto err_out_tag;
>                  }
> -               /* TODO: select normal or high priority */
> -               spin_lock(&t->task_state_lock);
> -               t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> -               spin_unlock(&t->task_state_lock);
>          } while (0);
>          rc = 0;
>          goto out_done;
> 
> With this, No KASAN complaint. I will send a proper patch ASAP.

I had just prepared a patch, but different, attached.

> 
> Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
> It is set and unset only, never tested anywhere in libsas nor pm8001
> driver. This flag seems totally useless to me, unless this is something
> that the HW can see ?

Right, it is only checked by isci AFAIC. And it is not something which 
HW can see.

Since libsas does not check it the semantics are ill-defined. However I 
think that it's worth setting it completeness; I thought it was better 
setting it just before the command is delivered to HW, but the 
implementation touches more SW. Simpler approach may be better since it 
aligns with how this flag may be cleared in pm8001_chip_sata_req() under 
asusmption it is already set.

Thanks,
John
View attachment "0001-scsi-pm8001-Set-SAS_TASK_AT_INITIATOR-before-deliver.patch" of type "text/plain" (6173 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ