[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c3cb7228-254e-9584-182b-007ac5e6fe0a@huawei.com>
Date: Fri, 4 Feb 2022 11:50:43 +0000
From: John Garry <john.garry@...wei.com>
To: Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
<jejb@...ux.ibm.com>, <martin.petersen@...cle.com>,
<artur.paszkiewicz@...el.com>, <jinpu.wang@...ud.ionos.com>,
<chenxiang66@...ilicon.com>, <Ajish.Koshy@...rochip.com>
CC: <yanaijie@...wei.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-scsi@...r.kernel.org>,
<linuxarm@...wei.com>, <liuqi115@...wei.com>,
<Viswas.G@...rochip.com>
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code
On 04/02/2022 11:27, Damien Le Moal wrote:
>> As I mentioned, having this fail is a red flag. If I was pushed to guess
>> what has happened, I'd say the FW is faulting due to some erroneous
>> driver behaviour.
> I am still thinking that there is something wrong with the handling of
> NCQ NON DATA command. There are several places where the code determines
> non-data vs pio vs dma vs fpdma (ncq), and NCQ NON DATA always falls in
> the fpdma bucket, which is wrong.
>
Ok, I will have a look at this. We made some libsas changes related to
this not so long ago to "fix" something, see:
53de092f47f ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc
as NODATA")
176ddd89171d ("scsi: libsas: Reset num_scatter if libata marks qc as
NODATA")
>>> This is the submission path, not completion. The code is:
>>>
>>> (gdb) list *(pm8001_queue_command+0x842)
>>> 0x3d42 is in pm8001_queue_command (drivers/scsi/pm8001/pm8001_sas.c:491).
>>> 486 atomic_dec(&pm8001_dev->running_req);
>>> 487 goto err_out_tag;
>>> 488 }
>>> 489 /* TODO: select normal or high priority */
>>> 490 spin_lock(&t->task_state_lock);
>>> 491 t->task_state_flags |= SAS_TASK_AT_INITIATOR;
>>> 492 spin_unlock(&t->task_state_lock);
>>> 493 } while (0);
>>> 494 rc = 0;
>>> 495 goto out_done;
>>>
>>> So the task is already completed when the submission path tries to set
>>> the state flag ? Debugging...
>> Yeah, that's how it looks.
>>
>> I already mentioned this problem here:
>>
>> https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/
>>
>> Maybe we should just fix it now to rule it out of possibly causing other
>> issues... I was reluctant to fix it as many places seems to need to be
>> touched. Let me check it.
> Here is my current fix:
>
> diff --git a/drivers/scsi/pm8001/pm8001_sas.c
> b/drivers/scsi/pm8001/pm8001_sas.c
> index 1b95c73d12d1..16c101577dd3 100644
> --- a/drivers/scsi/pm8001/pm8001_sas.c
> +++ b/drivers/scsi/pm8001/pm8001_sas.c
> @@ -453,13 +453,18 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> ccb->ccb_tag = tag;
> ccb->task = t;
> ccb->device = pm8001_dev;
> +
> + /* TODO: select normal or high priority */
> + atomic_inc(&pm8001_dev->running_req);
> + spin_lock(&t->task_state_lock);
> + t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> + spin_unlock(&t->task_state_lock);
> +
> switch (task_proto) {
> case SAS_PROTOCOL_SMP:
> - atomic_inc(&pm8001_dev->running_req);
> rc = pm8001_task_prep_smp(pm8001_ha, ccb);
> break;
> case SAS_PROTOCOL_SSP:
> - atomic_inc(&pm8001_dev->running_req);
> if (is_tmf)
> rc = pm8001_task_prep_ssp_tm(pm8001_ha,
> ccb, tmf);
> @@ -468,7 +473,6 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
> break;
> case SAS_PROTOCOL_SATA:
> case SAS_PROTOCOL_STP:
> - atomic_inc(&pm8001_dev->running_req);
> rc = pm8001_task_prep_ata(pm8001_ha, ccb);
> break;
> default:
> @@ -480,13 +484,12 @@ int pm8001_queue_command(struct sas_task *task,
> gfp_t gfp_flags)
>
> if (rc) {
> pm8001_dbg(pm8001_ha, IO, "rc is %x\n", rc);
> + spin_lock(&t->task_state_lock);
> + t->task_state_flags &= ~SAS_TASK_AT_INITIATOR;
> + spin_unlock(&t->task_state_lock);
> atomic_dec(&pm8001_dev->running_req);
> goto err_out_tag;
> }
> - /* TODO: select normal or high priority */
> - spin_lock(&t->task_state_lock);
> - t->task_state_flags |= SAS_TASK_AT_INITIATOR;
> - spin_unlock(&t->task_state_lock);
> } while (0);
> rc = 0;
> goto out_done;
>
> With this, No KASAN complaint. I will send a proper patch ASAP.
I had just prepared a patch, but different, attached.
>
> Of note is that I cannot see what the flag SAS_TASK_AT_INITIATOR is for.
> It is set and unset only, never tested anywhere in libsas nor pm8001
> driver. This flag seems totally useless to me, unless this is something
> that the HW can see ?
Right, it is only checked by isci AFAIC. And it is not something which
HW can see.
Since libsas does not check it the semantics are ill-defined. However I
think that it's worth setting it completeness; I thought it was better
setting it just before the command is delivered to HW, but the
implementation touches more SW. Simpler approach may be better since it
aligns with how this flag may be cleared in pm8001_chip_sata_req() under
asusmption it is already set.
Thanks,
John
View attachment "0001-scsi-pm8001-Set-SAS_TASK_AT_INITIATOR-before-deliver.patch" of type "text/plain" (6173 bytes)
Powered by blists - more mailing lists