[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <baf17040-70e8-d850-30cd-74944e41285d@acm.org>
Date: Mon, 13 Sep 2021 09:23:07 -0700
From: Bart Van Assche <bvanassche@....org>
To: Kiwoong Kim <kwmad.kim@...sung.com>, linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org, alim.akhtar@...sung.com,
avri.altman@....com, jejb@...ux.ibm.com,
martin.petersen@...cle.com, beanhuo@...ron.com,
cang@...eaurora.org, adrian.hunter@...el.com, sc.suh@...sung.com,
hy50.seo@...sung.com, sh425.lee@...sung.com,
bhoon95.kim@...sung.com
Subject: Re: [PATCH v2 3/3] scsi: ufs: ufs-exynos: implement exynos isr
On 9/13/21 12:55 AM, Kiwoong Kim wrote:
> This patch is to raise recovery in some abnormal
> conditions using an vendor specific interrupt
> for some cases, such as a situation that some
> contexts of a pending request in the host isn't
> the same with those of its corresponding UPIUs
> if they should have been the same exactly.
>
> The representative case is shown like below.
> In the case, a broken UTRD entry, for internal
> coherent problem or whatever, that had smaller value
> of PRDT length than expected was transferred to the host.
> So, the host raised an interrupt of transfer complete
> even if device didn't finish its data transfer because
> the host sees a fetched version of UTRD to determine
> if data tranfer is over or not. Then the application level
> seemed to recogize this as a sort of corruption and this
> symptom led to boot failure.
How can a UTRD entry be broken? Does that perhaps indicate memory
corruption at the host side? Working around host-side memory
corruption in a driver seems wrong to me. I think the root cause
of the memory corruption should be fixed.
> +static irqreturn_t exynos_ufs_isr(struct ufs_hba *hba)
> +{
> + struct exynos_ufs *ufs = ufshcd_get_variant(hba);
> + u32 status;
> + unsigned long flags;
> +
> + if (!hba->priv) return IRQ_HANDLED;
Please verify patches with checkpatch before posting these on the
linux-scsi mailing list. The above if-statement does not follow the
Linux kernel coding style.
> + if (status & RX_UPIU_HIT_ERROR) {
> + pr_err("%s: status: 0x%08x\n", __func__, status);
> + hba->force_reset = true;
> + hba->force_requeue = true;
> + scsi_schedule_eh(hba->host);
> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> + return IRQ_HANDLED;
> + }
> + return IRQ_NONE;
> +}
So the above code unlocks the host_lock depending on whether or not
status & RX_UPIU_HIT_ERROR is true? Yikes ...
Additionally, in the above code I found the following pattern:
unsigned long flags;
[ ... ]
spin_unlock_irqrestore(hba->host->host_lock, flags);
Such code is ALWAYS wrong. The value of the 'flags' argument passed to
spin_unlock_irqrestore() must come from spin_lock_irqsave().
Bart.
Powered by blists - more mailing lists