linux-kernel - Re: [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal error happens

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <acf85e531f282b3d1162a4879a1160d5@codeaurora.org>
Date:   Fri, 08 Jan 2021 08:53:29 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        kernel-team@...roid.com, alim.akhtar@...sung.com,
        avri.altman@....com, bvanassche@....org,
        martin.petersen@...cle.com, stanley.chu@...iatek.com,
        Jaegeuk Kim <jaegeuk@...gle.com>
Subject: Re: [PATCH v5 2/2] scsi: ufs: fix tm request correctly when non-fatal
 error happens

On 2021-01-08 02:53, Jaegeuk Kim wrote:
> From: Jaegeuk Kim <jaegeuk@...gle.com>
> 
> When non-fatal error like line-reset happens, ufshcd_err_handler() 
> starts to
> abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm 
> request,
> we've hit two warnings.
> 
> WARNING: CPU: 7 PID: 7 at block/blk-core.c:630 
> blk_get_request+0x68/0x70
> WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82 
> blk_mq_get_tag+0x438/0x46c
> 
> After fixing the above warnings, I've hit another tm_cmd timeout, which 
> may be
> caused by unstable controller state.
> 
> __ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
> 
> Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck. 
> It turned
> out ufshcd_print_trs() printed too many messages in console which 
> requires CPU
> locks. Likewise hba->silence_err_logs, we need to avoid too verbose 
> messages.
> Actually it came from ufshcd_transfer_rsp_status() when requeuing 
> commands back.
> Indeed, this is actually not an error case, so let's fix it.
> 
> Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
> allocate and free TMFs")
> Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>

It is really good to find out the root cause! Thanks for the fix.

Reviewed-by: Can Guo <cang@...eaurora.org>

> ---
>  drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index e6e7bdf99cd7..2a715f13fe1d 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
> struct ufshcd_lrb *lrbp)
>  		break;
>  	} /* end of switch */
> 
> -	if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
> +	if ((host_byte(result) != DID_OK) &&
> +	    (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
>  		ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
>  	return result;
>  }
> @@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void 
> *__hba)
>  		intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
>  	}
> 
> -	if (enabled_intr_status && retval == IRQ_NONE) {
> -		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
> -					__func__, intr_status);
> +	if (enabled_intr_status && retval == IRQ_NONE &&
> +				!ufshcd_eh_in_progress(hba)) {
> +		dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x, 
> 0x%08x)\n",
> +					__func__,
> +					intr_status,
> +					hba->ufs_stats.last_intr_status,
> +					enabled_intr_status);
>  		ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
>  	}
> 
> @@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba 
> *hba,
>  	 * Even though we use wait_event() which sleeps indefinitely,
>  	 * the maximum wait time is bounded by %TM_CMD_TIMEOUT.
>  	 */
> -	req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
> +	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> +	if (IS_ERR(req))
> +		return PTR_ERR(req);
> +
>  	req->end_io_data = &wait;
>  	free_slot = req->tag;
>  	WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);