lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 7 Oct 2021 14:47:11 -0700
From:   Bart Van Assche <bvanassche@....org>
To:     Kiwoong Kim <kwmad.kim@...sung.com>, linux-scsi@...r.kernel.org,
        linux-kernel@...r.kernel.org, alim.akhtar@...sung.com,
        avri.altman@....com, jejb@...ux.ibm.com,
        martin.petersen@...cle.com, beanhuo@...ron.com,
        cang@...eaurora.org, adrian.hunter@...el.com, sc.suh@...sung.com,
        hy50.seo@...sung.com, sh425.lee@...sung.com,
        bhoon95.kim@...sung.com, vkumar.1997@...sung.com
Subject: Re: [PATCH v1] scsi: ufs: clear doorbell for hibern8 errors when
 using ah8

On 10/7/21 12:40 AM, Kiwoong Kim wrote:
> When an scsi command is dispatched right after host complete
> all the pending requests goes to idle and ufs driver tries
> to ring a doorbell, host might be still during entering into
> hibern8. If an error occurrs during that period, the doorbell
> might not be zero. In this case, clearing it should be needed
> to requeue its command.
> Currently, ufshcd_err_handler goes directly to reset when the
> driver's link state is broken. This patch is to make it clear
> doorbells in the case. In the situation, communication would
> be disabled, so TM isn't necessary or we can say it's not
> available.

The above text is too hard to comprehend. Please make it more clear. As 
an example, in the first sentence, what does "goes to idle" apply to? 
Does it apply to "SCSI command", "pending requests" or something else?

What mechanism does "If an error occurs" refer to? A SCSI command 
processing error, a link error or another type of error?

> Here's an actual symptom that I've faced. At the time, tag #17
> is still pended even after host reset. And then the block timer
> is expired.

pended -> pending.

> exynos-ufs 11100000.ufs: ufshcd_check_errors: Auto Hibern8
> Enter failed - status: 0x00000040, upmcrs: 0x00000001
> ..
> host_regs: 00000050: b8671000 00000008 00020000 00000000
> ..
> exynos-ufs 11100000.ufs: ufshcd_abort: Device abort task at tag 17

The relationship between the example and the description above the 
example is not clear. If a command is pending before the error handler 
starts, aborting that command fails and the host is not reset then the 
command will still be pending after the error handler has finished.

> @@ -6138,7 +6139,12 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
>   	     (hba->saved_uic_err & (UFSHCD_UIC_DL_NAC_RECEIVED_ERROR |
>   				    UFSHCD_UIC_DL_TCx_REPLAY_ERROR)))) {
>   		needs_reset = true;
> -		goto do_reset;
> +		spin_lock_irqsave(hba->host->host_lock, flags);
> +		if (!hba->ahit && ufshcd_is_link_broken(hba))
> +			link_broken_in_ah8 = true;
> +		spin_unlock_irqrestore(hba->host->host_lock, flags);
> +		if (!link_broken_in_ah8)
> +			goto do_reset;
>   	}
>   

My understanding is that hba->ahit == 0 means that auto-hibernation is 
disabled. Hence, the above code sets 'link_broken_in_ah8' only if 
auto-hibernation is disabled. So what does the name of the variable 
'link_broken_in_ah8' mean?

> @@ -6168,6 +6174,9 @@ static void ufshcd_err_handler(struct Scsi_Host *host)
>   		}
>   	}
>   
> +	if (link_broken_in_ah8)
> +		goto lock_skip_pending_xfer_clear;
> +
>   	/* Clear pending task management requests */
>   	for_each_set_bit(tag, &hba->outstanding_tasks, hba->nutmrs) {
>   		if (ufshcd_clear_tm_cmd(hba, tag)) {

Why is skipping the ufshcd_clear_tm_cmd() calls useful in this case?

Thanks,

Bart.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ