linux-kernel - Re: [PATCH v4 06/10] scsi: ufs: Remove host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ecf749fc88220910563704ef41939d40@codeaurora.org>
Date:   Wed, 30 Jun 2021 05:50:46 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     Adrian Hunter <adrian.hunter@...el.com>, asutoshd@...eaurora.org,
        nguyenb@...eaurora.org, hongwus@...eaurora.org,
        ziqichen@...eaurora.org, linux-scsi@...r.kernel.org,
        kernel-team@...roid.com, Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in
 suspend/resume

On 2021-06-30 02:01, Bart Van Assche wrote:
> On 6/28/21 11:23 PM, Can Guo wrote:
>> On 2021-06-29 01:31, Bart Van Assche wrote:
>>> On 6/28/21 1:17 AM, Can Guo wrote:
>>>> On 2021-06-25 01:11, Bart Van Assche wrote:
>>>>> On 6/23/21 11:31 PM, Can Guo wrote:
>>>>>> Using back host_sem in suspend_prepare()/resume_complete()
>>>>>> won't have this problem of deadlock, right?
>>>>> 
>>>>> Although that would solve the deadlock discussed in this email
>>>>> thread, it wouldn't solve the issue of potential adverse
>>>>> interactions of the UFS error handler and the SCSI error
>>>>> handler running concurrently.
>>>> 
>>>> I think I've explained it before, paste it here -
>>>> 
>>>> ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and
>>>> flushes it, so SCSI error handler and UFS error handler can
>>>> safely run together.
>>> 
>>> That code path is the exception. Do you agree that the following
>>> three functions all invoke the ufshcd_err_handler() function
>>> asynchronously? * ufshcd_uic_pwr_ctrl() * ufshcd_check_errors() *
>>> ufshcd_abort()
>> 
>> I agree, but I don't see what's wrong with that. Any context can
>> invoke ufs error handler asynchronously and ufs error handler prepare
>> makes sure error handler can work safely, i.e., stopping PM
>> ops/gating/scaling in error handler prepare makes sure no one shall
>> call ufshcd_uic_pwr_ctrl() ever again. And ufshcd_check_errors() and
>> ufshcd_abort() are OK to run concurrently with UFS error handler.
> 
> The current UFS error handling approach requires the following code in
> ufshcd_queuecommand():
> 
> 		if (hba->pm_op_in_progress) {
> 			hba->force_reset = true;
> 			set_host_byte(cmd, DID_BAD_TARGET);
> 			cmd->scsi_done(cmd);
> 			goto out;
> 		}
> 
> Removing that code is not possible with the current error handling
> approach. My patch makes it possible to remove that code.
> 
>> Sorry that I missed the change of scsi_transport_template() in your
>> previous message. I can understand that you want to invoke UFS error
>> hander by invoking SCSI error handler, but I didn't go that far
>> because I saw you changed pm_runtime_get_sync() to
>> pm_runtime_get_noresume() in ufs error handler prepare. How can that
>> change make sure that the device is not suspending or resuming while
>> error handler is running?
> 
> UFS power state transitions happen by submitting a SCSI command to a
> WLUN. The SCSI error handler is only activated after all outstanding
> SCSI commands for a SCSI host have failed or completed. I think this
> guarantees for the UFS driver that eh_strategy_handler is not invoked
> while a command submitted to a WLUN is changing the power state of the
> UFS device. The following code from scsi_error.c only wakes up the 
> error
> handler if (shost->host_failed || shost->host_eh_scheduled) &&
> shost->host_failed == scsi_host_busy(shost):
> 
> 	if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0)
> 	    || shost->host_failed != scsi_host_busy(shost)) {
> 		schedule();
> 		continue;
> 	}
> 	/* Handle SCSI errors */
> 

It is not completely right - wl_suspend/resume() are much more twisted.

wl_suspend() may or may NOT send a SCSI cmd to WLUN, i.e., SSU cmd may
be skipped if spm/rpm_lvl is 0/1 and/or if bkops/wb is on-going (even 
when
rpm_lvl is not 0/1), while link can still be put to hibern8/off, then
power/clks can still be shutdown to save power.

wl_resume(), in case of rpm/spm_lvl == 5, does a full reset to UFS 
device,
without sending a SSU cmd to WLU to complete the power state transition.

So above checks (in scsi_error_handler()) cannot gaurantee that actual
power state transistions in UFS driver has ceased before start UFS error
handling.

Thanks,

Can Guo.

> Thanks,
> 
> Bart.