lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 29 Jun 2021 14:23:28 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     Adrian Hunter <adrian.hunter@...el.com>, asutoshd@...eaurora.org,
        nguyenb@...eaurora.org, hongwus@...eaurora.org,
        ziqichen@...eaurora.org, linux-scsi@...r.kernel.org,
        kernel-team@...roid.com, Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in
 suspend/resume

On 2021-06-29 01:31, Bart Van Assche wrote:
> On 6/28/21 1:17 AM, Can Guo wrote:
>> On 2021-06-25 01:11, Bart Van Assche wrote:
>>> On 6/23/21 11:31 PM, Can Guo wrote:
>>>> Using back host_sem in suspend_prepare()/resume_complete() won't 
>>>> have
>>>> this problem of deadlock, right?
>>> 
>>> Although that would solve the deadlock discussed in this email 
>>> thread, it
>>> wouldn't solve the issue of potential adverse interactions of the UFS
>>> error handler and the SCSI error handler running concurrently.
>> 
>> I think I've explained it before, paste it here -
>> 
>> ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and 
>> flushes it,
>> so SCSI error handler and UFS error handler can safely run together.
> 
> That code path is the exception. Do you agree that the following three
> functions all invoke the ufshcd_err_handler() function asynchronously?
> * ufshcd_uic_pwr_ctrl()
> * ufshcd_check_errors()
> * ufshcd_abort()
> 

I agree, but I don't see what's wrong with that. Any context can invoke
ufs error handler asynchronously and ufs error handler prepare makes
sure error handler can work safely, i.e., stopping PM ops/gating/scaling
in error handler prepare makes sure no one shall call 
ufshcd_uic_pwr_ctrl()
ever again. And ufshcd_check_errors() and ufshcd_abort() are OK to run
concurrently with UFS error handler.

>>> How about using the
>>> standard approach for invoking the UFS error handler instead of using
>>> a custom
>>> mechanism, e.g. by using something like the (untested) patch below? 
>>> This
>>> approach guarantees that the UFS error handler is only activated 
>>> after
>>> all
>>> pending SCSI commands have failed or timed out and also guarantees
>>> that no new
>>> SCSI commands will be queued while the UFS error handler is in
>>> progress (see
>>> also scsi_host_queue_ready()).
>> 
>> Per my understanding, SCSI error handling is scsi cmd based, meaning 
>> it
>> only works when certain SCSI cmds failed [ ... ]
> That is not completely correct. The SCSI error handler is activated if
> either all pending commands have failed or if it is scheduled
> explicitly. Please take a look at the host_eh_scheduled member 
> variable,
> how it is used and also at scsi_schedule_eh(). The scsi_schedule_eh()
> function was introduced in 2006 and that the ATA code uses it since 
> then
> to activate the SCSI error handler even if no commands are pending. See
> also the patch "SCSI: make scsi_implement_eh() generic API for SCSI
> transports".
> 
>> However, most UFS (UIC) errors happens during gear scaling, clk gating
>> and suspend/resume (due to power mode changes and/or hibern8
>> enter/exit), during which there is NO scsi cmds in UFS driver at all
>> (because these contexts start only when there is no ongoing data
>> transactions).
> 
> Activating the SCSI error handler if no SCSI commands are in progress 
> is
> supported by scsi_schedule_eh().
> 
>> Thus, scsi_unjam_host() won't even call scsi_eh_ready_devs() because
>> scsi_eh_get_sense() always returns TRUE in these cases (eh_work_q is
>> empty).
> 
> Please take another look at the patch in my previous message. There is 
> a
> scsi_transport_template instance in that patch. The eh_strategy_handler
> defined in a SCSI transport template is called *instead* of
> scsi_unjam_host(). In other words, scsi_unjam_host() won't be called if
> my patch would be applied to the UFS driver.
> 
> Please let me know if you need more information.

Sorry that I missed the change of scsi_transport_template() in your 
previous
message. I can understand that you want to invoke UFS error hander by 
invoking
SCSI error handler, but I didn't go that far because I saw you changed
pm_runtime_get_sync() to pm_runtime_get_noresume() in ufs error handler 
prepare.
How can that change make sure that the device is not suspending or 
resuming
while error handler is running?

Thanks,

Can Guo.

> 
> Bart.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ