Message-ID: <75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org>
Date: Sun, 13 Jun 2021 22:42:55 +0800
From: Can Guo <cang@...eaurora.org>
To: Bart Van Assche <bvanassche@....org>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org,
hongwus@...eaurora.org, ziqichen@...eaurora.org,
linux-scsi@...r.kernel.org, kernel-team@...roid.com,
Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Jaegeuk Kim <jaegeuk@...nel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in
ufshcd_abort() for PM requests
Hi Bart,
On 2021-06-13 00:50, Bart Van Assche wrote:
> On 6/12/21 12:07 AM, Can Guo wrote:
>> Sigh... I also want my life and work to be easier...
>
> How about reducing the number of states and state transitions in the UFS
> driver? One source of complexity is that ufshcd_err_handler() is scheduled
> independently of the SCSI error handler and hence may run concurrently
> with the SCSI error handler. Has the following already been considered?
> - Call ufshcd_err_handler() synchronously from ufshcd_abort() and
> ufshcd_eh_host_reset_handler() instead of asynchronously.
1. ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and flushes
it, so it is already synchronous. ufshcd_eh_host_reset_handler() used to
call reset_and_restore() directly, which could run concurrently with the
UFS error handler, so I fixed it last year [1].
2. Having ufshcd_abort() invoke ufshcd_err_handler() synchronously can
cause a live lock, which is why I chose the asynchronous way (from the
first day I started to fix error handling). The live lock happens when the
abort targets a PM request, e.g., an SSU cmd sent from suspend/resume.
Because the UFS error handler is synchronized with suspend/resume (by
calling pm_runtime_get_sync() and lock_system_sleep()), the sequence is
like:
[1] ufshcd_wl_resume() sends SSU cmd
[2] ufshcd_abort() calls UFS error handler
[3] UFS error handler calls lock_system_sleep() and pm_runtime_get_sync()
In the above sequence, either lock_system_sleep() or pm_runtime_get_sync()
will block - [3] is blocked by [1], [2] is blocked by [3], while [1] is
blocked by [2].
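The circular wait above can be modelled as a small wait-for graph. This is
only an illustrative sketch, not driver code: the enum labels, the two
tables and has_circular_wait() are hypothetical names, and the second table
assumes the asynchronous approach, in which ufshcd_abort() does not wait
for the error handler.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three parties in the sequence above. */
enum actor { RESUME_SSU, ABORT_PATH, ERR_HANDLER, NUM_ACTORS };

/*
 * waits_on[a] names the actor that 'a' is blocked by; NUM_ACTORS means
 * "not blocked". Synchronous case, as described in the text:
 * [1] is blocked by [2], [2] is blocked by [3], [3] is blocked by [1].
 */
const enum actor waits_on_sync[NUM_ACTORS] = {
	[RESUME_SSU]  = ABORT_PATH,
	[ABORT_PATH]  = ERR_HANDLER,
	[ERR_HANDLER] = RESUME_SSU,
};

/*
 * Assumed asynchronous case: the abort path only schedules the error
 * handler and returns, so it is not blocked by anyone.
 */
const enum actor waits_on_async[NUM_ACTORS] = {
	[RESUME_SSU]  = ABORT_PATH,
	[ABORT_PATH]  = NUM_ACTORS,
	[ERR_HANDLER] = RESUME_SSU,
};

/* Return true if following the wait-for chain from 'start' loops back. */
bool has_circular_wait(const enum actor *waits_on, enum actor start)
{
	enum actor cur = start;

	for (size_t i = 0; i < NUM_ACTORS; i++) {
		cur = waits_on[cur];
		if (cur == NUM_ACTORS)
			return false;	/* chain ends at an unblocked actor */
		if (cur == start)
			return true;	/* came back to where we started */
	}
	return false;
}
```

With the synchronous table every actor sits on the cycle; with the
asynchronous table the chain ends at the abort path, so suspend/resume can
make progress once the PM request has been aborted.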
For PM requests, I chose to abort them fast to unblock suspend/resume.
Suspend/resume will fail of course, but the UFS error handler recovers
from PM errors anyway.
> - Call scsi_schedule_eh() from ufshcd_uic_pwr_ctrl() and
> ufshcd_check_errors() instead of ufshcd_schedule_eh_work().
When ufshcd_uic_pwr_ctrl() and/or ufshcd_check_errors() report errors,
they are usually fatal errors. According to the UFSHCI spec, SW should
re-probe UFS to recover.
However, scsi_schedule_eh() does more than that - scsi_unjam_host() sends
a request sense cmd and calls scsi_eh_ready_devs(), while
scsi_eh_ready_devs() sends a test unit ready cmd and calls all the way
down to scsi_eh_device/target/bus/host_reset(). But we only need
scsi_eh_host_reset() in this case. I know you have concerns that
scsi_schedule_eh() may run concurrently with the UFS error handler, but as
I mentioned above in [1] - I've made ufshcd_eh_host_reset_handler()
synchronized with the UFS error handler. I hope that eases your concern.
I am not saying your idea won't work; it is a good suggestion. I will try
it after these changes go in, because it would require extra effort and
the effort won't be minor - I need to consider how to remove/reduce the
ufshcd states along with the change, and redo the error injection and
stability tests all over again, which is a long way to go. As for now, at
least the current changes work well as per my tests and we really need
these changes for Android12-5.10.
Thanks,
Can Guo.
>
> These changes will guarantee that all commands have completed or timed
> out before ufshcd_err_handler() is called. I think that would allow to
> remove e.g. the following code from the error handler:
>
> ufshcd_scsi_block_requests(hba);
> /* Drain ufshcd_queuecommand() */
> down_write(&hba->clk_scaling_lock);
> up_write(&hba->clk_scaling_lock);
>
> Thanks,
>
> Bart.