Message-ID: <645c0e3c83c8917a8fd5c0493c5815a0@codeaurora.org>
Date: Sat, 12 Jun 2021 14:46:20 +0800
From: Can Guo <cang@...eaurora.org>
To: Bart Van Assche <bvanassche@....org>
Cc: Adrian Hunter <adrian.hunter@...el.com>, asutoshd@...eaurora.org,
nguyenb@...eaurora.org, hongwus@...eaurora.org,
ziqichen@...eaurora.org, linux-scsi@...r.kernel.org,
kernel-team@...roid.com, Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Jaegeuk Kim <jaegeuk@...nel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation
On 2021-06-12 04:58, Bart Van Assche wrote:
> On 6/10/21 8:01 PM, Can Guo wrote:
>> Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
>> ufshcd_resume() may turn off pwr and clk due to UFS error, e.g., link
>> transition failure and SSU error/abort (and these UFS error would
>> invoke error handling). When error handling kicks start, it should
>> re-enable the pwr and clk before proceeding. Now, commit
>> cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume()
>> purely control pwr and clk, meaning if ufshcd_resume() fails, there
>> is nothing we can do about it - pwr or clk enabling must have failed,
>> and it is not because of UFS error. This is why I am removing the
>> re-enabling pwr/clk in error handling prepare.
>
> Why are link transition failures handled in the error handler instead of
> in the context where these errors are detected (ufshcd_resume())? Is it
> even possible to recover from a link transition failure or does this
> perhaps indicate a broken UFS controller?
Basically, almost all UFS failures are caused by errors in the underlying
layers, i.e., UIC errors, including link transition failures. According to
the UFSHCI spec, SW should do a full reset to recover, just as it does for
any other fatal UIC error. All UIC errors are detected by HW and reported
via the IRQ handler.
UFSHCI Spec Ver. 31
8.2.7 Hibernate Enter/Exit Error Handling
Hibernate Enter/Exit Error occurs when the UniPro link is broken. When
this condition occurs, host software should reset the host controller by
setting register HCE to ‘0’, re-initialize the host controller by setting
register HCE to ‘1’, and then start link startup sequence as shown in
Figure 16.
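
To make the order of the three steps in the spec text above concrete, here
is a small userspace mock. It is only a sketch: struct mock_hba and every
mock_* function are hypothetical names invented for this illustration, and
none of it is the ufshcd driver code or touches real registers.

```c
/*
 * Userspace mock of the recovery sequence from UFSHCI 8.2.7.
 * Assumption: all types and functions here are illustrative stand-ins,
 * not the kernel driver's API.
 */
#include <assert.h>
#include <stdbool.h>

enum mock_link_state { LINK_BROKEN, LINK_OFF, LINK_ACTIVE };

struct mock_hba {
	unsigned int hce;            /* models the HCE register: 0 or 1 */
	enum mock_link_state link;   /* models the UniPro link state */
	unsigned int reset_count;    /* number of full resets performed */
};

/* Step 1: setting HCE to 0 resets the host controller. */
static void mock_hce_disable(struct mock_hba *hba)
{
	hba->hce = 0;
	hba->link = LINK_OFF;
}

/* Step 2: setting HCE to 1 re-initializes the host controller. */
static void mock_hce_enable(struct mock_hba *hba)
{
	hba->hce = 1;
}

/* Step 3: link startup is only attempted on an enabled controller. */
static bool mock_link_startup(struct mock_hba *hba)
{
	if (hba->hce != 1)
		return false;
	hba->link = LINK_ACTIVE;
	return true;
}

/* Full reset: the same path is taken for any fatal UIC error. */
static bool mock_reset_and_restore(struct mock_hba *hba)
{
	mock_hce_disable(hba);
	assert(hba->hce == 0);       /* controller must be off before re-enable */
	mock_hce_enable(hba);
	hba->reset_count++;
	return mock_link_startup(hba);
}
```

The point of the mock is only the ordering: the link is never restarted
without first cycling HCE through 0 and back to 1.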
>
>>> but what I really wonder is why we don't just do recovery directly
>>> in __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all
>>> the PM complexity out of ufshcd_err_handling()?
>
> +1
I've explained why I chose not to do this in my last reply to Adrian.
Please kindly check it.
>
>> For system suspend/resume, since error handling has the same nature
>> like user access, so we are using host_sem to avoid concurrency of
>> error handling and system suspend/resume.
>
> Why is host_sem used for that purpose instead of lock_system_sleep() and
> unlock_system_sleep()?
>
I was aware of it, but host_sem is also used to avoid concurrency among
user access, error handling and shutdown, so I think it is simpler to just
use host_sem here as well. Otherwise, user access and error handling would
have to take both system_transition_mutex and host_sem.
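
The single-lock argument above can be sketched in userspace as follows.
This is an assumption-laden illustration, not driver code: struct
mock_host and the mock_* functions are hypothetical, and an atomic flag
with a trylock stands in for host_sem purely so that exclusion is easy to
observe (the real lock would sleep instead of failing).

```c
/*
 * Userspace sketch: every path that changes host state takes the same
 * single lock. Assumption: names and the trylock behaviour are
 * illustrative only and do not mirror the kernel's host_sem API.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mock_host {
	atomic_bool locked;   /* stand-in for host_sem */
	int state_changes;    /* work done while holding the lock */
};

static bool mock_host_trylock(struct mock_host *h)
{
	/* atomic_exchange returns the old value: false means we acquired it. */
	return !atomic_exchange(&h->locked, true);
}

static void mock_host_unlock(struct mock_host *h)
{
	assert(atomic_load(&h->locked));   /* must be held when unlocking */
	atomic_store(&h->locked, false);
}

/* Common body: do the work only if the one shared lock is free. */
static bool mock_run_exclusive(struct mock_host *h)
{
	if (!mock_host_trylock(h))
		return false;
	h->state_changes++;
	mock_host_unlock(h);
	return true;
}

/* All paths go through the same lock, so none needs a second one. */
static bool mock_user_access(struct mock_host *h)    { return mock_run_exclusive(h); }
static bool mock_err_handler(struct mock_host *h)    { return mock_run_exclusive(h); }
static bool mock_system_suspend(struct mock_host *h) { return mock_run_exclusive(h); }
```

With one lock, a suspend attempt made while error handling holds it is
simply excluded; with two locks, user access and error handling would each
need to acquire both, in a consistent order.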
Thanks,
Can Guo.
> Thanks,
>
> Bart.