Message-ID: <d3c57c8e52f7a251b5c536a893b1f101@codeaurora.org>
Date: Sat, 12 Jun 2021 17:49:26 +0800
From: Can Guo <cang@...eaurora.org>
To: Bart Van Assche <bvanassche@....org>
Cc: Adrian Hunter <adrian.hunter@...el.com>, asutoshd@...eaurora.org,
nguyenb@...eaurora.org, hongwus@...eaurora.org,
ziqichen@...eaurora.org, linux-scsi@...r.kernel.org,
kernel-team@...roid.com, Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Jaegeuk Kim <jaegeuk@...nel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation
Hi Bart,
On 2021-06-12 14:46, Can Guo wrote:
> On 2021-06-12 04:58, Bart Van Assche wrote:
>> On 6/10/21 8:01 PM, Can Guo wrote:
>>> Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
>>> ufshcd_resume() may turn off pwr and clk due to UFS error, e.g., link
>>> transition failure and SSU error/abort (and these UFS errors would
>>> invoke error handling). When error handling kicks in, it should
>>> re-enable the pwr and clk before proceeding. Now, commit
>>> cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume()
>>> purely control pwr and clk, meaning if ufshcd_resume() fails, there
>>> is nothing we can do about it - pwr or clk enabling must have failed,
>>> and it is not because of UFS error. This is why I am removing the
>>> re-enabling pwr/clk in error handling prepare.
>>
>> Why are link transition failures handled in the error handler instead of
>> in the context where these errors are detected (ufshcd_resume())? Is it
>> even possible to recover from a link transition failure or does this
>> perhaps indicate a broken UFS controller?
>
> Basically, almost all UFS failures are caused by errors in the underlying
> layers, i.e., UIC errors, including link transition failures. According to
> the UFSHCI spec, SW should do a full reset to recover, just as it handles
> any other fatal UIC error. All UIC errors are detected by HW and reported
> by the IRQ handler.
>
> UFSHCI Spec Ver. 31
> 8.2.7 Hibernate Enter/Exit Error Handling
> Hibernate Enter/Exit Error occurs when the UniPro link is broken. When
> this condition occurs, host software should reset the host controller by
> setting register HCE to ‘0’, re-initialize the host controller by setting
> register HCE to ‘1’, and then start link startup sequence as shown in
> Figure 16.
>
>>
>>>> but what I really wonder is why we don't just do recovery directly
>>>> in __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all
>>>> the PM complexity out of ufshcd_err_handling()?
>>
>> +1
>
> I've explained why I chose not to do this in my last reply to Adrian.
> Please kindly check it.
>
>>
>>> For system suspend/resume, since error handling is similar in nature
>>> to user access, we are using host_sem to avoid concurrency between
>>> error handling and system suspend/resume.
>>
>> Why is host_sem used for that purpose instead of lock_system_sleep() and
>> unlock_system_sleep()?
>>
>
> I was aware of it, but the situation is that host_sem is also used to
> avoid concurrency among user access, error handling and shutdown, so
> I think it is better to just use host_sem everywhere to simplify the
> locking. Otherwise user access and error handling would have to take
> both system_transition_mutex and host_sem.
On second thought, I will take your suggestion to use lock_system_sleep()
and unlock_system_sleep() in the error handler and remove the host_sem
usage from suspend/resume. This makes the code more readable by keeping
the changes within the error handler itself. However, please note that
host_sem will still be used to avoid concurrency among user access, the
error handler and shutdown.
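To illustrate the locking split described above, here is a rough
user-space sketch, not the actual driver change. It assumes pthread
mutexes as stand-ins for system_transition_mutex and host_sem, and every
model_* name is hypothetical. It also assumes the PM core already holds
the transition lock while system suspend/resume runs, which is why the
modelled PM path takes no host_sem.

```c
#include <pthread.h>
#include <stdatomic.h>

/* User-space stand-ins only; none of these are the kernel primitives. */
static pthread_mutex_t system_transition = PTHREAD_MUTEX_INITIALIZER; /* models system_transition_mutex */
static pthread_mutex_t host_sem = PTHREAD_MUTEX_INITIALIZER;          /* models hba->host_sem */

static atomic_int in_system_pm;   /* set while the modelled suspend/resume runs */
static atomic_int in_err_handler; /* set while the modelled error handler runs */
static atomic_int overlaps;       /* counts any forbidden overlap that is observed */

enum { ITERATIONS = 10000, NTHREADS = 3 };

/* Hypothetical wrappers playing the role of lock_system_sleep()/unlock_system_sleep(). */
static void model_lock_system_sleep(void)   { pthread_mutex_lock(&system_transition); }
static void model_unlock_system_sleep(void) { pthread_mutex_unlock(&system_transition); }

static void *model_err_handler(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        model_lock_system_sleep();      /* keeps system suspend/resume out */
        pthread_mutex_lock(&host_sem);  /* keeps user access and shutdown out */
        atomic_store(&in_err_handler, 1);
        if (atomic_load(&in_system_pm))
            atomic_fetch_add(&overlaps, 1);
        atomic_store(&in_err_handler, 0);
        pthread_mutex_unlock(&host_sem);
        model_unlock_system_sleep();
    }
    return NULL;
}

static void *model_system_pm(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        /* Assumption: the PM core holds the transition lock here; no host_sem needed. */
        pthread_mutex_lock(&system_transition);
        atomic_store(&in_system_pm, 1);
        if (atomic_load(&in_err_handler))
            atomic_fetch_add(&overlaps, 1);
        atomic_store(&in_system_pm, 0);
        pthread_mutex_unlock(&system_transition);
    }
    return NULL;
}

static void *model_user_access(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&host_sem);  /* user access only needs host_sem */
        if (atomic_load(&in_err_handler))
            atomic_fetch_add(&overlaps, 1);
        pthread_mutex_unlock(&host_sem);
    }
    return NULL;
}

/* Runs the three paths concurrently; returns the overlap count, or -1 on thread creation failure. */
int run_model(void)
{
    void *(*fn[NTHREADS])(void *) = { model_err_handler, model_system_pm, model_user_access };
    pthread_t t[NTHREADS];
    int created = 0, failed = 0;

    atomic_store(&overlaps, 0);
    for (; created < NTHREADS; created++) {
        if (pthread_create(&t[created], NULL, fn[created], NULL) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return failed ? -1 : atomic_load(&overlaps);
}
```

Calling run_model() from a small main() should report no overlaps in
this model. Only the error handler takes both locks, and always in the
same order, so user access does not need system_transition_mutex and
suspend/resume does not need host_sem.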
Thanks,
Can Guo.
>
> Thanks,
>
> Can Guo.
>
>> Thanks,
>>
>> Bart.