linux-kernel - Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <d3c57c8e52f7a251b5c536a893b1f101@codeaurora.org>
Date:   Sat, 12 Jun 2021 17:49:26 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     Adrian Hunter <adrian.hunter@...el.com>, asutoshd@...eaurora.org,
        nguyenb@...eaurora.org, hongwus@...eaurora.org,
        ziqichen@...eaurora.org, linux-scsi@...r.kernel.org,
        kernel-team@...roid.com, Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation

Hi Bart,

On 2021-06-12 14:46, Can Guo wrote:
> On 2021-06-12 04:58, Bart Van Assche wrote:
>> On 6/10/21 8:01 PM, Can Guo wrote:
>>> Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
>>> ufshcd_resume() may turn off pwr and clk due to UFS error, e.g., link
>>> transition failure and SSU error/abort (and these UFS error would
>>> invoke error handling).  When error handling kicks start, it should
>>> re-enable the pwr and clk before proceeding. Now, commit
>>> cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume()
>>> purely control pwr and clk, meaning if ufshcd_resume() fails, there
>>> is nothing we can do about it - pwr or clk enabling must have failed,
>>> and it is not because of UFS error. This is why I am removing the
>>> re-enabling pwr/clk in error handling prepare.
>> 
>> Why are link transition failures handled in the error handler instead 
>> of
>> in the context where these errors are detected (ufshcd_resume())? Is 
>> it
>> even possible to recover from a link transition failure or does this
>> perhaps indicate a broken UFS controller?
> 
> Basically, almost all UFS failures are caused by errors in underlaying 
> layers,
> i.e., UIC errors, including link transition failures. And according to 
> UFSHCI
> spec, SW should do a full reset to recover it, just like handle any 
> other
> fatal UIC errors. All UIC errors are detected by HW and reported by IRQ 
> handler.
> 
> UFSHCI Spec Ver. 31
> 8.2.7 Hibernate Enter/Exit Error Handling
> Hibernate Enter/Exit Error occurs when the UniPro link is broken. When
> this condition occurs,
> host software should reset the host controller by setting register HCE
> to ‘0’, re-initialize the host
> controller by setting register HCE to ‘1', and then start link startup
> sequence as shown in Figure 16.
> 
>> 
>>>> but what I really wonder is why we don't just do recovery directly
>>>> in __ufshcd_wl_suspend() and  __ufshcd_wl_resume() and strip all
>>>> the PM complexity out of ufshcd_err_handling()?
>> 
>> +1
> 
> I've explained why I chose not to do this in my last reply to Adrian.
> Please kindly check it.
> 
>> 
>>> For system suspend/resume, since error handling has the same nature
>>> like user access, so we are using host_sem to avoid concurrency of
>>> error handling and system suspend/resume.
>> 
>> Why is host_sem used for that purpose instead of lock_system_sleep() 
>> and
>> unlock_system_sleep()?
>> 
> 
> I was aware of it, but the situation is that host_sem is also used to
> avoid concurrency among user access, error handling and shutdown, so
> I think just use host_sem anyways to simply the lockings, otherwise
> user access and error handling would have to take both 
> system_transition_mutex
> and host_sem

On second thought, I will take your suggestion to use 
lock_system_sleep()
and unlock_system_sleep() in error handler and remove the host_sem used
in suspend/resume, which can make the code more readable by keeping the
changes within error handler itself. However, please note that host_sem
will still be used to avoid concurrency of user access, error handler 
and
shutdown.

Thanks,
Can Guo.

> 
> Thanks,
> 
> Can Guo.
> 
>> Thanks,
>> 
>> Bart.