Message-ID: <16f5bd448c7ae1a45fcb23133391aa3f@codeaurora.org>
Date:   Sat, 12 Jun 2021 15:07:38 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     asutoshd@...eaurora.org, nguyenb@...eaurora.org,
        hongwus@...eaurora.org, ziqichen@...eaurora.org,
        linux-scsi@...r.kernel.org, kernel-team@...roid.com,
        Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in
 ufshcd_abort() for PM requests

On 2021-06-12 05:02, Bart Van Assche wrote:
> On 6/9/21 9:43 PM, Can Guo wrote:
>> If PM requests fail during runtime suspend/resume, RPM framework saves the
>> error to dev->power.runtime_error. Before the runtime_error gets cleared,
>> runtime PM on this specific device won't work again, leaving the device
>> either runtime active or runtime suspended permanently.
>>
>> When task abort happens to a PM request sent during runtime suspend/resume,
>> even if it can be successfully aborted, RPM framework anyways saves the
>> (TIMEOUT) error. In this situation, we can leverage error handling to
>> recover and clear the runtime_error. So, let PM requests take the fast
>> abort path in ufshcd_abort().
> 
> How can a PM request fail during runtime suspend/resume? Does such a
> failure perhaps indicate an UFS controller bug?

I replied to your similar question in the previous series. Over the
years I have seen too many SSU and SYNCHRONIZE_CACHE commands time out;
even 60s is not enough for them to complete. And you are right, in most
cases the device is not responding: the UFS controller is busy with
housekeeping.
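
As an illustration of the runtime_error behavior described in the quoted
commit message, here is a minimal userspace sketch. It is not kernel code:
struct model_dev, model_runtime_suspend() and model_clear_runtime_error()
are hypothetical names made up for this example, and the -EINVAL returned
while an error is latched is an assumption of the model. The real runtime
PM core has more states and does not latch every error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's struct device. */
struct model_dev {
	int  runtime_error;	/* plays the role of dev->power.runtime_error */
	bool suspended;
};

/*
 * Attempt a runtime suspend. cb_result stands in for the return value of
 * the driver's suspend callback: 0 on success, or a negative errno such as
 * -ETIMEDOUT when a PM request (for example SSU) times out.
 */
static int model_runtime_suspend(struct model_dev *dev, int cb_result)
{
	assert(dev != NULL);

	/* A latched error blocks every later attempt (model assumption). */
	if (dev->runtime_error)
		return -EINVAL;

	if (cb_result) {
		/* Save the failure, even if the request was aborted cleanly. */
		dev->runtime_error = cb_result;
		return cb_result;
	}

	dev->suspended = true;
	return 0;
}

/* Stand-in for what error handling would do after a successful recovery. */
static void model_clear_runtime_error(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->runtime_error = 0;
}
```

In this model, one timed-out suspend makes every later suspend attempt
fail until model_clear_runtime_error() is called, which is the situation
the fast abort path is meant to hand over to error handling.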

> I appreciate your work
> but I'm wondering whether it's worth to complicate the UFS driver for
> issues that should be fixed in the controller instead of in software.
> 

Sigh... I also want my life and work to be easier... I agree with you.

During the project bring-up stage, we fix whatever error, bug, or failure
we hit in order to unblock the project. At that stage we focus only on
fixing the very first UFS error and do not care much about error recovery
or about what that error can lead to (usually more UFS errors and system
stability issues follow the very first UFS error).

However, in recent years our customers have tended to ask for more: they
want UFS error handling to recover everything whenever a UFS error occurs,
because they believe it is the last line of defense after their products
go to market. So I put a lot of effort into fixing and testing it and
trying to make it robust. Now here we are.
FYI, I am on a tight schedule to have these UFS error handling changes
ready in Android12-5.10.

Thanks,

Can Guo.

> Thanks,
> 
> Bart.
