Message-ID: <2fa53602-8968-09e4-60f4-28462d85ae08@acm.org>
Date: Wed, 16 Jun 2021 10:55:03 -0700
From: Bart Van Assche <bvanassche@....org>
To: Can Guo <cang@...eaurora.org>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org,
hongwus@...eaurora.org, ziqichen@...eaurora.org,
linux-scsi@...r.kernel.org, kernel-team@...roid.com,
Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Jaegeuk Kim <jaegeuk@...nel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in
ufshcd_abort() for PM requests
On 6/16/21 1:47 AM, Can Guo wrote:
> On 2021-06-16 12:40, Bart Van Assche wrote:
>> On 6/15/21 9:00 PM, Can Guo wrote:
>>> 2. And say we want SCSI layer to resubmit PM requests to prevent
>>> suspend/resume fail, we should keep retrying the PM requests (so
>>> long as error handler can recover everything successfully),
>>> meaning we should give them unlimited retries (which I think is a
>>> bad idea), otherwise (if they have zero retries or limited
>>> retries), in extreme conditions, what may happen is that error
>>> handler can recover everything successfully every time, but all
>>> these retries (say 3) still time out, which block the power
>>> management for too long (retries * 60 seconds) and, most
>>> important, when the last retry times out, scsi layer will
>>> anyways complete the PM request (even we return DID_IMM_RETRY),
>>> then we end up same - suspend/resume shall run concurrently with
>>> error handler and we couldn't recover saved PM errors.
>>
>> Hmm ... it is not clear to me why this behavior is considered a
>> problem?
>
> To me, task abort to PM requests does not worth being treated so
> differently, after all suspend/resume may fail due to any kinds of
> UFS errors (as I've explained so many times). My idea is to let PM
> requests fast fail (60 seconds has passed, a broken device maybe, we
> have reason to fail it since it is just a passthrough req) and
> schedule UFS error handler, UFS error handler shall proceed after
> suspend/resume fails out then start to recover everything in a safe
> environment. Is this way not working?

Hi Can,

Thank you for the clarification. As you probably know, the power
management subsystem serializes runtime power management (RPM) and
system suspend callbacks. I was concerned about the consequences of a
failed RPM transition on system suspend and resume. Having taken a
closer look at the UFS driver, I see that failed RPM transitions do not
require special handling in the system suspend or resume callbacks. In
other words, I'm fine with the approach of failing PM requests fast.
Bart.