Message-ID: <3970b015e444c1f1714c7e7bd4c44651@codeaurora.org>
Date: Thu, 24 Jun 2021 14:31:22 +0800
From: Can Guo <cang@...eaurora.org>
To: Adrian Hunter <adrian.hunter@...el.com>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org,
hongwus@...eaurora.org, ziqichen@...eaurora.org,
linux-scsi@...r.kernel.org, kernel-team@...roid.com,
Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Jaegeuk Kim <jaegeuk@...nel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume

On 2021-06-24 14:23, Adrian Hunter wrote:
> On 24/06/21 9:12 am, Can Guo wrote:
>> On 2021-06-24 13:52, Adrian Hunter wrote:
>>> On 24/06/21 5:16 am, Can Guo wrote:
>>>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>>>> To protect system suspend/resume from being disturbed by error
>>>>>> handling, instead of using host_sem, let error handler call
>>>>>> lock_system_sleep() and unlock_system_sleep() which achieve the
>>>>>> same purpose. Remove the host_sem used in suspend/resume paths to
>>>>>> make the code more readable.
>>>>>>
>>>>>> Suggested-by: Bart Van Assche <bvanassche@....org>
>>>>>> Signed-off-by: Can Guo <cang@...eaurora.org>
>>>>>> ---
>>>>>> drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>>>> index 3695dd2..a09e4a2 100644
>>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>>>>
>>>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>>>  {
>>>>>> +	/*
>>>>>> +	 * It is not safe to perform error handling while suspend or resume is
>>>>>> +	 * in progress. Hence the lock_system_sleep() call.
>>>>>> +	 */
>>>>>> +	lock_system_sleep();
>>>>>
>>>>> It looks to me like the system takes this lock quite early, even
>>>>> before freezing tasks, so if anything needs the error handler to
>>>>> run, it will deadlock.
>>>>
>>>> Hi Adrian,
>>>>
>>>> UFS/hba system suspend/resume does not invoke error handling
>>>> synchronously. So, whatever UFS errors (which schedule the error
>>>> handler) happen during suspend/resume, the error handler will just
>>>> wait here until system suspend/resume releases the lock. Hence
>>>> there is no worry of deadlock here.
>>>
>>> It looks to me like the state can change to
>>> UFSHCD_STATE_EH_SCHEDULED_FATAL, and since user processes are not
>>> frozen, nor file systems sync'ed, everything is going to deadlock,
>>> i.e.:
>>> I/O is blocked waiting on error handling
>>> error handling is blocked waiting on lock_system_sleep()
>>> suspend is blocked waiting on I/O
>>>
>>
>> Hi Adrian,
>>
>> First of all, enter_state(suspend_state_t state) uses
>> mutex_trylock(&system_transition_mutex).
>
> Yes, in the case I am outlining it gets the mutex.
>
>> Second, even if that happens, the logic below in ufshcd_queuecommand()
>> will break the cycle by fast-failing the PM request (the code below is
>> from the tip of the tree with this whole series applied).
>
> It won't get that far because the suspend will be waiting to sync
> filesystems.
> Filesystems will be waiting on I/O.
> I/O will be waiting on the error handler.
> The error handler will be waiting on system_transition_mutex.
> But system_transition_mutex is already held by PM core.
Hi Adrian,

You are right... I missed the step where suspend syncs filesystems.
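
For context, here is a trimmed, paraphrased sketch of enter_state() from
kernel/power/suspend.c; the real function has more steps, but this is
the ordering at issue:

/*
 * Paraphrased sketch only, trimmed to the steps relevant here. The
 * filesystem sync runs while system_transition_mutex is held and
 * before tasks are frozen, so an error handler blocked in
 * lock_system_sleep() cannot make progress while the sync waits on I/O.
 */
static int enter_state(suspend_state_t state)
{
        int error;

        if (!mutex_trylock(&system_transition_mutex))
                return -EBUSY;

        ksys_sync_helper();             /* sync filesystems; may wait on I/O */

        error = suspend_prepare(state); /* PM notifiers, freeze tasks */
        if (error)
                goto Unlock;

        error = suspend_devices_and_enter(state);
        suspend_finish();
 Unlock:
        mutex_unlock(&system_transition_mutex);
        return error;
}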
Going back to taking host_sem in suspend_prepare()/resume_complete()
(as sketched below) wouldn't have this deadlock problem, right?
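
Roughly like this minimal sketch; the callback names
ufshcd_suspend_prepare()/ufshcd_resume_complete() are illustrative
here, assuming they get wired up as the hba device's PM
.prepare/.complete ops:

/*
 * Minimal sketch only, not an actual patch. The PM core invokes a
 * device's ->prepare() callback after filesystems have been synced and
 * tasks frozen, so taking host_sem here cannot deadlock against the
 * sync, while still keeping error handling out of suspend/resume.
 */
static int ufshcd_suspend_prepare(struct device *dev)
{
        struct ufs_hba *hba = dev_get_drvdata(dev);

        /* Block the error handler for the duration of system PM. */
        down(&hba->host_sem);
        return 0;
}

static void ufshcd_resume_complete(struct device *dev)
{
        struct ufs_hba *hba = dev_get_drvdata(dev);

        up(&hba->host_sem);
}
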
Thanks,
Can Guo.
>
>>
>> 	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>> 		/*
>> 		 * ufshcd_rpm_get_sync() is used at error handling preparation
>> 		 * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
>> 		 * PM ops, it can never be finished if we let SCSI layer keep
>> 		 * retrying it, which gets err handler stuck forever. Neither
>> 		 * can we let the scsi cmd pass through, because UFS is in bad
>> 		 * state, the scsi cmd may eventually time out, which will get
>> 		 * err handler blocked for too long. So, just fail the scsi cmd
>> 		 * sent from PM ops, err handler can recover PM error anyways.
>> 		 */
>> 		if (cmd->request->rq_flags & RQF_PM) {
>> 			hba->force_reset = true;
>> 			set_host_byte(cmd, DID_BAD_TARGET);
>> 			cmd->scsi_done(cmd);
>> 			goto out;
>> 		}
>> 		fallthrough;
>> 	case UFSHCD_STATE_RESET:
>>
>> Thanks,
>>
>> Can Guo.
>>
>>>>
>>>> Thanks,
>>>>
>>>> Can Guo.
>>>>
>>>>>
>>>>>>  	ufshcd_rpm_get_sync(hba);
>>>>>>  	if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>>>>>>  	    hba->is_wlu_sys_suspended) {
>>>>>> @@ -5951,6 +5956,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>>>  	ufshcd_clk_scaling_suspend(hba, false);
>>>>>>  	ufshcd_clear_ua_wluns(hba);
>>>>>>  	ufshcd_rpm_put(hba);
>>>>>> +	unlock_system_sleep();
>>>>>>  }
>>>>>>
>>>>>>  static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
>>>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device *dev)
>>>>>>  	ktime_t start = ktime_get();
>>>>>>
>>>>>>  	hba = shost_priv(sdev->host);
>>>>>> -	down(&hba->host_sem);
>>>>>>
>>>>>>  	if (pm_runtime_suspended(dev))
>>>>>>  		goto out;
>>>>>>
>>>>>>  	ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>>>> -	if (ret) {
>>>>>> +	if (ret)
>>>>>>  		dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__, ret);
>>>>>> -		up(&hba->host_sem);
>>>>>> -	}
>>>>>>
>>>>>>  out:
>>>>>>  	if (!ret)
>>>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device *dev)
>>>>>>  		hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>>>  	if (!ret)
>>>>>>  		hba->is_wlu_sys_suspended = false;
>>>>>> -	up(&hba->host_sem);
>>>>>>  	return ret;
>>>>>>  }
>>>>>>  #endif
>>>>>>