Message-ID: <1b351766a6e40d0df90b3adec964eb33@codeaurora.org>
Date:   Thu, 24 Jun 2021 14:12:31 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Adrian Hunter <adrian.hunter@...el.com>
Cc:     asutoshd@...eaurora.org, nguyenb@...eaurora.org,
        hongwus@...eaurora.org, ziqichen@...eaurora.org,
        linux-scsi@...r.kernel.org, kernel-team@...roid.com,
        Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Jaegeuk Kim <jaegeuk@...nel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in
 suspend/resume

On 2021-06-24 13:52, Adrian Hunter wrote:
> On 24/06/21 5:16 am, Can Guo wrote:
>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>> To protect system suspend/resume from being disturbed by error handling,
>>>> instead of using host_sem, let error handler call lock_system_sleep() and
>>>> unlock_system_sleep() which achieve the same purpose. Remove the host_sem
>>>> used in suspend/resume paths to make the code more readable.
>>>> 
>>>> Suggested-by: Bart Van Assche <bvanassche@....org>
>>>> Signed-off-by: Can Guo <cang@...eaurora.org>
>>>> ---
>>>>  drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>> index 3695dd2..a09e4a2 100644
>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>> 
>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>  {
>>>> +    /*
>>>> +     * It is not safe to perform error handling while suspend or resume is
>>>> +     * in progress. Hence the lock_system_sleep() call.
>>>> +     */
>>>> +    lock_system_sleep();
>>> 
>>> It looks to me like the system takes this lock quite early, even before
>>> freezing tasks, so if anything needs the error handler to run it will
>>> deadlock.
>> 
>> Hi Adrian,
>> 
>> UFS/hba system suspend/resume does not invoke or call error handling in a
>> synchronous way. So, whatever UFS errors (which schedules the error handler)
>> happens during suspend/resume, error handler will just wait here till system
>> suspend/resume release the lock. Hence no worries of deadlock here.
> 
> It looks to me like the state can change to UFSHCD_STATE_EH_SCHEDULED_FATAL
> and since user processes are not frozen, nor file systems sync'ed, everything
> is going to deadlock.
> i.e.
> I/O is blocked waiting on error handling
> error handling is blocked waiting on lock_system_sleep()
> suspend is blocked waiting on I/O
> 

Hi Adrian,

First of all, enter_state(suspend_state_t state) uses
mutex_trylock(&system_transition_mutex), so it does not block on that
mutex.
Second, even if that happens, the logic below in ufshcd_queuecommand()
will break the cycle by fast-failing the PM request (the code below is
from the code tip with this whole series applied).

         case UFSHCD_STATE_EH_SCHEDULED_FATAL:
                 /*
                  * ufshcd_rpm_get_sync() is used at error handling preparation
                  * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
                  * PM ops, it can never be finished if we let SCSI layer keep
                  * retrying it, which gets err handler stuck forever. Neither
                  * can we let the scsi cmd pass through, because UFS is in bad
                  * state, the scsi cmd may eventually time out, which will get
                  * err handler blocked for too long. So, just fail the scsi cmd
                  * sent from PM ops, err handler can recover PM error anyways.
                  */
                 if (cmd->request->rq_flags & RQF_PM) {
                         hba->force_reset = true;
                         set_host_byte(cmd, DID_BAD_TARGET);
                         cmd->scsi_done(cmd);
                         goto out;
                 }
                 fallthrough;
         case UFSHCD_STATE_RESET:
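
To make the two points above easier to follow, here is a minimal
userspace sketch of them. It is not kernel code: every model_* name is
hypothetical, the atomic flag only stands in for
system_transition_mutex, and the "host busy" result for non-PM commands
is an assumption about what the fallthrough into the
UFSHCD_STATE_RESET case leads to.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for system_transition_mutex. */
static atomic_flag model_transition_lock = ATOMIC_FLAG_INIT;

/* Models mutex_trylock(): true if the lock was taken, never blocks. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&model_transition_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_transition_lock);
}

/* Point 1: suspend entry gives up with -EBUSY instead of waiting. */
static int model_enter_state(void)
{
	if (!model_trylock())
		return -EBUSY;
	/* ... the suspend sequence would run here ... */
	model_unlock();
	return 0;
}

enum model_hba_state { MODEL_OPERATIONAL, MODEL_EH_SCHEDULED_FATAL };
enum model_verdict { MODEL_DISPATCHED, MODEL_FAILED_FAST, MODEL_HOST_BUSY };

struct model_cmd {
	bool is_pm;		/* stands in for rq_flags & RQF_PM */
};

struct model_hba {
	enum model_hba_state state;
	bool force_reset;
};

/* Point 2: a PM command is completed with an error right away. */
static enum model_verdict model_queuecommand(struct model_hba *hba,
					     const struct model_cmd *cmd)
{
	switch (hba->state) {
	case MODEL_EH_SCHEDULED_FATAL:
		if (cmd->is_pm) {
			hba->force_reset = true;
			return MODEL_FAILED_FAST;
		}
		/* Assumption: other commands are asked to retry later. */
		return MODEL_HOST_BUSY;
	case MODEL_OPERATIONAL:
	default:
		return MODEL_DISPATCHED;
	}
}

/* Walks through both points; returns 0 if every check holds. */
int model_selfcheck(void)
{
	struct model_hba hba = { MODEL_EH_SCHEDULED_FATAL, false };
	struct model_cmd pm = { true };

	assert(model_trylock());		/* "error handler" holds the lock */
	assert(model_enter_state() == -EBUSY);	/* suspend backs off */
	model_unlock();
	assert(model_queuecommand(&hba, &pm) == MODEL_FAILED_FAST);
	return 0;
}
```

Calling model_selfcheck() from a main() is enough to run it. The point
of the sketch is only that neither path waits: the suspend entry backs
off when the lock is held, and a PM command seen in the fatal state is
failed so the sender is not left retrying.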

Thanks,

Can Guo.

>> 
>> Thanks,
>> 
>> Can Guo.
>> 
>>> 
>>>>      ufshcd_rpm_get_sync(hba);
>>>>      if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>>>>          hba->is_wlu_sys_suspended) {
>>>> @@ -5951,6 +5956,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>          ufshcd_clk_scaling_suspend(hba, false);
>>>>      ufshcd_clear_ua_wluns(hba);
>>>>      ufshcd_rpm_put(hba);
>>>> +    unlock_system_sleep();
>>>>  }
>>>> 
>>>>  static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device *dev)
>>>>      ktime_t start = ktime_get();
>>>> 
>>>>      hba = shost_priv(sdev->host);
>>>> -    down(&hba->host_sem);
>>>> 
>>>>      if (pm_runtime_suspended(dev))
>>>>          goto out;
>>>> 
>>>>      ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>> -    if (ret) {
>>>> +    if (ret)
>>>>          dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__, ret);
>>>> -        up(&hba->host_sem);
>>>> -    }
>>>> 
>>>>  out:
>>>>      if (!ret)
>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device *dev)
>>>>          hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>      if (!ret)
>>>>          hba->is_wlu_sys_suspended = false;
>>>> -    up(&hba->host_sem);
>>>>      return ret;
>>>>  }
>>>>  #endif
>>>>