linux-kernel - Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend


Message-ID: <4f9017b412139762fdda8c8d1741ae7b@codeaurora.org>
Date:   Tue, 04 Feb 2020 14:28:24 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Bart Van Assche <bvanassche@....org>
Cc:     asutoshd@...eaurora.org, nguyenb@...eaurora.org,
        hongwus@...eaurora.org, rnayak@...eaurora.org,
        linux-scsi@...r.kernel.org, kernel-team@...roid.com,
        saravanak@...gle.com, salyzyn@...gle.com,
        Sayali Lokhande <sayalil@...eaurora.org>,
        Alim Akhtar <alim.akhtar@...sung.com>,
        Avri Altman <avri.altman@....com>,
        Pedro Sousa <pedrom.sousa@...opsys.com>,
        "James E.J. Bottomley" <jejb@...ux.ibm.com>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        Stanley Chu <stanley.chu@...iatek.com>,
        Bean Huo <beanhuo@...ron.com>,
        Venkat Gopalakrishnan <venkatg@...eaurora.org>,
        Tomas Winkler <tomas.winkler@...el.com>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend

On 2020-02-04 11:12, Bart Van Assche wrote:
> On 2020-02-02 22:23, Can Guo wrote:
>> On 2020-01-26 11:29, Bart Van Assche wrote:
>>> On 2020-01-22 23:25, Can Guo wrote:
>>>>              break;
>>>>          case UPIU_TRANSACTION_REJECT_UPIU:
>>>>              /* TODO: handle Reject UPIU Response */
>>>> @@ -5215,7 +5222,14 @@ static void ufshcd_exception_event_handler(struct work_struct *work)
>>>> 
>>>>  out:
>>>>      scsi_unblock_requests(hba->host);
>>>> -    pm_runtime_put_sync(hba->dev);
>>>> +    /*
>>>> +     * pm_runtime_get_noresume is called while scheduling
>>>> +     * eeh_work to avoid suspend racing with exception work.
>>>> +     * Hence decrement usage counter using pm_runtime_put_noidle
>>>> +     * to allow suspend on completion of exception event handler.
>>>> +     */
>>>> +    pm_runtime_put_noidle(hba->dev);
>>>> +    pm_runtime_put(hba->dev);
>>>>      return;
>>>>  }
>>>> 
>>>> @@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>>>>              goto enable_gating;
>>>>      }
>>>> 
>>>> +    flush_work(&hba->eeh_work);
>>>>      ret = ufshcd_link_state_transition(hba, req_link_state, 1);
>>>>      if (ret)
>>>>          goto set_dev_active;
>>> 
>>> I think this patch introduces a new race condition, namely the
>>> following:
>>> - ufshcd_slave_destroy() tests pm_op_in_progress and reads the value
>>>   zero from that variable.
>>> - ufshcd_suspend() sets hba->pm_op_in_progress to one.
>>> - ufshcd_slave_destroy() calls schedule_work().
>>> 
>>> How about fixing this race condition by calling
>>> pm_runtime_get_noresume() before checking pm_op_in_progress and by
>>> reallowing resume if no work is scheduled?
>> 
>> If you apply this patch, you will find the change is not in
>> ufshcd_slave_destroy(), but in ufshcd_transfer_rsp_status().
>> So the racing you mentioned above does not exist.
> 
> Hi Can,
> 
> Apparently I got a function name wrong. Can the following race
> condition happen:
> - ufshcd_transfer_rsp_status() tests pm_op_in_progress and reads the
>   value zero from that variable.
> - ufshcd_suspend() sets hba->pm_op_in_progress to one.
> - ufshcd_suspend() calls flush_work(&hba->eeh_work).
> - ufshcd_transfer_rsp_status() calls schedule_work(&hba->eeh_work).
> 
> Thanks,
> 
> Bart.

Hi Bart,

The sequence you mentioned is not possible.

In normal cases, ufshcd_suspend() is not called before
ufshcd_transfer_rsp_status() returns (unless you intentionally call
ufshcd_suspend() to break it). This is because
ufshcd_transfer_rsp_status() is called from
__ufshcd_transfer_req_compl(), which is used by either the UFS IRQ
handler or the error handler. Meanwhile, in
__ufshcd_transfer_req_compl(), scsi_done() is called only after
ufshcd_transfer_rsp_status() returns. While we are in
ufshcd_transfer_rsp_status(), the UFS driver is still handling
requests/tasks, so neither runtime suspend nor system suspend would
start at this moment.

This is also why the lines below work: calling
pm_runtime_get_noresume() within ufshcd_transfer_rsp_status() prevents
runtime suspend from happening after we leave
ufshcd_transfer_rsp_status().

+                if (schedule_work(&hba->eeh_work))
+                    pm_runtime_get_noresume(hba->dev);
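
To illustrate the counting argument, here is a minimal userspace
sketch. It is not driver code: every toy_* name is hypothetical, the
usage counter is a plain int, and it assumes suspend is allowed only
when that counter is zero. It models only the reference taken when the
work is queued and dropped by pm_runtime_put_noidle() in the handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for runtime PM state; not the kernel's struct device. */
struct toy_dev {
	int usage_count;   /* models the runtime PM usage counter */
	bool work_pending; /* models eeh_work being queued */
};

/* Models schedule_work(): true only if the work was not already queued. */
static bool toy_schedule_work(struct toy_dev *d)
{
	if (d->work_pending)
		return false;
	d->work_pending = true;
	return true;
}

/* Models pm_runtime_get_noresume(): bump the counter, do not resume. */
static void toy_get_noresume(struct toy_dev *d)
{
	d->usage_count++;
}

/* Models pm_runtime_put_noidle(): drop the counter, do not trigger idle. */
static void toy_put_noidle(struct toy_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Assumption of this sketch: suspend may proceed only at count zero. */
static bool toy_can_runtime_suspend(const struct toy_dev *d)
{
	return d->usage_count == 0;
}

/* Completion path: take a reference only if the work was newly queued. */
static void toy_transfer_rsp_status(struct toy_dev *d)
{
	if (toy_schedule_work(d))
		toy_get_noresume(d);
}

/* Exception work: runs, then drops the reference taken when queued. */
static void toy_exception_event_handler(struct toy_dev *d)
{
	d->work_pending = false;
	toy_put_noidle(d);
}
```

Taking the reference only when schedule_work() returns true keeps the
counter balanced: a second exception event that arrives while the work
is still queued does not take another reference, so the single
put_noidle in the handler is enough to allow suspend again.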

Thanks,

Can Guo.