Message-ID: <82723efc44714e8677505cb7999d3fd5@codeaurora.org>
Date: Mon, 03 Feb 2020 14:23:15 +0800
From: Can Guo <cang@...eaurora.org>
To: Bart Van Assche <bvanassche@....org>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org,
hongwus@...eaurora.org, rnayak@...eaurora.org,
linux-scsi@...r.kernel.org, kernel-team@...roid.com,
saravanak@...gle.com, salyzyn@...gle.com,
Sayali Lokhande <sayalil@...eaurora.org>,
Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
Pedro Sousa <pedrom.sousa@...opsys.com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Stanley Chu <stanley.chu@...iatek.com>,
Bean Huo <beanhuo@...ron.com>,
Venkat Gopalakrishnan <venkatg@...eaurora.org>,
Tomas Winkler <tomas.winkler@...el.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend
On 2020-01-26 11:29, Bart Van Assche wrote:
> On 2020-01-22 23:25, Can Guo wrote:
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 1201578..c2de29f 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -4760,8 +4760,15 @@ static void ufshcd_slave_destroy(struct
>> scsi_device *sdev)
>> * UFS device needs urgent BKOPs.
>> */
>> if (!hba->pm_op_in_progress &&
>> - ufshcd_is_exception_event(lrbp->ucd_rsp_ptr))
>> - schedule_work(&hba->eeh_work);
>> + ufshcd_is_exception_event(lrbp->ucd_rsp_ptr)) {
>> + /*
>> + * Prevent suspend once eeh_work is scheduled
>> + * to avoid deadlock between ufshcd_suspend
>> + * and exception event handler.
>> + */
>> + if (schedule_work(&hba->eeh_work))
>> + pm_runtime_get_noresume(hba->dev);
>> + }
>
> Please combine the two logical tests with "&&" instead of nesting two
> if-statements inside each other.
>
>> break;
>> case UPIU_TRANSACTION_REJECT_UPIU:
>> /* TODO: handle Reject UPIU Response */
>> @@ -5215,7 +5222,14 @@ static void
>> ufshcd_exception_event_handler(struct work_struct *work)
>>
>> out:
>> scsi_unblock_requests(hba->host);
>> - pm_runtime_put_sync(hba->dev);
>> + /*
>> + * pm_runtime_get_noresume is called while scheduling
>> + * eeh_work to avoid suspend racing with exception work.
>> + * Hence decrement usage counter using pm_runtime_put_noidle
>> + * to allow suspend on completion of exception event handler.
>> + */
>> + pm_runtime_put_noidle(hba->dev);
>> + pm_runtime_put(hba->dev);
>> return;
>> }
>>
>> @@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba,
>> enum ufs_pm_op pm_op)
>> goto enable_gating;
>> }
>>
>> + flush_work(&hba->eeh_work);
>> ret = ufshcd_link_state_transition(hba, req_link_state, 1);
>> if (ret)
>> goto set_dev_active;
>
> I think this patch introduces a new race condition, namely the
> following:
> - ufshcd_slave_destroy() tests pm_op_in_progress and reads the value
> zero from that variable.
> - ufshcd_suspend() sets hba->pm_op_in_progress to one.
> - ufshcd_slave_destroy() calls schedule_work().
>
> How about fixing this race condition by calling
> pm_runtime_get_noresume() before checking pm_op_in_progress and by
> reallowing resume if no work is scheduled?
>
> Thanks,
>
> Bart.

Hi Bart,

If you apply this patch, you will find that the change is not in
ufshcd_slave_destroy() but in ufshcd_transfer_rsp_status(), so the
race you described above does not exist.

Thanks,
Can Guo.
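
For reference, the ordering Bart suggests (take the runtime PM reference
first, then test pm_op_in_progress, and drop the reference again if no
work was queued) can be sketched as a small user-space model. This is
only an illustration under stated assumptions, not the driver code:
struct model_hba and every model_* function are hypothetical stand-ins,
the counter merely imitates the runtime PM usage count, and the model is
single-threaded, so it shows the reference balance rather than the
concurrency itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few hba fields discussed in this thread. */
struct model_hba {
	int usage_count;        /* imitates the runtime PM usage counter */
	bool pm_op_in_progress; /* imitates hba->pm_op_in_progress */
	bool eeh_work_pending;  /* imitates "eeh_work is already queued" */
};

/* Imitates schedule_work(): true only if the work was not queued before. */
static bool model_schedule_work(struct model_hba *hba)
{
	if (hba->eeh_work_pending)
		return false;
	hba->eeh_work_pending = true;
	return true;
}

/*
 * Imitates the response path: the reference is taken before the
 * pm_op_in_progress check and is kept only if work was really queued.
 */
static void model_handle_response(struct model_hba *hba, bool is_exception_event)
{
	hba->usage_count++;	/* stands in for pm_runtime_get_noresume() */

	if (!hba->pm_op_in_progress && is_exception_event &&
	    model_schedule_work(hba))
		return;		/* reference stays held for the handler */

	hba->usage_count--;	/* stands in for pm_runtime_put_noidle() */
}

/* Imitates the end of the exception event handler (simplified). */
static void model_exception_event_handler(struct model_hba *hba)
{
	hba->eeh_work_pending = false;
	assert(hba->usage_count > 0);
	hba->usage_count--;	/* drops the reference taken when queuing */
}
```

In this model the counter is raised only while a queued handler is
outstanding, and every path that does not queue work returns it to its
previous value.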