Message-ID: <09c5d4d31a0bd9bed99815cfbf51aaad@codeaurora.org>
Date: Tue, 03 Nov 2020 18:01:01 +0800
From: Can Guo <cang@...eaurora.org>
To: Stanley Chu <stanley.chu@...iatek.com>
Cc: asutoshd@...eaurora.org, nguyenb@...eaurora.org,
hongwus@...eaurora.org, rnayak@...eaurora.org,
linux-scsi@...r.kernel.org, kernel-team@...roid.com,
saravanak@...gle.com, salyzyn@...gle.com,
Alim Akhtar <alim.akhtar@...sung.com>,
Avri Altman <avri.altman@....com>,
"James E.J. Bottomley" <jejb@...ux.ibm.com>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Bean Huo <beanhuo@...ron.com>,
Bart Van Assche <bvanassche@....org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 1/2] scsi: ufs: Fix unbalanced scsi_block_reqs_cnt
caused by ufshcd_hold()
On 2020-11-03 15:07, Stanley Chu wrote:
> Hi Can,
>
> On Mon, 2020-11-02 at 22:24 -0800, Can Guo wrote:
>> The scsi_block_reqs_cnt increased in ufshcd_hold() is supposed to be
>> decreased back in ufshcd_ungate_work() in a paired way. However, if
>> specific ufshcd_hold/release sequences are met, it is possible that
>> scsi_block_reqs_cnt is increased twice but only one ungate work is
>> queued. To make sure scsi_block_reqs_cnt is handled by ufshcd_hold()
>> and
>
> Just curious, how could this be possible? Would you have some failing
> examples?
>
[1] One gate_work() is in the workqueue, not yet executed; the clk state
is REQ_CLKS_OFF.
[2] ufshcd_queuecommand() calls ufshcd_hold(async == true) ->
active_req++ -> scsi_block_reqs_cnt++ -> REQ_CLKS_ON -> queue ungate
work -> active_req-- -> return -EAGAIN.
[3] Now gate_work() starts to run, but since the clk state is
REQ_CLKS_ON, gate_work() just sets the clk state to CLKS_ON and bails.
[4] Someone calls ufshcd_hold(async == false) -> do something ->
ufshcd_release() -> clk state is changed to REQ_CLKS_OFF. Note that,
up to this point, ungate_work() is still in the work queue, not
executed yet.
[5] Now, if someone calls ufshcd_hold(), we hit the issue:
scsi_block_reqs_cnt is increased a second time, but no second ungate
work is queued because the first one is still pending.
The above sequence is a very common clk gate/ungate sequence. The issue
arises because ungate_work is queued but is not executed in time. In my
case, I see ungate_work somehow delayed by about 150ms. This
change has been tested by customers on multiple platforms. And you
can tell from the code that it won't break anything. :)
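To make the sequence above concrete, here is a minimal user-space sketch
of the counter imbalance. It is not the driver code: struct model,
queue_ungate(), hold_slow_path() and run_sequence() are hypothetical
names made up for illustration, and block_cnt stands in for
scsi_block_reqs_cnt. It assumes that ufshcd_hold() takes the
REQ_CLKS_OFF/CLKS_OFF path both times and that each executed ungate work
unblocks exactly once. The only kernel behavior it relies on is that
queue_work() returns false when the work is already pending.

```c
#include <stdbool.h>

/* Simplified clock-gating states, named after the driver's states. */
enum clk_state { CLKS_OFF, CLKS_ON, REQ_CLKS_OFF, REQ_CLKS_ON };

/* Hypothetical model of the relevant driver state (illustration only). */
struct model {
	enum clk_state state;
	bool ungate_pending;	/* ungate work is sitting in the queue */
	int block_cnt;		/* stands in for scsi_block_reqs_cnt */
};

/* Mimics queue_work(): returns false if the work is already pending. */
static bool queue_ungate(struct model *m)
{
	if (m->ungate_pending)
		return false;
	m->ungate_pending = true;
	return true;
}

/* REQ_CLKS_OFF/CLKS_OFF path of a hold() call, before or after the patch. */
static void hold_slow_path(struct model *m, bool patched)
{
	bool queued;

	if (!patched)
		m->block_cnt++;		/* old: block unconditionally */
	m->state = REQ_CLKS_ON;
	queued = queue_ungate(m);
	if (patched && queued)
		m->block_cnt++;		/* patched: block only if queued */
}

/* Gate work sees REQ_CLKS_ON, so it only sets CLKS_ON and bails. */
static void gate_work(struct model *m)
{
	if (m->state == REQ_CLKS_ON) {
		m->state = CLKS_ON;
		return;
	}
	m->state = CLKS_OFF;
}

/* A release() that schedules gating again. */
static void release(struct model *m)
{
	m->state = REQ_CLKS_OFF;
}

/* The delayed ungate work: unblocks exactly once per execution. */
static void ungate_work(struct model *m)
{
	m->ungate_pending = false;
	m->state = CLKS_ON;
	m->block_cnt--;
}

/* Replays the sequence and returns the leftover block count. */
int run_sequence(bool patched)
{
	struct model m = { .state = REQ_CLKS_OFF };	/* gate work pending */

	hold_slow_path(&m, patched);	/* async hold, ungate work queued */
	gate_work(&m);			/* gate work runs and bails */
	release(&m);			/* sync hold + release -> REQ_CLKS_OFF */
	hold_slow_path(&m, patched);	/* next hold, ungate still pending */
	ungate_work(&m);		/* ungate work finally runs, once */
	return m.block_cnt;
}
```

Under these assumptions, run_sequence(false) returns 1 (one block is
never undone) and run_sequence(true) returns 0.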
Thanks,
Can Guo.
>> ufshcd_ungate_work() in a paired way, increase it only if queue_work()
>> returns true.
>>
>> Signed-off-by: Can Guo <cang@...eaurora.org>
>> Reviewed-by: Hongwu Su <hongwus@...eaurora.org>
>> ---
>> drivers/scsi/ufs/ufshcd.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 847f355..efa7d86 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -1634,12 +1634,12 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
>> */
>> /* fallthrough */
>> case CLKS_OFF:
>> - ufshcd_scsi_block_requests(hba);
>> hba->clk_gating.state = REQ_CLKS_ON;
>> trace_ufshcd_clk_gating(dev_name(hba->dev),
>> hba->clk_gating.state);
>> - queue_work(hba->clk_gating.clk_gating_workq,
>> - &hba->clk_gating.ungate_work);
>> + if (queue_work(hba->clk_gating.clk_gating_workq,
>> + &hba->clk_gating.ungate_work))
>> + ufshcd_scsi_block_requests(hba);
>> /*
>> * fall through to check if we should wait for this
>> * work to be done or not.
>
> Thanks,
> Stanley Chu