linux-kernel - Re: [PATCH v3 1/5] scsi: ufs: atomic update for clkgating

Message-ID: <6c029b64cb4d78e7624bc896f9c9f16d@codeaurora.org>
Date:   Mon, 26 Oct 2020 14:43:53 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, kernel-team@...roid.com,
        alim.akhtar@...sung.com, avri.altman@....com, bvanassche@....org
Subject: Re: [PATCH v3 1/5] scsi: ufs: atomic update for clkgating_enable

On 2020-10-26 14:13, Jaegeuk Kim wrote:
> On 10/26, Can Guo wrote:
>> On 2020-10-24 23:06, Jaegeuk Kim wrote:
>> > From: Jaegeuk Kim <jaegeuk@...gle.com>
>> >
>> > When giving a stress test which enables/disables clkgating, we hit
>> > device timeout sometimes. This patch avoids subtle racy condition to
>> > address it.
>> >
>> > If we use __ufshcd_release(), I've seen that gate_work can be called in
>> > parallel with ungate_work, which results in UFS timeout when doing
>> > hibern8. Should avoid it.
>> >
>> 
>> I don't understand this comment. gate_work and ungate_work are queued on
>> an ordered workqueue, and an ordered workqueue executes at most one work
>> item at any given time, in the queued order. How can the two run in
>> parallel?
> 
> When UFS got stuck, I saw this in the clkgating tracepoint:
> 
> - REQ_CLK_OFF
> - CLKS_OFF
> - REQ_CLK_OFF
> - REQ_CLKS_ON
> ..
> 

I don't see how you can tell that the two works are running in parallel
just from the above trace. May I know the exact error behind "UFS timeout
when doing hibern8"?

By using __ufshcd_release() here, I do see one potential issue if your
test quickly toggles clk_gating on and off (disable it, enable it, disable
it and enable it): __ufshcd_release() gets called twice, meaning we queue
two gate_works back to back. So can you try the code below and let me know
whether it helps? I am OK with your current change, but I would like to
understand the problem. Thanks.

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1791bce..3eee438 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2271,6 +2271,8 @@ static void ufshcd_gate_work(struct work_struct *work)
         unsigned long flags;

         spin_lock_irqsave(hba->host->host_lock, flags);
+       if (hba->clk_gating.state == CLKS_OFF)
+               goto rel_lock;
         /*
          * In case you are here to cancel this work the gating state
          * would be marked as REQ_CLKS_ON. In this case save time by

Regards,

Can Guo.
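
For readers following along, the guard proposed in the diff above can be
illustrated with a small user-space model. This is only a sketch, not the
driver code: the struct, the gate_count field and the model_* helpers are
hypothetical names invented for illustration, and the state handling is
simplified to the back-to-back queuing case described in this message.

```c
#include <stdbool.h>

/* Hypothetical user-space model; state names mirror the driver's enum. */
enum clk_gating_state { CLKS_OFF, CLKS_ON, REQ_CLKS_OFF, REQ_CLKS_ON };

struct gating_model {
	enum clk_gating_state state;
	int active_reqs;
	int gate_count;	/* how many times the clocks were actually gated */
};

/* Models a release that requests gating (assumed to just mark the state). */
static void model_request_gate(struct gating_model *g)
{
	g->state = REQ_CLKS_OFF;
}

/* Models one run of the gate work; returns true if it gated the clocks. */
static bool model_gate_work(struct gating_model *g)
{
	/* Guard from the proposed diff: clocks already off, nothing to do. */
	if (g->state == CLKS_OFF)
		return false;

	/* Gating was cancelled in the meantime: skip and mark clocks on. */
	if (g->state == REQ_CLKS_ON) {
		g->state = CLKS_ON;
		return false;
	}

	/* Someone still holds a reference: do not gate. */
	if (g->active_reqs)
		return false;

	g->state = CLKS_OFF;
	g->gate_count++;
	return true;
}
```

In this model, two requests followed by two runs of the work gate the
clocks only once; the second run returns early through the CLKS_OFF guard.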

> By using active_req, I don't see any problem.
> 
>> 
>> Thanks,
>> 
>> Can Guo.
>> 
>> > Signed-off-by: Jaegeuk Kim <jaegeuk@...gle.com>
>> > ---
>> >  drivers/scsi/ufs/ufshcd.c | 12 ++++++------
>> >  1 file changed, 6 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> > index b8f573a02713..e0b479f9eb8a 100644
>> > --- a/drivers/scsi/ufs/ufshcd.c
>> > +++ b/drivers/scsi/ufs/ufshcd.c
>> > @@ -1807,19 +1807,19 @@ static ssize_t ufshcd_clkgate_enable_store(struct device *dev,
>> >  		return -EINVAL;
>> >
>> >  	value = !!value;
>> > +
>> > +	spin_lock_irqsave(hba->host->host_lock, flags);
>> >  	if (value == hba->clk_gating.is_enabled)
>> >  		goto out;
>> >
>> > -	if (value) {
>> > -		ufshcd_release(hba);
>> > -	} else {
>> > -		spin_lock_irqsave(hba->host->host_lock, flags);
>> > +	if (value)
>> > +		hba->clk_gating.active_reqs--;
>> > +	else
>> >  		hba->clk_gating.active_reqs++;
>> > -		spin_unlock_irqrestore(hba->host->host_lock, flags);
>> > -	}
>> >
>> >  	hba->clk_gating.is_enabled = value;
>> >  out:
>> > +	spin_unlock_irqrestore(hba->host->host_lock, flags);
>> >  	return count;
>> >  }
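
The idea behind the quoted patch, comparing and updating is_enabled and
active_reqs inside one critical section, can be sketched in user space as
follows. This is a simplified model under stated assumptions: a pthread
mutex stands in for the host_lock spinlock, and clkgate_model and
clkgate_model_set are hypothetical names, not part of the driver.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the fields touched by the sysfs store handler. */
struct clkgate_model {
	pthread_mutex_t lock;	/* stands in for hba->host->host_lock */
	int active_reqs;
	bool is_enabled;
};

/* Check and update happen under one lock, so toggles cannot interleave. */
static void clkgate_model_set(struct clkgate_model *m, bool value)
{
	pthread_mutex_lock(&m->lock);
	if (value != m->is_enabled) {
		if (value)
			m->active_reqs--;	/* enabling drops the held reference */
		else
			m->active_reqs++;	/* disabling holds a reference */
		m->is_enabled = value;
	}
	pthread_mutex_unlock(&m->lock);
}
```

Writing the same value twice leaves active_reqs unchanged in this model,
because the comparison and the counter update cannot be separated by a
concurrent writer.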