lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <63bb653fe0a26f092426945a8af1aba5@codeaurora.org>
Date:   Tue, 27 Oct 2020 10:44:14 +0800
From:   Can Guo <cang@...eaurora.org>
To:     Jaegeuk Kim <jaegeuk@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-scsi@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, kernel-team@...roid.com,
        alim.akhtar@...sung.com, avri.altman@....com, bvanassche@....org
Subject: Re: [PATCH v3 1/5] scsi: ufs: atomic update for clkgating_enable

On 2020-10-27 03:48, Jaegeuk Kim wrote:
> On 10/26, Can Guo wrote:
>> On 2020-10-26 14:13, Jaegeuk Kim wrote:
>> > On 10/26, Can Guo wrote:
>> > > On 2020-10-24 23:06, Jaegeuk Kim wrote:
>> > > > From: Jaegeuk Kim <jaegeuk@...gle.com>
>> > > >
>> > > > When giving a stress test which enables/disables clkgating, we hit
>> > > > device
>> > > > timeout sometimes. This patch avoids subtle racy condition to address
>> > > > it.
>> > > >
>> > > > If we use __ufshcd_release(), I've seen that gate_work can be called in
>> > > > parallel
>> > > > with ungate_work, which results in UFS timeout when doing hibern8.
>> > > > Should avoid it.
>> > > >
>> > >
>> > > I don't understand this comment. gate_work and ungate_work are
>> > > queued on
>> > > an ordered workqueue and an ordered workqueue executes at most one
>> > > work item
>> > > at any given time in the queued order. How can the two run in
>> > > parallel?
>> >
>> > When I hit UFS stuck, I saw this by clkgating tracepoint.
>> >
>> > - REQ_CLK_OFF
>> > - CLKS_OFF
>> > - REQ_CLK_OFF
>> > - REQ_CLKS_ON
>> > ..
>> >
>> 
>> I don't see how can you tell that the two works are running in 
>> parallel
>> just from above trace. May I know what is the exact error by "UFS 
>> timeout
>> when doing hibern8"?
>> 
>> By using __ufshcd_release() here, I do see one potential issue if your 
>> test
>> quickly toggles on/off of clk_gating - disable it, enable it, disable 
>> it and
>> enable it, which will cause that __ufshcd_release() being called 
>> twice,
>> meaning
>> we queue two gate_works back to back. So can you try below code and 
>> let me
>> know
>> if it helps or not? I am OK with your current change, but I would like 
>> to
>> understand the problem. Thanks.
>> 
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 1791bce..3eee438 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2271,6 +2271,8 @@ static void ufshcd_gate_work(struct work_struct 
>> *work)
>>         unsigned long flags;
>> 
>>         spin_lock_irqsave(hba->host->host_lock, flags);
>> +       if (hba->clk_gating.state == CLKS_OFF)
>> +               goto rel_lock;
>>         /*
>>          * In case you are here to cancel this work the gating state
>>          * would be marked as REQ_CLKS_ON. In this case save time by
> 
> This doesn't help. So, I checked this back again, and, like what you 
> said, now
> suspect __ufshcd_release() which changed state to REQ_CLKS_OFF on 
> CLKS_OFF.
> 

Aha, sorry that I gave the right analysis but wrong fix - my check won't 
help
since it is checking CLKS_OFF, but at that moment it has become 
CLKS_REQ_OFF.
Your fix is fulfiling the right purpose.

Thanks,

Can Guo.


> With the below change, I can see the issue anymore. Let me send v4.
> 
> ---
>  drivers/scsi/ufs/ufshcd.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index b8f573a02713..cc8d5f0c3fdc 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -1745,7 +1745,8 @@ static void __ufshcd_release(struct ufs_hba *hba)
>  	if (hba->clk_gating.active_reqs || hba->clk_gating.is_suspended ||
>  	    hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL ||
>  	    ufshcd_any_tag_in_use(hba) || hba->outstanding_tasks ||
> -	    hba->active_uic_cmd || hba->uic_async_done)
> +	    hba->active_uic_cmd || hba->uic_async_done ||
> +	    hba->clk_gating.state == CLKS_OFF)
>  		return;
> 
>  	hba->clk_gating.state = REQ_CLKS_OFF;
> --
> 2.29.0.rc1.297.gfa9743e501-goog
> 
> 
>> 
>> Regards,
>> 
>> Can Guo.
>> 
>> > By using active_req, I don't see any problem.
>> >
>> > >
>> > > Thanks,
>> > >
>> > > Can Guo.
>> > >
>> > > > Signed-off-by: Jaegeuk Kim <jaegeuk@...gle.com>
>> > > > ---
>> > > >  drivers/scsi/ufs/ufshcd.c | 12 ++++++------
>> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
>> > > >
>> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> > > > index b8f573a02713..e0b479f9eb8a 100644
>> > > > --- a/drivers/scsi/ufs/ufshcd.c
>> > > > +++ b/drivers/scsi/ufs/ufshcd.c
>> > > > @@ -1807,19 +1807,19 @@ static ssize_t
>> > > > ufshcd_clkgate_enable_store(struct device *dev,
>> > > >  		return -EINVAL;
>> > > >
>> > > >  	value = !!value;
>> > > > +
>> > > > +	spin_lock_irqsave(hba->host->host_lock, flags);
>> > > >  	if (value == hba->clk_gating.is_enabled)
>> > > >  		goto out;
>> > > >
>> > > > -	if (value) {
>> > > > -		ufshcd_release(hba);
>> > > > -	} else {
>> > > > -		spin_lock_irqsave(hba->host->host_lock, flags);
>> > > > +	if (value)
>> > > > +		hba->clk_gating.active_reqs--;
>> > > > +	else
>> > > >  		hba->clk_gating.active_reqs++;
>> > > > -		spin_unlock_irqrestore(hba->host->host_lock, flags);
>> > > > -	}
>> > > >
>> > > >  	hba->clk_gating.is_enabled = value;
>> > > >  out:
>> > > > +	spin_unlock_irqrestore(hba->host->host_lock, flags);
>> > > >  	return count;
>> > > >  }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ