Message-ID: <a4eeb46f-2df1-16a6-b0e4-c6ea7683b75f@linux.intel.com>
Date: Tue, 22 Mar 2022 14:31:44 +0200 (EET)
From: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
To: "Martinez, Ricardo" <ricardo.martinez@...ux.intel.com>
cc: Netdev <netdev@...r.kernel.org>, linux-wireless@...r.kernel.org,
kuba@...nel.org, davem@...emloft.net, johannes@...solutions.net,
ryazanov.s.a@...il.com, loic.poulain@...aro.org,
m.chetan.kumar@...el.com, chandrashekar.devegowda@...el.com,
linuxwwan@...el.com, chiranjeevi.rapolu@...ux.intel.com,
haijun.liu@...iatek.com, amir.hanania@...el.com,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
dinesh.sharma@...el.com, eliot.lee@...el.com,
moises.veleta@...el.com, pierre-louis.bossart@...el.com,
muralidharan.sethuraman@...el.com, Soumya.Prakash.Mishra@...el.com,
sreehari.kancharla@...el.com, madhusmita.sahu@...el.com
Subject: Re: [PATCH net-next v5 12/13] net: wwan: t7xx: Device deep sleep lock/unlock
On Fri, 18 Mar 2022, Martinez, Ricardo wrote:
>
> On 3/10/2022 2:21 AM, Ilpo Järvinen wrote:
> > On Wed, 23 Feb 2022, Ricardo Martinez wrote:
> >
> > > From: Haijun Liu <haijun.liu@...iatek.com>
> > >
> > > Introduce the mechanism to lock/unlock the device's 'deep sleep' mode.
> > > When the PCIe link state is L1.2 or L2, the host can still keep the
> > > device in D0 state from the host's point of view. At the same time, if
> > > the device's 'deep sleep' mode is unlocked, the device will go to 'deep
> > > sleep' while it is still in D0 state on the host side.
> > >
> > > Signed-off-by: Haijun Liu <haijun.liu@...iatek.com>
> > > Signed-off-by: Chandrashekar Devegowda <chandrashekar.devegowda@...el.com>
> > > Co-developed-by: Ricardo Martinez <ricardo.martinez@...ux.intel.com>
> > > Signed-off-by: Ricardo Martinez <ricardo.martinez@...ux.intel.com>
> > > ---
> ...
> > > +int t7xx_pci_sleep_disable_complete(struct t7xx_pci_dev *t7xx_dev)
> > > +{
> > > +	struct device *dev = &t7xx_dev->pdev->dev;
> > > +	int ret;
> > > +
> > > +	ret = wait_for_completion_timeout(&t7xx_dev->sleep_lock_acquire,
> > > +					  msecs_to_jiffies(PM_SLEEP_DIS_TIMEOUT_MS));
> > > +	if (!ret)
> > > +		dev_err_ratelimited(dev, "Resource wait complete timed out\n");
> > > +
> > > +	return ret;
> > > +}
> > > +
> > > +/**
> > > + * t7xx_pci_disable_sleep() - Disable deep sleep capability.
> > > + * @t7xx_dev: MTK device.
> > > + *
> > > + * Lock the deep sleep capability, note that the device can still go
> > > + * into deep sleep state while the device is in D0 state, from the
> > > + * host's point of view.
> > > + *
> > > + * If the device is in deep sleep state, wake up the device and disable
> > > + * deep sleep capability.
> > > + */
> > > +void t7xx_pci_disable_sleep(struct t7xx_pci_dev *t7xx_dev)
> > > +{
> > > +	unsigned long flags;
> > > +
> > > +	if (atomic_read(&t7xx_dev->md_pm_state) < MTK_PM_RESUMED) {
> > > +		atomic_inc(&t7xx_dev->sleep_disable_count);
> > > +		complete_all(&t7xx_dev->sleep_lock_acquire);
> > > +		return;
> > > +	}
> > > +
> > > +	spin_lock_irqsave(&t7xx_dev->md_pm_lock, flags);
> > > +	if (atomic_inc_return(&t7xx_dev->sleep_disable_count) == 1) {
> > > +		u32 deep_sleep_enabled;
> > > +
> > > +		reinit_completion(&t7xx_dev->sleep_lock_acquire);
> > You might want to check that there's a mechanism that prevents this
> > racing with wait_for_completion_timeout() in
> > t7xx_pci_sleep_disable_complete().
> >
I couldn't prove it myself but there are probably aspects on the PM side
of things I wasn't able to take fully into account (that is, which call
paths cannot occur).
> Those functions are called in the following order:
> 1.- t7xx_pci_disable_sleep()
> 2.- t7xx_pci_sleep_disable_complete()
> 3.- t7xx_pci_enable_sleep()
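For reference, written out, one instance of the sequence at a call site
is then roughly this (my untested sketch, only to have the pattern
spelled out):

	t7xx_pci_disable_sleep(t7xx_dev);
	if (t7xx_pci_sleep_disable_complete(t7xx_dev)) {
		/* ... HW access with deep sleep locked out ... */
	}
	t7xx_pci_enable_sleep(t7xx_dev);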
That sequence gets called from 5 places:
- t7xx_cldma_send_skb
- t7xx_dpmaif_rxq_work
- t7xx_dpmaif_bat_release_work
- t7xx_dpmaif_tx_done
- t7xx_dpmaif_tx_hw_push_thread + t7xx_do_tx_hw_push
I'm not sure which of those can run in parallel to each other. But if
they can, the race is likely there between those "instances" of the
sequence, one instance doing reinit_completion() and the other
wait_for_completion_timeout().
> That sequence and md_pm_lock protect against a race condition between
> wait_for_completion_timeout() and reinit_completion().
wait_for_completion_timeout() is not protected by md_pm_lock. There is
also an early-return path in t7xx_pci_disable_sleep() that never takes
md_pm_lock.
> On the other hand, there could be a race condition between
> t7xx_pci_disable_sleep() and t7xx_pci_enable_sleep() which may cause sleep to
> get enabled while one thread expects it to be disabled.
...And once sleep gets enabled, this condition can become true, no?
if (atomic_read(&t7xx_dev->md_pm_state) < MTK_PM_RESUMED) {
...after which there's nothing that protects wait_for_completion_timeout()
from racing with another instance of the sequence that has not yet
executed reinit_completion()?
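Roughly this kind of interleaving, if I read it right (heavily
simplified, and assuming the enable/disable race you mention has
meanwhile dropped sleep_disable_count back to zero):

	CPU0					CPU1
	t7xx_pci_disable_sleep()
	t7xx_pci_sleep_disable_complete()
	  wait_for_completion_timeout()
						t7xx_pci_disable_sleep()
						  atomic_inc_return() == 1
						  reinit_completion()

That is, reinit_completion() runs while CPU0 may still be inside
wait_for_completion_timeout() on the same completion, with nothing
ordering the two.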
I think you found the very race which I was worried about. :-)
> The fix would be to protect sleep_disable_count with md_pm_lock, then
> sleep_disable_count doesn't need to be declared as atomic.
> The next version will include cleanup in this area.
Ok. I'll take a look once you post the next version.
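FWIW, I'd expect the end result to look roughly like this (an untested
sketch only, to illustrate what I had in mind; the HW details are
elided):

	void t7xx_pci_disable_sleep(struct t7xx_pci_dev *t7xx_dev)
	{
		unsigned long flags;

		spin_lock_irqsave(&t7xx_dev->md_pm_lock, flags);
		/* plain int now, protected by md_pm_lock */
		t7xx_dev->sleep_disable_count++;

		if (atomic_read(&t7xx_dev->md_pm_state) < MTK_PM_RESUMED) {
			complete_all(&t7xx_dev->sleep_lock_acquire);
			goto unlock;
		}

		if (t7xx_dev->sleep_disable_count == 1) {
			reinit_completion(&t7xx_dev->sleep_lock_acquire);
			/* ... request the sleep lock from the device ... */
		}

	unlock:
		spin_unlock_irqrestore(&t7xx_dev->md_pm_lock, flags);
	}

so that the counter update and reinit_completion() are always serialized
by md_pm_lock, with t7xx_pci_enable_sleep() decrementing under the same
lock.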
--
i.