Message-ID: <20180206110012.GJ9418@n2100.armlinux.org.uk>
Date: Tue, 6 Feb 2018 11:00:13 +0000
From: Russell King - ARM Linux <linux@...linux.org.uk>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Florian Fainelli <f.fainelli@...il.com>,
Andrew Lunn <andrew@...n.ch>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: Potential issue with f5e64032a799 "net: phy: fix resume handling"
On Mon, Feb 05, 2018 at 10:48:55PM +0100, Heiner Kallweit wrote:
> On 04.02.2018 at 03:48, Florian Fainelli wrote:
> >
> >
> > On 02/03/2018 03:58 PM, Heiner Kallweit wrote:
> >> On 03.02.2018 at 21:17, Andrew Lunn wrote:
> >>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote:
> >>>> This commit forces callers of phy_resume() and phy_suspend() to hold
> >>>> the mutex phydev->lock. This was done for the calls to phy_resume() and
> >>>> phy_suspend() in phylib; however, there are more callers in network
> >>>> drivers. I'd assume that these other calls now issue a warning
> >>>> because the lock is not held.
> >>>> So is there something I'm missing, or does this have to be fixed?
> >>>
> >>> Hi Heiner
> >>>
> >>> This is a good point.
> >>>
> >>> Yes, it looks like some fixes are needed. But what exactly?
> >>>
> >>> The phy state machine will suspend and resume the phy if you call
> >>> phy_stop() and phy_start() in the MAC suspend and resume functions.
> >>>
> >> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state
> >> to PHY_HALTED and (at least if we're not in polling mode) doesn't
> >> call phy_suspend(). Maybe a call to phy_trigger_machine() is
> >> needed, as in phy_start()? Then the state machine would call
> >> phy_suspend(), provided the link is still up.
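A toy user-space model of the behaviour described above may help; it is not phylib code. The names toy_phy, toy_phy_stop() and toy_phy_state_machine() are hypothetical, and the model keeps only what the paragraph states: stopping merely changes the state, and a later run of the state machine suspends the PHY only if the link was still up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not phylib: only the states discussed. */
enum toy_phy_state { TOY_PHY_RUNNING, TOY_PHY_HALTED };

struct toy_phy {
	enum toy_phy_state state;
	bool link;      /* link currently reported as up */
	bool suspended; /* set by toy_phy_suspend() */
};

static void toy_phy_suspend(struct toy_phy *phy)
{
	phy->suspended = true;
}

/* Mirrors the point above: stopping only changes the state. */
static void toy_phy_stop(struct toy_phy *phy)
{
	phy->state = TOY_PHY_HALTED;
}

/* One run of the state machine, e.g. after an explicit trigger.
 * In the halted state it drops the link and suspends the PHY,
 * but only if the link was still up. */
static void toy_phy_state_machine(struct toy_phy *phy)
{
	bool do_suspend = false;

	switch (phy->state) {
	case TOY_PHY_HALTED:
		if (phy->link) {
			phy->link = false;
			do_suspend = true;
		}
		break;
	default:
		break;
	}

	if (do_suspend)
		toy_phy_suspend(phy);
}
```

Without a trigger (or a poll interval elapsing), nothing calls toy_phy_state_machine() after toy_phy_stop(), which is the gap described above.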
> >
> > Right, phy_stop() merely moves the state machine to PHY_HALTED, and
> > this is actually a great source of problems, which I tried to address here:
> >
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html
> >
> > because phy_stop() is not a synchronous call: when it returns, the
> > state machine might still be running (it can take up to 1 HZ, i.e. one
> > second, depending on when you called phy_stop()). So if you take that as a
> > synchronization point to e.g. turn off your Ethernet MAC/MDIO bus clocks,
> > you will likely see problems. phy_stop_machine() would provide that
> > synchronization point, but it is not currently exported, despite being a
> > global symbol. The patch series above is all well and good, except that
> > Geert reported issues with suspend/resume interactions which I have not
> > been able to track down.
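The hazard can be sketched as a single-threaded simulation, assuming a queued state-machine run that polls the PHY over MDIO; all sim_* names are hypothetical and this is not phylib code. sim_phy_stop() only marks the PHY halted, so a run that is already queued still touches the bus after the clocks are gated. sim_phy_stop_machine() stands in for the synchronization point, assumed here to amount to a synchronous cancel of the queued work (in the kernel, something like cancel_delayed_work_sync()). Being single-threaded, the model cannot show a run that is in progress concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded simulation; not phylib code. */
struct sim_phy {
	bool halted;
	bool work_pending;     /* a state machine run is queued */
	bool mdio_clock_on;    /* MAC/MDIO bus clock */
	int bad_mdio_accesses; /* bus accesses made with the clock off */
};

static void sim_mdio_access(struct sim_phy *p)
{
	if (!p->mdio_clock_on)
		p->bad_mdio_accesses++;
}

/* The queued work: polls the PHY, then re-queues itself unless halted. */
static void sim_state_machine(struct sim_phy *p)
{
	p->work_pending = false;
	sim_mdio_access(p);
	if (!p->halted)
		p->work_pending = true;
}

/* Asynchronous stop: returns while a run may still be queued. */
static void sim_phy_stop(struct sim_phy *p)
{
	p->halted = true;
}

/* Synchronization point: after this returns, no queued run remains. */
static void sim_phy_stop_machine(struct sim_phy *p)
{
	p->work_pending = false;
}

/* Simulated passage of time (up to one poll interval). */
static void sim_tick(struct sim_phy *p)
{
	if (p->work_pending)
		sim_state_machine(p);
}
```

Gating the clock right after sim_phy_stop() and then calling sim_tick() records one bad access; calling sim_phy_stop_machine() before gating the clock records none.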
> >
> > We should most definitely try to consolidate the different PHY
> > suspend/resume calls within the Ethernet MAC suspend/resume implementations and
> > document exactly what the appropriate behavior must be under the
> > following circumstances:
> >
> > - when to call phy_stop() + phy_stop_machine()
> > - when to call phy_suspend() (if the network interface does not do WoL)
> > - when to call phy_resume() (if needed, actually, it usually is not)
> > - when to call phy_start()
> >
>
> I think phy_start() / phy_start_machine() / phy_start_interrupts()
> belong together and we may call the latter two functions from phy_start().
> Same for stop.
>
> This would mean:
> - Remove call to phy_start_interrupts() from phy_connect_direct()
> - Call phy_start_machine() and phy_start_interrupts() from phy_start()
> - mdio_bus_phy_suspend() calls phy_stop()
> Same for stop, plus: phy_error() calls phy_stop().
>
> In this setup a second call to phy_stop() wouldn't hurt because state
> is PHY_HALTED already and phy_stop() is a no-op.
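A minimal model of this proposal shows why a second stop would be harmless: the stop path returns early once the state is already halted. The prop_* names are hypothetical stand-ins, and the two flags only represent phy_start_machine()/phy_start_interrupts() and their stop counterparts; this is a sketch, not phylib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposal above; not phylib code. */
enum prop_state { PROP_READY, PROP_RUNNING, PROP_HALTED };

struct prop_phy {
	enum prop_state state;
	bool machine_running; /* stands in for phy_start/stop_machine() */
	bool irqs_enabled;    /* stands in for phy_start/stop_interrupts() */
	int effective_stops;  /* stop calls that actually did something */
};

/* Start brings up the state machine and the interrupts together. */
static void prop_phy_start(struct prop_phy *phy)
{
	if (phy->state == PROP_RUNNING)
		return;
	phy->machine_running = true;
	phy->irqs_enabled = true;
	phy->state = PROP_RUNNING;
}

/* Stop tears both down; once halted, further calls are no-ops. */
static void prop_phy_stop(struct prop_phy *phy)
{
	if (phy->state == PROP_HALTED)
		return;
	phy->irqs_enabled = false;
	phy->machine_running = false;
	phy->state = PROP_HALTED;
	phy->effective_stops++;
}
```

In this model, a stop from the MAC driver followed by another from mdio_bus_phy_suspend() or phy_error() leaves the PHY in the same state as a single stop.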
>
> A functional change would be that interrupts are disabled during system
> suspend (except with WoL, because we don't suspend the PHY in this case).
>
> These are first thoughts, so it's fine if you totally disagree.
> I haven't tested this yet; so far it's only a thought experiment ("Gedankenexperiment").
>
> When talking about suspend/resume, I think we have so far meant system
> suspend/resume. However, I think we also need to consider runtime PM.
> If the link is down, the network driver may decide to runtime-suspend the PHY
> (power it down). In the case of runtime PM, I'd say we need to keep the irq and
> workqueue active so that we can react if a cable is plugged in and the PHY
> wakes up automatically and establishes a link.

Maybe a better solution now would be to restore phy_resume()'s lock-taking
behaviour, and provide an unlocked __phy_resume(), which expects the caller
to already hold phydev->lock, for use internally within phylib. This means
drivers using phy_resume() would see no change. Maybe something like
(untested):

 drivers/net/phy/phy.c        |  2 +-
 drivers/net/phy/phy_device.c | 17 ++++++++++++++---
 include/linux/phy.h          |  1 +
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f3313a129531..4574d02dce93 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -819,7 +819,7 @@ void phy_start(struct phy_device *phydev)
 		break;
 	case PHY_HALTED:
 		/* if phy was suspended, bring the physical link up again */
-		phy_resume(phydev);
+		__phy_resume(phydev);
 
 		/* make sure interrupts are re-enabled for the PHY */
 		if (phydev->irq != PHY_POLL) {
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index b13eed21c87d..ae0f9306bbdc 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -136,7 +136,7 @@ static int mdio_bus_phy_resume(struct device *dev)
 		goto no_resume;
 
 	mutex_lock(&phydev->lock);
-	ret = phy_resume(phydev);
+	ret = __phy_resume(phydev);
 	mutex_unlock(&phydev->lock);
 	if (ret < 0)
 		return ret;
@@ -1042,7 +1042,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 		goto error;
 
 	mutex_lock(&phydev->lock);
-	phy_resume(phydev);
+	__phy_resume(phydev);
 	mutex_unlock(&phydev->lock);
 	phy_led_triggers_register(phydev);
 
@@ -1172,7 +1172,7 @@ int phy_suspend(struct phy_device *phydev)
 }
 EXPORT_SYMBOL(phy_suspend);
 
-int phy_resume(struct phy_device *phydev)
+int __phy_resume(struct phy_device *phydev)
 {
 	struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver);
 	int ret = 0;
@@ -1189,6 +1189,17 @@ int phy_resume(struct phy_device *phydev)
 
 	return ret;
 }
+
+int phy_resume(struct phy_device *phydev)
+{
+	int ret;
+
+	mutex_lock(&phydev->lock);
+	ret = __phy_resume(phydev);
+	mutex_unlock(&phydev->lock);
+
+	return ret;
+}
 EXPORT_SYMBOL(phy_resume);
 
 int phy_loopback(struct phy_device *phydev, bool enable)
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5a0c3e53e7c2..8f82bd64f82d 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -923,6 +923,7 @@ static inline void phy_device_free(struct phy_device *phydev) { }
 void phy_device_remove(struct phy_device *phydev);
 int phy_init_hw(struct phy_device *phydev);
 int phy_suspend(struct phy_device *phydev);
+int __phy_resume(struct phy_device *phydev);
 int phy_resume(struct phy_device *phydev);
 int phy_loopback(struct phy_device *phydev, bool enable);
 struct phy_device *phy_attach(struct net_device *dev, const char *bus_id,
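
The locking convention this patch restores can be modelled in user space: a public function that takes the lock itself, and an unlocked helper that only warns when the lock is not held (mirroring the warning discussed earlier in the thread). Everything here is a hypothetical stand-in: toy_mutex is a single-threaded flag rather than a real mutex, toy_warnings counts warnings, and toy_phy_resume()/toy_phy_resume_unlocked() play the roles of phy_resume()/__phy_resume().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a single-threaded "mutex" that only tracks
 * whether it is held, and a counter standing in for the warning. */
struct toy_mutex {
	bool locked;
};

static int toy_warnings;

static void toy_mutex_lock(struct toy_mutex *m)
{
	/* A real kernel mutex taken twice would deadlock; trip an assert. */
	assert(!m->locked);
	m->locked = true;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->locked);
	m->locked = false;
}

struct toy_phy_device {
	struct toy_mutex lock;
	bool suspended;
};

/* Counterpart of __phy_resume(): the caller must hold phydev->lock. */
static int toy_phy_resume_unlocked(struct toy_phy_device *phydev)
{
	if (!phydev->lock.locked)
		toy_warnings++;
	phydev->suspended = false;
	return 0;
}

/* Counterpart of phy_resume(): takes the lock itself, so drivers that
 * call it without holding the lock see no change. */
static int toy_phy_resume(struct toy_phy_device *phydev)
{
	int ret;

	toy_mutex_lock(&phydev->lock);
	ret = toy_phy_resume_unlocked(phydev);
	toy_mutex_unlock(&phydev->lock);

	return ret;
}
```

The assert in toy_mutex_lock() also shows why phylib's own callers, which already hold phydev->lock, must use the unlocked variant: calling the locking wrapper there would try to take the lock a second time.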
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up