Date: Thu, 5 May 2022 19:41:00 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Francesco Dolcini <francesco.dolcini@...adex.com>
Cc: Joakim Zhang <qiangqing.zhang@....com>, netdev@...r.kernel.org,
Andy Duan <fugang.duan@....com>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Fabio Estevam <festevam@...il.com>,
Tim Harvey <tharvey@...eworks.com>,
Chris Healy <cphealy@...il.com>
Subject: Re: FEC MDIO read timeout on linkup
On Thu, May 05, 2022 at 10:29:01AM +0200, Francesco Dolcini wrote:
> Hello Andrew and all, I believe I finally found the problem and I'm
> preparing a patch for it.
>
> On Wed, May 04, 2022 at 12:17:59AM +0200, Andrew Lunn wrote:
> > > I'm wondering whether this could be related to
> > > fec_enet_adjust_link()->fec_restart() running during a
> > > fec_enet_mdio_read(), with one of the many register writes in
> > > fec_restart() creating the issue, maybe while resetting the FEC?
> > > Does this make any sense?
> >
> > phylib is 'single threaded', in that only one thing will be active at
> > once for a PHY. While fec_enet_adjust_link() is being called, there
> > will not be any read/writes occurring for that PHY.
>
> I think this is not the whole story here. We can have a PHY interrupt
> handler that runs in its own context and could be doing an MDIO
> transaction, and this is exactly my case.
>
> Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
>                            |
>                            | phy_interrupt()            <-- PHY IRQ
>                            |  handle_interrupt()
>                            |   phy_read()
>                            |   phy_trigger_machine()
>                            |    --> schedule WQ
>                            |
>                            |
> phy_state_machine()        |
>  phy_check_link_status()   |
>   phy_link_change()        |
>    phydev->adjust_link()   |
>     fec_enet_adjust_link() |
>      --> FEC reset         | phy_interrupt()            <-- PHY IRQ
>                            |  phy_read()
>                            |
>
> To confirm this I have added a spinlock to detect this race condition
> with just a trylock and a WARN_ON(1) when the locking is failing. On
> "MDIO read timeout" acquiring the spinlock fails.
>
> This is also in agreement with the fact that polling the PHY instead of
> having the interrupt is working just fine.

Yes, that makes sense.

But I would fix this differently. The interrupt handler runs in a
threaded interrupt, so it can use a mutex, and it should actually take
the phy mutex.

Please try this:

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index beb2b66da132..7d3a64d04820 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
 	struct phy_device *phydev = phy_dat;
 	struct phy_driver *drv = phydev->drv;
+	int ret;
 
-	return drv->handle_interrupt(phydev);
+	mutex_lock(&phydev->lock);
+	ret = drv->handle_interrupt(phydev);
+	mutex_unlock(&phydev->lock);
+
+	return ret;
 }

That will stop it running in parallel to the adjust_link callback, or
anything else in phylib.
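
To illustrate why taking the same mutex is enough, here is a rough
userspace analogy using pthreads. It is only a sketch and not kernel
code: phy_lock stands in for phydev->lock, and state_machine_thread(),
irq_thread(), in_reset, collisions and run_demo() are hypothetical
names made up for the example. It assumes, as in the trace above, that
the state machine already holds the lock while it calls adjust_link.

```c
/* Userspace analogy only: pthreads stand in for the phylib workqueue
 * and the threaded PHY interrupt handler. All names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 100000

static pthread_mutex_t phy_lock = PTHREAD_MUTEX_INITIALIZER; /* plays phydev->lock */
static atomic_int in_reset;   /* nonzero while the simulated MAC reset runs */
static atomic_int collisions; /* simulated MDIO reads that overlapped a reset */

/* Plays phy_state_machine() -> adjust_link(): reset happens under the lock. */
static void *state_machine_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&phy_lock);
		atomic_store(&in_reset, 1);	/* "FEC reset" begins */
		atomic_store(&in_reset, 0);	/* "FEC reset" ends */
		pthread_mutex_unlock(&phy_lock);
	}
	return NULL;
}

/* Plays phy_interrupt(): the simulated MDIO read takes the same lock. */
static void *irq_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&phy_lock);
		if (atomic_load(&in_reset))	/* would be an "MDIO read timeout" */
			atomic_fetch_add(&collisions, 1);
		pthread_mutex_unlock(&phy_lock);
	}
	return NULL;
}

/* Runs both threads to completion and returns the number of overlaps seen. */
int run_demo(void)
{
	pthread_t wq, irq;
	int rc;

	atomic_store(&in_reset, 0);
	atomic_store(&collisions, 0);

	rc = pthread_create(&wq, NULL, state_machine_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&irq, NULL, irq_thread, NULL);
	assert(rc == 0);

	pthread_join(wq, NULL);
	pthread_join(irq, NULL);
	return atomic_load(&collisions);
}
```

Called from a small main() and built with -pthread, run_demo() should
return 0, because the read can never observe in_reset set while both
sides hold the lock. Dropping the lock/unlock pair from irq_thread()
reproduces the unserialized case, where overlaps may or may not show
up on any given run.
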
Andrew