[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 5 May 2022 19:54:00 +0200
From: Francesco Dolcini <francesco.dolcini@...adex.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Francesco Dolcini <francesco.dolcini@...adex.com>,
Joakim Zhang <qiangqing.zhang@....com>, netdev@...r.kernel.org,
Andy Duan <fugang.duan@....com>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
"David S. Miller" <davem@...emloft.net>,
Fabio Estevam <festevam@...il.com>,
Tim Harvey <tharvey@...eworks.com>,
Chris Healy <cphealy@...il.com>
Subject: Re: FEC MDIO read timeout on linkup
On Thu, May 05, 2022 at 07:41:00PM +0200, Andrew Lunn wrote:
> On Thu, May 05, 2022 at 10:29:01AM +0200, Francesco Dolcini wrote:
> > Hello Andrew and all, I believe I finally found the problem and I'm
> > preparing a patch for it.
> >
> > On Wed, May 04, 2022 at 12:17:59AM +0200, Andrew Lunn wrote:
> > > > I'm wondering could this be related to
> > > > fec_enet_adjust_link()->fec_restart() during a fec_enet_mdio_read()
> > > > and one of the many register write in fec_restart() just creates the
> > > > issue, maybe while resetting the FEC? Does this makes any sense?
> > >
> > > phylib is 'single threaded', in that only one thing will be active at
> > > once for a PHY. While fec_enet_adjust_link() is being called, there
> > > will not be any read/writes occurring for that PHY.
> >
> > I think this is not the whole story here. We can have a phy interrupt
> > handler that runs in its own context and it could be doing a MDIO
> > transaction, and this is exactly my case.
> >
> > Thread 1 (phylib WQ) | Thread 2 (phy interrupt)
> > |
> > | phy_interrupt() <-- PHY IRQ
> > | handle_interrupt()
> > | phy_read()
> > | phy_trigger_machine()
> > | --> schedule WQ
> > |
> > |
> > phy_state_machine() |
> > phy_check_link_status() |
> > phy_link_change() |
> > phydev->adjust_link() |
> > fec_enet_adjust_link() |
> > --> FEC reset | phy_interrupt() <-- PHY IRQ
> > | phy_read()
> > |
> >
> > To confirm this I have added a spinlock to detect this race condition
> > with just a trylock and a WARN_ON(1) when the locking is failing. On
> > "MDIO read timeout" acquiring the spinlock fails.
> >
> > This is also in agreement with the fact that polling the PHY instead of
> > having the interrupt is working just fine.
>
> Yes, that makes sense.
>
> But i would fix this differently. The interrupt handler runs in a
> threaded interrupt. So it can use mutex. So it should actually take
> the phy mutex.
I was just about to send a patch with phy_lock_mdio_bus() in
fec_enet_adjust_link(), anyway, I'll send the version you proposed in a
bit.
Thanks,
Francesco
Powered by blists - more mailing lists