[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a9e79494-b94a-40f7-9c28-22b6220db5c2@lunn.ch>
Date: Mon, 22 Jan 2024 15:12:42 +0100
From: Andrew Lunn <andrew@...n.ch>
To: Rafał Miłecki <zajec5@...il.com>
Cc: Network Development <netdev@...r.kernel.org>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Robert Marko <robimarko@...il.com>,
Ansuel Smith <ansuelsmth@...il.com>,
Daniel Golle <daniel@...rotopia.org>
Subject: Re: Race in PHY subsystem? Attaching to PHY devices before they get
probed
On Mon, Jan 22, 2024 at 08:09:58AM +0100, Rafał Miłecki wrote:
> Hi!
>
> I have MT7988 SoC board with following problem:
> [ 26.887979] Aquantia AQR113C mdio-bus:08: aqr107_wait_reset_complete failed: -110
>
> This issue is known to occur when PHY's firmware is not running. After
> some debugging I discovered that .config_init() CB gets called while
> .probe() CB is still being executed.
>
> It turns out mtk_soc_eth.c calls phylink_of_phy_connect() before my PHY
> gets fully probed and it seems there is nothing in PHY subsystem
> verifying that. Please note this PHY takes quite some time to probe as
> it involves sending firmware to hardware.
>
> Is that a possible race in PHY subsystem?
Seems like it.
There is a patch "net: phylib: get rid of unnecessary locking" which
removed locks from probe, which might of helped, but the patch also
says:
The locking in phy_probe() and phy_remove() does very little to prevent
any races with e.g. phy_attach_direct(),
suggesting it probably did not help.
I think the traditional way problems like this are avoided is that the
device should not be visible to the rest of the system until probe has
completed.
Maybe look at phy_device_register(). I think it is the call to
mdiobus_register_device() which makes it visible. What i don't know is
the call path which calls probe. Is it part of device_add()? Can
mdiobus_register_device() be moved to the end of this function?
Could you add a WARN_ON(1) in the probe function to get the call
stack. With that, we might be able to figure out where the
mdiobus_register_device() needs to move.
Thanks
Andrew
Powered by blists - more mailing lists