[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YVW59e2iItl5S4Qh@shell.armlinux.org.uk>
Date: Thu, 30 Sep 2021 14:21:57 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Rafał Miłecki <zajec5@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
Network Development <netdev@...r.kernel.org>,
Florian Fainelli <f.fainelli@...il.com>,
BCM Kernel Feedback <bcm-kernel-feedback-list@...adcom.com>,
Vivek Unune <npcomplete13@...il.com>
Subject: Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
On Thu, Sep 30, 2021 at 02:07:44PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 30, 2021 at 02:51:40PM +0200, Rafał Miłecki wrote:
> > On 30.09.2021 14:30, Russell King (Oracle) wrote:
> > > > It's actually OpenWrt's downstream swconfig-based b53 driver that
> > > > matches this device.
> > > >
> > > > I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
> > > > does it match MDIO device then? I thought MDIO devices should be
> > > > matches only with drivers using mdio_driver_register().
> > >
> > > Note that I've no idea what he swconfig-based b53 driver looks like,
> > > I don't have the source for that to hand.
> > >
> > > If it calls phy_driver_register(), then it is registering a driver for
> > > a MDIO device wrapped in a struct phy_device. If this driver has a
> > > .of_match_table member set, then this is wrong - the basic rule is
> > >
> > > PHY drivers must never match using DT compatibles.
> > >
> > > because this is exactly what will occur - it bypasses the check that
> > > the mdio_device being matched is in fact wrapped by a struct phy_device,
> > > and we will access members of the non-existent phy_device, including
> > > the "uninitialised" mutex.
> > >
> > > If the swconfig-based b53 driver does want to bind to a phy_device based
> > > DT node, then it needs to match using either a custom .match_phy_device
> > > method in the PHY driver, or it needs to match using the PHY IDs.
> >
> > Sorry, I should have linked downstream b53_mdio.c . It's:
> > https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD
>
> Yes, I just found a version of the driver
>
> > You can see that is *uses* of_match_table.
> >
> > What about refusing bugged drivers like above b53 with something like:
>
> That will break all the MDIO based DSA and other non-PHY drivers,
> sorry.
>
> I suppose we could detect if the driver has the MDIO_DEVICE_IS_PHY flag
> set, and reject any device that does not have MDIO_DEVICE_IS_PHY set:
>
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 53f034fc2ef7..7bc06126ce10 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -939,6 +939,12 @@ EXPORT_SYMBOL_GPL(mdiobus_modify);
> static int mdio_bus_match(struct device *dev, struct device_driver *drv)
> {
> struct mdio_device *mdio = to_mdio_device(dev);
> + struct mdio_driver *mdiodrv = to_mdio_driver(drv);
> +
> + /* Both the driver and device must type-match */
> + if (!(mdiodrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ==
> + !(mdio->flags & MDIO_DEVICE_FLAG_PHY))
> + return 0;
>
> if (of_driver_match_device(dev, drv))
> return 1;
>
> In other words, the driver's state of the MDIO_DEVICE_IS_PHY flag
> must match the device's MDIO_DEVICE_FLAG_PHY flag before we attempt
> any matches.
>
> If that's not possible, then we need to prevent phylib drivers from
> using .of_match_table:
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index e62f7a232307..dacf0b31b167 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2501,6 +2501,16 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
> return -EINVAL;
> }
>
> + /* PHYLIB device drivers must not match using a DT compatible table
> + * as this bypasses our checks that the mdiodev that is being matched
> + * is backed by a struct phy_device. If such a case happens, we will
> + * make out-of-bounds accesses and lockup in phydev->lock.
> + */
> + if (WARN(new_driver->mdiodrv.driver.of_match_table,
> + "%s: driver must not provide a DT match table\n",
> + new_driver->name))
> + return -EINVAL;
> +
> new_driver->mdiodrv.flags |= MDIO_DEVICE_IS_PHY;
> new_driver->mdiodrv.driver.name = new_driver->name;
> new_driver->mdiodrv.driver.bus = &mdio_bus_type;
I should also point out that as this b53 driver that is causing the
problem only exists in OpenWRT, this is really a matter for OpenWRT
developers rather than mainline which does not suffer this problem.
I suspect that OpenWRT developers will not be happy with either of
the two patches I've posted above - I suspect they are trying to
support both DSA and swconfig approaches with a single DT. That can
be made to work, but not with a PHYLIB driver being a wrapper around
the swconfig stuff (precisely because there's no phy_device in this
scenario.)
The only reason to patch mainline kernels would be to make them more
robust, and maybe to also make an explicit statement about what isn't
supported (having a phy_driver with its of_match_table member set.)
I probably should've made that clearer in my email with the patches.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Powered by blists - more mailing lists