[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<DM3PR11MB8736D3C4814D9DCC8022F803EC242@DM3PR11MB8736.namprd11.prod.outlook.com>
Date: Fri, 15 Nov 2024 01:53:57 +0000
From: <Tristram.Ha@...rochip.com>
To: <olteanv@...il.com>
CC: <andrew@...n.ch>, <Woojung.Huh@...rochip.com>, <robh@...nel.org>,
<krzk+dt@...nel.org>, <conor+dt@...nel.org>, <davem@...emloft.net>,
<edumazet@...gle.com>, <kuba@...nel.org>, <pabeni@...hat.com>,
<marex@...x.de>, <UNGLinuxDriver@...rochip.com>,
<devicetree@...r.kernel.org>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH net-next 2/2] net: dsa: microchip: Add SGMII port support
to KSZ9477 switch
> On Wed, Nov 13, 2024 at 02:12:36AM +0000, Tristram.Ha@...rochip.com wrote:
> > When the SFP says it supports 1000Base-T sfp_add_phy() is called by the
> > SFP state machine and phylink_sfp_connect_phy() and
> > phylink_sfp_config_phy() are run. It is in the last function that the
> > validation fails as the just created phy device does not initialize its
> > supported and advertising fields yet. The phy device has the
> > opportunity later to fill them up if the phylink creation goes through,
> > but that never happens.
> >
> > A fix is to fill those fields with sfp_support like this:
> >
> > @@ -3228,6 +3228,11 @@ static int phylink_sfp_config_phy(struct
> > struct phylink_link_state config;
> > int ret;
> >
> > + /* The newly created PHY device has empty settings. */
> > + if (linkmode_empty(phy->supported)) {
> > + linkmode_copy(phy->supported, pl->sfp_support);
> > + linkmode_copy(phy->advertising, pl->sfp_support);
> > + }
> > linkmode_copy(support, phy->supported);
> >
> > memset(&config, 0, sizeof(config));
> >
> > The provided PCS driver from the DSA driver has an opportunity to change
> > support with its validation check, but that does not look right as
> > generally those checks remove certain bits from the link mode, but this
> > requires completely copying new ones. And this still does not work as
> > the advertising field passed to the PCS driver has a const modifier.
>
> I think I know what's happening, it's unfortunate it pushed you towards
> wrong conclusions.
>
> The "fix" you posted is wrong, and no, the PCS driver should not expand
> the supported mask, just restrict it as you said. The phydev->supported
> mask normally comes from the phy_probe() logic:
>
> /* Start out supporting everything. Eventually,
> * a controller will attach, and may modify one
> * or both of these values
> */
> if (phydrv->features) {
> linkmode_copy(phydev->supported, phydrv->features);
> genphy_c45_read_eee_abilities(phydev);
> }
> else if (phydrv->get_features)
> err = phydrv->get_features(phydev);
> else if (phydev->is_c45)
> err = genphy_c45_pma_read_abilities(phydev);
> else
> err = genphy_read_abilities(phydev);
>
> The SFP bus code depends strictly on sfp_sm_probe_phy() -> phy_device_register()
> actually loading a precise device driver for the PHY synchronously via
> phy_bus_match(). There is another lazy loading mechanism later in
> phy_attach_direct(), for the Generic PHY driver:
>
> /* Assume that if there is no driver, that it doesn't
> * exist, and we should use the genphy driver.
> */
>
> but that is too late for this code path, because as you say,
> phylink_sfp_config_phy() is coded up to only call phylink_attach_phy()
> if phylink_validate() succeeds. But phylink_validate() will only see a
> valid phydev->supported mask with the Generic PHY driver if we let that
> driver attach in phylink_attach_phy() in the first place.
>
> Personally, I think SFP modules with embedded PHYs strictly require the
> matching driver to be available to the kernel, due to that odd way in
> which the Generic PHY driver is loaded, but I will let the PHY library
> experts share their opinion as well.
>
> You would be better off improving the error message, see what PHY ID you
> get, then find and load the driver for it:
>
> diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
> index 7dbcbf0a4ee2..8be473a7d262 100644
> --- a/drivers/net/phy/sfp.c
> +++ b/drivers/net/phy/sfp.c
> @@ -1817,9 +1817,12 @@ static int sfp_sm_probe_phy(struct sfp *sfp, int addr, bool
> is_c45)
>
> err = sfp_add_phy(sfp->sfp_bus, phy);
> if (err) {
> + dev_err(sfp->dev,
> + "sfp_add_phy() for PHY %s (ID 0x%.8lx) failed: %pe, maybe PHY driver
> not loaded?\n",
> + phydev_name(phy), (unsigned long)phy->phy_id,
> + ERR_PTR(err));
> phy_device_remove(phy);
> phy_device_free(phy);
> - dev_err(sfp->dev, "sfp_add_phy failed: %pe\n", ERR_PTR(err));
> return err;
> }
>
>
> Chances are it's one of CONFIG_MARVELL_PHY or CONFIG_AQUANTIA_PHY.
Indeed adding the Marvell PHY driver fixed the problem.
There is nothing special about the Marvell PHY driver. It is just
phy_probe() is called during PHY device creation just as you said.
It may not be right to use sfp_support, but all three (sfp_support,
supported, advertising) have about the same value at that point after the
PHY driver is invoked: 0x62ff and 0x60f0.
I mentioned before that some SFPs have faulty implementation where part
of the returned PHY register value is 0xff. For example, PHY register 0
is 0x11ff, PHY register 1 is 0x79ff, and PHY register 2 is 0x01ff. The
Marvell PHY id is 0x01410cc0, and I saw there is a special PHY id
0x01ff0cc0 defined for Finisar to accommodate this situation. Were those
SFPs made by Finisar originally?
Some of those PHY registers are correct, so I do not know if those wrong
registers are intentional or not, but the link status register always
has 0xff value and that cannot be right.
Powered by blists - more mailing lists