[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZwW8-Xi8sStL50uw@makrotopia.org>
Date: Wed, 9 Oct 2024 00:15:05 +0100
From: Daniel Golle <daniel@...rotopia.org>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next] net: phy: realtek: check validity of 10GbE
link-partner advertisement
Hi Russell,
On Tue, Oct 08, 2024 at 03:27:21PM +0100, Russell King (Oracle) wrote:
> Okay, I think the problem is down to the order in which Realtek is
> doing stuff.
> [...]
> Now, rtl822x_read_status() reads the 10G status, modifying
> phydev->lp_advertising before then going on to call
> rtlgen_read_status(), which then calls genphy_read_status(), which
> in turn will then call genphy_read_lpa().
>
> First, this is the wrong way around. Realtek needs to call
> genphy_read_status() so that phydev->link and phydev->autoneg_complete
> are both updated to the current status.
First of all thanks a lot for diving down that rabbit hole with me!
>
> Then, it needs to check whether AN is enabled, and whether autoneg
> has completed and deal with both situations.
>
> Afterwards, it then *possibly* needs to read its speed register and
> decode that to phydev->speed, but I don't see the point of that when
> it's (a) not able to also decode the duplex from that register, and
> (b) when we've already resolved it ourselves from the link mode.
> What I'd be worried about is if the PHY does a down-shift to a
> different speed _and_ duplex from what was resolved - and thus
> whether we should even be enabling downshift on this PHY. Maybe
> there's a bit in 0xa43 0x12 that gives us the duplex as well?
>
> In other words:
>
> static int rtl822x_read_status(struct phy_device *phydev)
> {
> int lpadv, ret;
>
> ret = rtlgen_read_status(phydev);
> if (ret < 0)
> return ret;
>
> if (phydev->autoneg == AUTONEG_DISABLE)
> return 0;
>
> if (!phydev->autoneg_complete) {
> mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, 0);
> return 0;
> }
>
> lpadv = phy_read_paged(phydev, 0xa5d, 0x13);
> if (lpadv < 0)
> return lpadv;
>
> mii_10gbt_stat_mod_linkmode_lpa_t(phydev->lp_advertising, lpadv);
> phy_resolve_aneg_linkmode(phydev);
>
> return 0;
> }
>
> That should at least get proper behaviour in the link partner
> advertising bitmap rather than the weirdness that Realtek is doing.
> (BTW, other drivers should be audited for the same bug!)
Got it, always do genphy_read_status() first thing, as that will
clear things and set autoneg_complete.
Similarly, when dealing with the same PHY in C45 mode, I noticed that
phy->autoneg_complete never gets set, but rather we have to check it
via genphy_c45_aneg_done(phydev) and clear bits set by
mii_stat1000_mod_linkmode_lpa_t().
Doing so for C45 access, and following your suggestion above for C22
resolves the issue without any need to check MDIO_AN_10GBT_STAT_LOCOK
or MDIO_AN_10GBT_STAT_REMOK.
> [...]
> However, if we keep the rtlgen_decode_speed() stuff, and can fix the
> duplex issue, then the phy_resolve_aneg_linkmode() calls should not
> be necessary, and it should be moved _after_ this to ensure that
> phydev->speed (and phydev->duplex) are correctly set.
PHY Specific Status Register, MMD 31.0xA434 also carries duplex
information in bit 3 as well as more useful information.
Probably rtlgen_decode_speed() should be renamed to rtlgen_decode_physr()
and decode most of that.
I'll post a series taking care of all of that shortly.
Again, thanks a lot for the extremely insightful lesson!
Cheers
Daniel
Powered by blists - more mailing lists