[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <47960d0c-189b-28c8-8337-ee1b202ad144@seco.com>
Date: Tue, 22 Nov 2022 11:45:18 -0500
From: Sean Anderson <sean.anderson@...o.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Heiner Kallweit <hkallweit1@...il.com>,
Andrew Lunn <andrew@...n.ch>,
Russell King <linux@...linux.org.uk>,
Florian Fainelli <f.fainelli@...il.com>,
UNGLinuxDriver@...rochip.com,
bcm-kernel-feedback-list@...adcom.com,
Madalin Bucur <madalin.bucur@....nxp.com>,
Camelia Groza <camelia.groza@....com>,
Claudiu Manoil <claudiu.manoil@....com>,
Ioana Ciornei <ioana.ciornei@....com>,
Maxim Kochetkov <fido_max@...ox.ru>,
Antoine Tenart <atenart@...nel.org>,
Michael Walle <michael@...le.cc>,
Raag Jadav <raagjadav@...il.com>,
Siddharth Vadapalli <s-vadapalli@...com>,
Ong Boon Leong <boon.leong.ong@...el.com>,
Colin Foster <colin.foster@...advantage.com>,
Marek Behun <marek.behun@....cz>
Subject: Re: [PATCH v4 net-next 0/8] Let phylink manage in-band AN for the PHY
On 11/22/22 11:30, Vladimir Oltean wrote:
> On Tue, Nov 22, 2022 at 11:10:03AM -0500, Sean Anderson wrote:
>> On 11/21/22 19:17, Vladimir Oltean wrote:
>> > On Mon, Nov 21, 2022 at 05:42:44PM -0500, Sean Anderson wrote:
>> >> Are you certain this is the cause of the issue? It's also possible that
>> >> there is some errata for the PCS which is causing the issue. I have
>> >> gotten no review/feedback from NXP regarding the phylink conversion
>> >> (aside from acks for the cleanups).
>> >
>> > Erratum which does what out of the ordinary? Your description of the
>> > hardware failure seems consistent with the most plausible explanation
>> > that doesn't involve any bugs.
>>
>> Well, I don't have a setup which doesn't require in-band AN, so I can't
>> say one way or the other where the problem lies. To me, the Lynx PCS is
>> just as opaque as the phy.
>
> ok.
>
>> > If you enable C37/SGMII AN in the PCS (of the PHY or of the MAC) and AN
>> > does not complete (because it's not enabled on the other end), that
>> > system side of the link remains down. Which you don't see when you
>> > operate in MLO_AN_PHY mode, because phylink only considers the PCS link
>> > state in MLO_AN_INBAND mode. So this is why you see the link as up but
>> > it doesn't work.
>>
>> Actually, I checked the PCS manually in phy mode, and the link was up.
>> I expected it to be down, so this was a bit surprising to me.
>
> Well, if autoneg is disabled in the Lynx PCS (which it is in MLO_AN_PHY),
> then the link should come up right away, as long as it can lock on some
> symbols IIRC. It's a different story for the PHY PCS if autoneg is
> enabled there. Still nothing surprising here, really.
>
>> > To confirm whether I'm right or wrong, there's a separate SERDES
>> > Interrupt Status Register at page 0xde1 offset 0x12, whose bit 4 is
>> > "SERDES link status change" and bit 0 is "SERDES auto-negotiation error".
>> > These bits should both be set when you double-read them (regardless of
>> > IRQ enable I think) when your link is down with MLO_AN_PHY, but should
>> > be cleared with MLO_AN_INBAND.
>>
>> This register is always 0s for me...
>>
>> >> This is used for SGMII to RGMII bridge mode (figure 4). It doesn't seem
>> >> to contain useful information for UTP mode (figure 1).
>> >
>> > So it would seem. It was a hasty read last time, sorry. Re-reading, the
>> > field says that when it's set, the SGMII code word being transmitted is
>> > "selected by the register" SGMII ANAR. And in the SGMII ANLPAR, you can
>> > see what the MAC said.
>>
>> ... possibly because of this.
>>
>> That said, ANLPAR is 0x4001 (all reserved bits) when we use in-band:
>>
>> [ 8.191146] RTL8211F Gigabit Ethernet 0x0000000001afc000:04: INER=0000 INSR=0000 ANARSEL=0000 ANAR=0050 ANLPAR=4001
>>
>> but all zeros without:
>>
>> [ 11.263245] RTL8211F Gigabit Ethernet 0x0000000001afc000:04: INER=0000 INSR=0000 ANARSEL=0000 ANAR=0050 ANLPAR=0000
>
> So enabling in-band autoneg in the Lynx PCS does what you'd expect it to do.
> I don't know why you don't get a "SERDES auto-negotiation error" bit in
> the interrupt status register. Maybe you need to first enable it in the
> interrupt enable register?! Who knows. Not sure how far it's worth
> diving into this.
>
>> It's all 1s when using RGMII. These bits are reserved, so it's not that
>> interesting, but maybe these registers are not as useless as they seem.
>
> Yeah, with RGMII I don't know if the PHY responds to the SERDES registers
> over MDIO. All ones may mean the MDIO bus pull-ups.
>
>> > Of course, it doesn't say what happens when the bit for software-driven
>> > SGMII autoneg is *not* set, if the process can be at all bypassed.
>> > I suppose now that it can't, otherwise the ANLPAR register could also be
>> > writable over MDIO, they would have likely reused at least partly the
>> > same mechanisms.
>> >
>> >> > + ret = phy_read_paged(phydev, 0xd08, RTL8211FS_SGMII_ANARSEL);
>> >>
>> >> That said, you have to use the "Indirect access method" to access this
>> >> register (per section 8.5). This is something like
>> >>
>> >> #define RTL8211F_IAAR 0x1b
>> >> #define RTL8211F_IADR 0x1c
>> >>
>> >> #define RTL8211F_IAAR_PAGE GENMASK(15, 4)
>> >> #define RTL8211F_IAAR_REG GENMASK(3, 1)
>> >> #define INDIRECT_ADDRESS(page, reg) \
>> >> (FIELD_PREP(RTL8211F_IAAR_PAGE, page) | \
>> >> FIELD_PREP(RTL8211F_IAAR_REG, reg - 16))
>> >>
>> >> ret = phy_write_paged(phydev, 0xa43, RTL8211F_IAAR,
>> >> INDIRECT_ADDRESS(0xd08, RTL8211FS_SGMII_ANARSEL));
>> >> if (ret < 0)
>> >> return ret;
>> >>
>> >> ret = phy_read_paged(phydev, 0xa43, RTL8211F_IADR);
>> >> if (ret < 0)
>> >> return ret;
>> >>
>> >> I dumped the rest of the serdes registers using this method, but I
>> >> didn't see anything interesting (all defaults).
>> >
>> > I'm _really_ not sure where you got the "Indirect access method" via
>> > registers 0x1b/0x1c from.
>>
>> Huh. Looks like this is a second case of differing datasheets. Mine is
>> revision 1.8 dated 2021-04-21. The documentation for indirect access was
>> added in revision 1.7 dated 2020-07-08. Although it seems like the
>> SERDES registers were also added in this revision, so maybe you just
>> missed this section?
>
> I have Rev. 1.2. dated July 2014. Either that, or I'm holding the book
> upside down...
Looks like 1.7 just added SERDES ANSCR and SERDES SSR. So I suppose the
other registers were documented earlier, without the not about indirect
access.
>> > My datasheet for RTL8211FS doesn't show
>> > offsets 0x1b and 0x1c in page 0xa43.
>>
>> Neither does mine. These registers are only documented by reference from
>> section 8.5. They also aren't named, so the above defines are my own
>> coinage.
>>
>> > Additionally, I cross-checked with
>> > other registers that are accessed by the driver (like the Interrupt
>> > Enable Register), and the driver access procedure -
>> > phy_write_paged(phydev, 0xa42, RTL821x_INER, val) - seems to be pretty
>> > much in line with what my datasheet shows.
>>
>> | The SERDES related registers should be read and written through indirect
>> | access method. The registers include Page 0xdc0 to Page 0xdcf and Page
>> | 0xde0 to Page 0xdf0.
>>
>> Each register accessed this way also has
>>
>> | Note: This register requires indirect access.
>>
>> below the register table.
>
> Ok, possible. And none of the registers accessed by Linux using
> phy_read_paged() / phy_write_paged() have the "indirect access" note?
Correct.
> Maybe it was a documentation update as you say, which I don't have.
Looks like it. Probably to work around some bug.
>> >> I think it would be better to just return PHY_AN_INBAND_ON when using
>> >> SGMII.
>> >
>> > Well, of course hardcoding PHY_AN_INBAND_ON in the driver is on the
>> > table, if it isn't possible to alter this setting to the best of our
>> > knowledge (or if it's implausible that someone modified it). And this
>> > seems more and more like the case.
>>
>> I meant something like
>>
>> if (interface == PHY_INTERFACE_MODE_SGMII)
>> return PHY_AN_INBAND_ON;
>>
>> return PHY_AN_INBAND_UNKNOWN;
>
> Absolutely, I understood the first time. So you confirm that such a
> change makes your Lynx PCS promote to MLO_AN_INBAND, which makes the
> RTL8211FS work, right?
Correct.
>> Although for RGMII, in-band status is (per MIICR2):
>>
>> - Enabled by default
>> - Disablable
>> - Optional
>>
>> So maybe we should do (PHY_AN_INBAND_ON | PHY_AN_INBAND_OFF) in that
>> case. That said, RGMII in-band is not supported by phylink (yet).
>
> Well, it kinda is. I even said this in one of the commit messages
>
> | net: phy: at803x: validate in-band autoneg for AT8031/AT8033
> |
> | These PHYs also support RGMII, and for that mode, we report that in-band
> | AN is unknown, which means that phylink will not change the mode from
> | the device tree. Since commit d73ffc08824d ("net: phylink: allow
> | RGMII/RTBI in-band status"), RGMII in-band status is a thing, and I
> | don't want to meddle with that unless I have a reason for it.
>
> Although I'd be much more comfortable for now if we could concentrate on
> SERDES protocols. I'm not exactly sure what are the hardware state machines
> and responsible standards for RGMII in-band status, what will happen on
> settings mismatch (I know that NXP MACs fail to link up if we enable the
> feature but we attach a switch and not a PHY to RGMII - see c76a97218dcb
> ("net: enetc: force the RGMII speed and duplex instead of operating in
> inband mode") - but not much more). Essentially I don't know enough
> right now to even attempt to make any generalizations. Although I suppose
> a discussion could be started about it.
I think deferring this is the right thing to do. It's not clear to me if
in-band RGMII is even necessary (as all hardware I've seen includes MDIO).
--Sean
Powered by blists - more mailing lists