Message-ID: <20251008145549.6zhlsvphgx62zwgp@skbuf>
Date: Wed, 8 Oct 2025 17:55:49 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Alexander Wilhelm <alexander.wilhelm@...termo.com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>,
Andrew Lunn <andrew@...n.ch>,
Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Aquantia PHY in OCSGMII mode?
On Wed, Oct 08, 2025 at 03:28:07PM +0200, Alexander Wilhelm wrote:
> I have the broken 100M link state again (IF_MODE=3). Below are the debug
> details I was able to observe:
>
> * With 2.5G link:
>
> mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xdc01, IF_MODE 0x3
>
> * With 1G link:
>
> mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xd801, IF_MODE 0x3
>
> * With 100M link:
>
> mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xd401, IF_MODE 0x3
Ok, this is why I didn't trust the print from lynx_pcs_config(). BMSR was 0x29
in your previous log (no link) and is 0x2d now (link up). Also, the LPA for
100M is different from the earlier log, and this is the one I trust.
We have:

  2.5G link: LPA_SGMII_SPD_MASK bits = 0b11 => undefined behaviour, reserved value
  1G link:   LPA_SGMII_SPD_MASK bits = 0b10 => 1G, the only proper value (by coincidence, of course)
  100M link: LPA_SGMII_SPD_MASK bits = 0b01 => 100M, PHY practically requests 10x symbol replication,
             and the Lynx PCS obliges
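
For reference, the decoding above can be reproduced with a small standalone
sketch. The LPA_SGMII_* values are written out as I recall them from
include/uapi/linux/mii.h (please check against your tree), and
sgmii_lpa_speed() is a hypothetical helper made up for illustration, not an
existing kernel function.

```c
#include <stdint.h>

/* Assumed to match include/uapi/linux/mii.h; verify against your tree. */
#define LPA_SGMII_SPD_MASK	0x0c00	/* speed field, bits 11:10 */
#define LPA_SGMII_10		0x0000	/* 0b00 */
#define LPA_SGMII_100		0x0400	/* 0b01 */
#define LPA_SGMII_1000		0x0800	/* 0b10 */

/*
 * Hypothetical helper: interpret a received base page as an SGMII code word
 * and return the requested speed in Mbps, or -1 for the reserved 0b11 value.
 */
static int sgmii_lpa_speed(uint16_t lpa)
{
	switch (lpa & LPA_SGMII_SPD_MASK) {
	case LPA_SGMII_10:
		return 10;
	case LPA_SGMII_100:
		return 100;
	case LPA_SGMII_1000:
		return 1000;
	default:
		return -1;	/* 0b11 is reserved */
	}
}
```

Feeding it the three LPA values from your log (0xdc01, 0xd801, 0xd401) gives
reserved, 1000 and 100 respectively, which is the table above.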
So the AQR115 PHY uses the SGMII base page format, and with the IF_MODE=0 fix,
the Lynx PCS uses the Clause 37 base page format.
We know that in-band autoneg is enabled in the AQR115 PHY and we don't
know how to disable it, and we know that for traffic to pass, one of two
things must happen:
1. In-band autoneg must complete (as required by LINK_INBAND_ENABLE).
   This happens when we have managed = "in-band-status" in the device tree.
   - From the AQR115 perspective, SGMII AN completes if it receives a base page
     with the ACK bit set. Since SGMII and Clause 37 are compatible in this
     regard (the ACK bit is in the same position, bit 14), the Lynx PCS
     fulfills what the AQR115 expects.
   - From the Lynx PCS perspective, clause 37 AN also completes if it
     receives a base page with the ACK bit set, which again it does. But
     the SGMII code word overlaps in strange ways (Next Page and Remote
     Fault 1 end up being set, neither Half Duplex nor Full Duplex bits
     are set), so the Lynx PCS may behave in unpredictable ways.
2. In-band autoneg fails, but the AQR115 PHY falls back to full data
   rate anyway (as permitted by LINK_INBAND_BYPASS). This happens when
   we do _not_ have managed = "in-band-status" in the device tree.
   The Lynx PCS does not respond with code words having the ACK bit set
   and does not generate clause 37 code words of its own; instead, it goes
   to data mode directly. AQR115 eventually goes to data mode too.
I expect that your setup works through #2 right now.
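
To make the overlap from point 1 concrete, here is a minimal sketch that reads
the same 16-bit word the way a Clause 37 receiver would. The bit positions
follow the Clause 37 base page layout (FD bit 5, HD bit 6, RF1 bit 12, RF2
bit 13, ACK bit 14, NP bit 15). The C37_* names, struct c37_view and
c37_decode() are made up for this example and are not kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Clause 37 base page bits; names are local to this sketch. */
#define C37_FD	(1u << 5)	/* Full Duplex */
#define C37_HD	(1u << 6)	/* Half Duplex */
#define C37_RF1	(1u << 12)	/* Remote Fault 1 */
#define C37_RF2	(1u << 13)	/* Remote Fault 2 */
#define C37_ACK	(1u << 14)	/* Acknowledge, same position as in SGMII */
#define C37_NP	(1u << 15)	/* Next Page */

struct c37_view {
	bool full_duplex;
	bool half_duplex;
	bool rf1;
	bool rf2;
	bool ack;
	bool next_page;
};

/* Hypothetical helper: view a received code word as a Clause 37 base page. */
static struct c37_view c37_decode(uint16_t word)
{
	struct c37_view v = {
		.full_duplex = word & C37_FD,
		.half_duplex = word & C37_HD,
		.rf1 = word & C37_RF1,
		.rf2 = word & C37_RF2,
		.ack = word & C37_ACK,
		.next_page = word & C37_NP,
	};

	return v;
}
```

Decoding the 100M word from the log, 0xd401, this way yields ACK, Next Page
and Remote Fault 1 set, with neither duplex bit set.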
The symbol replication aspect is now clarified. There is a new question mark,
though: the 0b11 speed bits also empirically pass traffic despite being a
reserved value. To gain a bit more control over things and make them more
robust, we need to see how the PHY driver can implement aqr_gen2_inband_caps()
and aqr_gen2_config_inband() for PHY_INTERFACE_MODE_2500BASEX, and fix up the
base pages the PHY is sending (the current format is broken per all known
standards).
Thanks a lot for testing.