Message-ID: <aOdQrJMtafhOh3GQ@FUE-ALEWI-WINX>
Date: Thu, 9 Oct 2025 08:05:32 +0200
From: Alexander Wilhelm <alexander.wilhelm@...termo.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: "Russell King (Oracle)" <linux@...linux.org.uk>,
        Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Aquantia PHY in OCSGMII mode?

On Wed, Oct 08, 2025 at 05:55:49PM +0300, Vladimir Oltean wrote:
> On Wed, Oct 08, 2025 at 03:28:07PM +0200, Alexander Wilhelm wrote:
> > I have the broken 100M link state again (IF_MODE=3). Below are the debug
> > details I was able to observe:
> > 
> > * With 2.5G link:
> > 
> >     mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xdc01, IF_MODE 0x3
> > 
> > * With 1G link:
> > 
> >     mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xd801, IF_MODE 0x3
> > 
> > * With 100M link:
> > 
> >     mdio_bus 0x0000000ffe4e5000:00: BMSR 0x2d, BMCR 0x1140, ADV 0x41a0, LPA 0xd401, IF_MODE 0x3
> 
> Ok, this is why I didn't trust the print from lynx_pcs_config(). BMSR was 0x29
> in your previous log (no link) and is 0x2d now. Also, the LPA for 100M is
> different (I trust this one).
> 
> We have:
> 2.5G link: LPA_SGMII_SPD_MASK bits = 0b11 => undefined behaviour, reserved value
> 1G link: LPA_SGMII_SPD_MASK bits = 0b10 => 1G, the only proper value (by coincidence, of course)
> 100M link: LPA_SGMII_SPD_MASK bits = 0b01 => 100M, PHY practically requests 10x symbol replication, and the Lynx PCS obliges
> 
> So the AQR115 PHY uses the SGMII base page format, and with the IF_MODE=0 fix,
> the Lynx PCS uses the Clause 37 base page format.
> 
> We know that in-band autoneg is enabled in the AQR115 PHY and we don't
> know how to disable it, and we know that for traffic to pass, one of two
> things must happen:
> 
> 1. In-band autoneg must complete (as required by LINK_INBAND_ENABLE).
>    This happens when we have managed = "in-band-status" in the device tree.
>    - From the AQR115 perspective, SGMII AN completes if it receives a base page
>      with the ACK bit set. Since SGMII and Clause 37 are compatible in this
>      regard (the ACK bit is in the same position, bit 14), the Lynx PCS
>      fulfills what the AQR115 expects.
>    - From the Lynx PCS perspective, clause 37 AN also completes if it
>      receives a base page with the ACK bit set. Which again it does, but
>      the SGMII code word overlaps in strange ways (Next Page and Remote
>      Fault 1 end up being set, neither Half Duplex nor Full Duplex bits
>      are set), so the Lynx PCS may behave in unpredictable ways.
> 2. In-band autoneg fails, but the AQR115 PHY falls back to full data
>    rate anyway (as permitted by LINK_INBAND_BYPASS). This happens when
>    we do _not_ have managed = "in-band-status" in the device tree.
>    The Lynx PCS does not respond with code words having the ACK bit set,
>    and does not generate clause 37 code words of its own, instead goes
>    to data mode directly. AQR115 eventually goes to data mode too.
> 
> I expect that your setup works through #2 right now.
> 
> The symbol replication aspect is now clarified, there is a new question mark caused by
> the 0b11 speed bits also empirically passing traffic despite being a reserved value,
> and in order to gain a bit more control over things and make them more robust, we need
> to see how the PHY driver can implement aqr_gen2_inband_caps() and aqr_gen2_config_inband()
> for PHY_INTERFACE_MODE_2500BASEX, and fix up the base pages the PHY is sending
> (the current format is broken per all known standards).
> 
> Thanks a lot for testing.
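
For reference, the speed decoding quoted above can be reproduced with a short
Python sketch. It assumes the speed field sits in bits 11:10 of the LPA word
(mask 0x0c00, matching LPA_SGMII_SPD_MASK in the kernel's mii.h as far as I can
tell). The helper name sgmii_speed() and the table are made up for illustration
and are not part of any driver.

```python
# Assumed to mirror LPA_SGMII_SPD_MASK from include/uapi/linux/mii.h.
LPA_SGMII_SPD_MASK = 0x0C00

# SGMII base page speed encoding; 0b11 is the reserved value.
SGMII_SPEEDS = {0b00: "10M", 0b01: "100M", 0b10: "1G", 0b11: "reserved"}


def sgmii_speed(lpa):
    """Hypothetical helper: decode bits 11:10 of an SGMII base page."""
    return SGMII_SPEEDS[(lpa & LPA_SGMII_SPD_MASK) >> 10]


# LPA values taken from the debug output quoted in this thread.
for link, lpa in (("2.5G", 0xDC01), ("1G", 0xD801), ("100M", 0xD401)):
    print(f"{link} link: LPA {lpa:#06x} -> speed bits say {sgmii_speed(lpa)}")
```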

It was my pleasure to help. Thank you for your patch suggestions and especially
for the detailed explanations. I now have a much better understanding of how the
PHY and MAC interact.
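
To convince myself of the overlap described in point 1, I also read the same
LPA words as if they were Clause 37 base pages. This is only a sketch: the bit
positions below (Next Page 15, ACK 14, Remote Fault 2/1 at 13/12, pause at 8/7,
Half Duplex 6, Full Duplex 5) are my understanding of the Clause 37 layout, and
c37_flags() is a hypothetical helper, not kernel code.

```python
# Assumed Clause 37 base page bit positions, highest bit first.
C37_BITS = {
    15: "Next Page",
    14: "ACK",
    13: "Remote Fault 2",
    12: "Remote Fault 1",
    8: "Asym Pause",
    7: "Pause",
    6: "Half Duplex",
    5: "Full Duplex",
}


def c37_flags(word):
    """Hypothetical helper: list the Clause 37 flags set in a base page."""
    return [name for bit, name in C37_BITS.items() if word & (1 << bit)]


# The same three LPA values from the quoted debug output.
for lpa in (0xDC01, 0xD801, 0xD401):
    print(f"LPA {lpa:#06x} read as Clause 37: {', '.join(c37_flags(lpa))}")
```

Under these assumptions all three words show Next Page, ACK and Remote Fault 1,
with neither duplex bit set, which matches the description above.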


Best regards
Alexander Wilhelm
