Date: Thu, 10 Feb 2022 07:52:49 -0800
From: Tim Harvey <tharvey@...eworks.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Martin Schiller <ms@....tdt.de>, Hauke Mehrtens <hauke@...ke-m.de>,
    martin.blumenstingl@...glemail.com, Florian Fainelli <f.fainelli@...il.com>,
    hkallweit1@...il.com, Russell King - ARM Linux <linux@...linux.org.uk>,
    David Miller <davem@...emloft.net>, kuba@...nel.org,
    netdev <netdev@...r.kernel.org>, open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3] net: phy: intel-xway: enable integrated led functions

On Wed, Feb 9, 2022 at 4:04 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> > The errata can be summarized as:
> > - 1 out of 100 boots or cable plug events RGMII GbE link will end up
> > going down and up 3 to 4 times then resort to a 100m link; workaround
> > has been found to require a pin level reset
>
> So that sounds like it is downshifting because it thinks there is a
> broken pair. Can you disable downshift? Problem is, that might just
> result in link down. Its a bad situation.

The actual erratum is that the device latches into a bad state where
there is some noise on an ADC, or something like that, which causes a
high packet error rate. The firmware baked into the PHY has a detection
mechanism that looks at these errors (SSD errors); if there are enough
of them it takes the link down and up again, and if that doesn't resolve
the problem within 3 tries it shifts down to 100 Mb/s. They call this
'ADS' or 'auto-down-speed'. You can disable it, but that would just
leave your bad GbE link up.

It's unclear yet whether it's better to just detect the ADS event and
reset, or to disable ADS and look for the SSD errors myself (which I can
do).

>
> > - 1 out of 100 boots or cable plug events (varies per board) SGMII
> > will fail link between the MAC and PHY; workaround has been found to
> > require a pin level reset
>
> I don't suppose there is a register to restart SGMII sync? Sometimes
> there is.
Not that I see, but I haven't really investigated mitigating that issue
yet. The errata for that issue says you need to assert reset, but it
also says it can occur on a cable plug event, which makes me think an
MDI ANEG restart may be sufficient.

>
> Anyway, shared reset makes this messy, as you said. Unfortunate
> design. But i don't see how you can work around this in the
> bootloader, especially the cable plug events.
>

Ya, in hindsight the shared reset was a really bad idea. Of course, the
last PHY we used on this particular board for years before the supply
chain crashed didn't have any issues like this.

I agree that I can't do anything in boot firmware. I was planning on
having some static code that registers a PHY fixup to get a call when
these PHYs are detected; I could then kick off a polling thread to watch
for errors and trigger a reset. The reset could have knowledge of the
PHY devices that called the fixup handler so that I can at least set up
each PHY again.

Regardless of how I go about this, the end result may be unreliable
networking for up to a couple of minutes after a board power-up or cable
plug event.

Best Regards,

Tim
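
As an illustration only, the "detect the ADS event and reset" option
could be sketched as a small state machine fed by a poller. This is
plain userspace C, not kernel or vendor code. The struct names, the
thresholds, and the idea of sampling link state at a fixed interval are
all assumptions made for the sketch; the only behaviour taken from the
description above is "link drops about 3 times, then comes back at
100 Mb/s".

```c
#include <stdbool.h>

/* Assumed tuning values, not taken from any datasheet or erratum. */
#define ADS_FLAP_THRESHOLD 3 /* link drops before a 100M link looks suspicious */
#define ADS_STABLE_SAMPLES 5 /* consecutive good 1000M samples that clear history */

/* Hypothetical snapshot of what a poller would read back from the PHY. */
struct link_sample {
    bool up;
    int speed_mbps; /* only meaningful when up is true */
};

/* Hypothetical per-PHY detector state. */
struct ads_detector {
    bool prev_up; /* link state seen in the previous sample */
    int flaps;    /* up -> down transitions not yet cleared */
    int stable;   /* consecutive samples with link up at 1000M */
};

void ads_detector_init(struct ads_detector *d)
{
    d->prev_up = false;
    d->flaps = 0;
    d->stable = 0;
}

/*
 * Feed one sample. Returns true when the history looks like an ADS
 * event: at least ADS_FLAP_THRESHOLD link drops followed by a link that
 * comes up at 100 Mb/s. The caller would then trigger the pin reset.
 */
bool ads_detector_feed(struct ads_detector *d, struct link_sample s)
{
    bool suspect = false;

    /* Count each up -> down transition as one flap. */
    if (d->prev_up && !s.up)
        d->flaps++;

    /* A gigabit link that stays up long enough clears the flap history. */
    if (s.up && s.speed_mbps == 1000) {
        if (++d->stable >= ADS_STABLE_SAMPLES)
            d->flaps = 0;
    } else {
        d->stable = 0;
    }

    /* 100M right after repeated flaps is treated as a downshift. */
    if (s.up && s.speed_mbps == 100 && d->flaps >= ADS_FLAP_THRESHOLD) {
        suspect = true;
        d->flaps = 0; /* start fresh after reporting */
    }

    d->prev_up = s.up;
    return suspect;
}
```

A link partner that genuinely negotiates 100 Mb/s never accumulates
flaps, so it is not reported. With a shared reset line, a positive
result on any one PHY would still mean re-initialising every PHY behind
that line.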