Date:   Thu, 10 Feb 2022 07:52:49 -0800
From:   Tim Harvey <tharvey@...eworks.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Martin Schiller <ms@....tdt.de>, Hauke Mehrtens <hauke@...ke-m.de>,
        martin.blumenstingl@...glemail.com,
        Florian Fainelli <f.fainelli@...il.com>, hkallweit1@...il.com,
        Russell King - ARM Linux <linux@...linux.org.uk>,
        David Miller <davem@...emloft.net>, kuba@...nel.org,
        netdev <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3] net: phy: intel-xway: enable integrated led functions

On Wed, Feb 9, 2022 at 4:04 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> > The errata can be summarized as:
> > - On 1 out of 100 boots or cable plug events, the RGMII GbE link will
> > go down and up 3 to 4 times, then fall back to a 100M link; the
> > workaround has been found to require a pin-level reset
>
> So that sounds like it is downshifting because it thinks there is a
> broken pair. Can you disable downshift? Problem is, that might just
> result in link down.

It's a bad situation. The actual erratum is that the device latches into
a bad state where there is some noise on an ADC, or something like that,
that causes a high packet error rate. The firmware baked into the PHY
has a detection mechanism that watches these errors (SSD errors). If
there are enough of them, it takes the link down and up again, and if
that doesn't resolve it after 3 attempts, it downshifts to 100Mbps. They
call this 'ADS' or 'auto-down-speed'. You can disable it, but that would
just leave your bad GbE link up. It's not yet clear whether it's better
to just detect the ADS event and reset, or to disable ADS and look for
the SSD errors myself (which I can do).
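The two options above (react to the PHY's own ADS downshift, or disable ADS and watch the SSD error count directly) can be modeled as a small decision routine. This is a minimal Python sketch, not kernel code: it assumes the driver can read a per-poll SSD error count and the negotiated speed, and the names `SsdErrorMonitor` and `ads_downshift_detected` as well as the default thresholds are hypothetical.

```python
class SsdErrorMonitor:
    """Hypothetical SSD error watchdog, for use with ADS disabled.

    Reports a latched-bad link when the per-poll SSD error count stays
    above `threshold` for `trip_polls` consecutive polls.
    """

    def __init__(self, threshold: int = 10, trip_polls: int = 3) -> None:
        self.threshold = threshold
        self.trip_polls = trip_polls
        self._bad_polls = 0

    def poll(self, ssd_errors: int) -> bool:
        """Feed one poll's error count; return True if a reset is needed."""
        if ssd_errors > self.threshold:
            self._bad_polls += 1
        else:
            self._bad_polls = 0  # a clean interval clears the streak
        if self._bad_polls >= self.trip_polls:
            self._bad_polls = 0  # start counting again after requesting a reset
            return True
        return False


def ads_downshift_detected(link_up: bool, gbe_advertised: bool, speed_mbps: int) -> bool:
    """Hypothetical check, for use with ADS enabled: the link is up at
    100Mbps even though 1000Mbps was advertised by both ends."""
    return link_up and gbe_advertised and speed_mbps == 100
```

With the defaults, feeding the counts `[0, 50, 60, 70, 0]` requests a reset only on the fourth poll. Requiring several consecutive bad intervals is one way to keep a single noisy burst from triggering the shared pin reset, which on this board would also disturb the other PHYs on that line.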

>
> > - On 1 out of 100 boots or cable plug events (varies per board), SGMII
> > will fail to link between the MAC and PHY; the workaround has been
> > found to require a pin-level reset
>
> I don't suppose there is a register to restart SGMII sync?  Sometimes
> there is.

Not that I can see, but I haven't investigated mitigating that issue
much yet. The erratum for that issue says you need to assert reset, but
it also says it can occur on a cable plug event, which makes me think an
MDI ANEG restart may be sufficient.
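The escalation hinted at above (try the cheaper, per-PHY MDI autonegotiation restart before falling back to the shared pin reset) could be expressed as the following minimal Python sketch. `Action`, `next_sgmii_action`, and the restart limit are hypothetical; the sketch assumes the driver can tell whether SGMII sync between MAC and PHY was achieved, and it is unverified whether an ANEG restart actually clears this erratum.

```python
from enum import Enum, auto


class Action(Enum):
    NONE = auto()          # SGMII is in sync; nothing to do
    RESTART_ANEG = auto()  # per-PHY and cheap; other PHYs are unaffected
    PIN_RESET = auto()     # shared reset line; every PHY on it must be set up again


def next_sgmii_action(sgmii_synced: bool, aneg_restarts: int,
                      max_aneg_restarts: int = 2) -> Action:
    """Pick the least disruptive recovery step for an SGMII link failure."""
    if sgmii_synced:
        return Action.NONE
    if aneg_restarts < max_aneg_restarts:
        return Action.RESTART_ANEG
    return Action.PIN_RESET
```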

>
> Anyway, shared reset makes this messy, as you said. Unfortunate
> design. But I don't see how you can work around this in the
> bootloader, especially the cable plug events.
>

Ya, in hindsight the shared reset was a really bad idea. Of course, the
last PHY we used on this particular board for years, before the supply
chain crashed, didn't have any issues like this.

I agree that I can't do anything in boot firmware. I was planning on
having some static code that registers a PHY fixup so that I get a
callback when these PHYs are detected; I could then kick off a polling
thread to watch for errors and trigger a reset. The reset could know
which PHY devices called the fixup handler, so that I can at least set
up each PHY again.
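A minimal Python model of that plan: the fixup callback records each detected PHY together with a setup routine, and a polling pass triggers the shared reset and then re-runs every recorded setup. In a real kernel driver the registration would presumably go through `phy_register_fixup_for_uid()` and the polling would run from a kernel thread or delayed work; `SharedResetDomain`, its methods, and the error threshold here are hypothetical.

```python
from typing import Callable, Dict


class SharedResetDomain:
    """Hypothetical model of several PHYs wired to one reset pin."""

    def __init__(self, assert_reset: Callable[[], None]) -> None:
        self._assert_reset = assert_reset
        self._phys: Dict[int, Callable[[], None]] = {}  # MDIO address -> setup routine

    def fixup(self, addr: int, setup: Callable[[], None]) -> None:
        """Called once per detected PHY; remember how to set it up again."""
        self._phys[addr] = setup

    def reset_all(self) -> None:
        """Toggle the shared pin, then restore every PHY it affected."""
        self._assert_reset()
        for addr in sorted(self._phys):
            self._phys[addr]()

    def poll_once(self, read_errors: Callable[[int], int], threshold: int) -> bool:
        """One polling pass: reset the whole domain if any PHY looks bad."""
        if any(read_errors(addr) > threshold for addr in self._phys):
            self.reset_all()
            return True
        return False
```

Because the pin is shared, one misbehaving PHY resets all of them, so the model keeps a setup routine per PHY rather than per error event.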

Regardless of how I go about this, the end result may be unreliable
networking for up to a couple of minutes after board power-up or a
cable plug event.

Best Regards,

Tim
