netdev - Re: [PATCH net v3] net: phy: intel-xway: enable integrated led functions

Message-ID: <Yga2YbglzJ6CvMFo@lunn.ch>
Date:   Fri, 11 Feb 2022 20:17:53 +0100
From:   Andrew Lunn <andrew@...n.ch>
To:     Tim Harvey <tharvey@...eworks.com>
Cc:     Martin Schiller <ms@....tdt.de>, Hauke Mehrtens <hauke@...ke-m.de>,
        martin.blumenstingl@...glemail.com,
        Florian Fainelli <f.fainelli@...il.com>, hkallweit1@...il.com,
        Russell King - ARM Linux <linux@...linux.org.uk>,
        David Miller <davem@...emloft.net>, kuba@...nel.org,
        netdev <netdev@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net v3] net: phy: intel-xway: enable integrated led
 functions

On Thu, Feb 10, 2022 at 07:52:49AM -0800, Tim Harvey wrote:
> On Wed, Feb 9, 2022 at 4:04 PM Andrew Lunn <andrew@...n.ch> wrote:
> >
> > > The errata can be summarized as:
> > > - 1 out of 100 boots or cable plug events RGMII GbE link will end up
> > > going down and up 3 to 4 times then resort to a 100m link; workaround
> > > has been found to require a pin level reset
> >
> > So that sounds like it is downshifting because it thinks there is a
> > broken pair. Can you disable downshift? Problem is, that might just
> > result in link down.
> 
> It's a bad situation. The actual errata is that the device latches into
> a bad state where there is some noise on an ADC or something like that
> that causes a high packet error rate. The firmware baked into the PHY
> has a detection mechanism looking at these errors (SSD errors) and if
> there are enough of them it takes the link down and up again and if
> that doesn't resolve it after 3 tries it shifts down to 100 Mb/s. They call
> this 'ADS' or 'auto-down-speed' and you can disable it but it would
> just result in leaving your bad GbE link up. It's unclear yet if it's
> better to just detect the ADS event and reset or to disable ADS and
> look for the SSD errors myself (which I can do).

I don't think it matters too much which way you detect there is a
problem. But ideally you need a recovery which does not need a
hardware reset. Then you don't need to worry about the other PHY
sharing the reset line. But you know that...

> I agree that I can't do anything in boot firmware. I was planning on
> having some static code that registered a PHY fixup to get a call when
> these PHYs were detected and I could then kick off a polling thread to
> watch for errors and trigger a reset. The reset could have knowledge
> of the PHY devices that called the fixup handler so that I can at
> least setup each PHY again.

That sounds like a reasonable architecture. Your thread would need to
do:

phy_stop()
phy_init_hw()
phy_start()

and phylib probably will do the reset.
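
As a rough illustration of that ordering, here is a minimal userspace
sketch. It is not kernel code: struct phy_device and the three phy_*()
functions are simplified stand-ins for the phylib ones so the sequence
can be compiled and exercised on its own, and recover_phy(), fail_init
and the trace buffer are hypothetical names made up for the example.

```c
/* Userspace model only: struct phy_device and the three phy_*() functions
 * below are simplified stand-ins for the phylib ones, so that the call
 * order can be exercised outside the kernel. */
#include <assert.h>
#include <string.h>

struct phy_device {
	int running;     /* 1 while the modelled state machine is running */
	int fail_init;   /* test knob: make phy_init_hw() fail */
	char trace[64];  /* records the order of calls */
};

/* Append a tag to the trace without overflowing the buffer. */
static void trace_add(struct phy_device *phydev, const char *s)
{
	size_t room = sizeof(phydev->trace) - strlen(phydev->trace) - 1;

	strncat(phydev->trace, s, room);
}

static void phy_stop(struct phy_device *phydev)
{
	phydev->running = 0;
	trace_add(phydev, "stop,");
}

static int phy_init_hw(struct phy_device *phydev)
{
	trace_add(phydev, "init,");
	return phydev->fail_init ? -1 : 0;
}

static void phy_start(struct phy_device *phydev)
{
	phydev->running = 1;
	trace_add(phydev, "start,");
}

/* Hypothetical helper: what the polling thread would run per PHY. */
static int recover_phy(struct phy_device *phydev)
{
	int err;

	assert(phydev);

	phy_stop(phydev);            /* halt the state machine first */
	err = phy_init_hw(phydev);   /* re-initialise the PHY */
	if (err)
		return err;          /* leave it stopped on failure */
	phy_start(phydev);           /* restart the state machine */
	return 0;
}
```

In a real driver the same three calls would be made against the kernel's
struct phy_device from process context (for example a work item), since
phy_stop() can sleep.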

Maybe you can put the problem detection code in the .read_status
callback, which sets an 'im_fubar' flag in the driver's private
structure. That gives some building blocks for other users of this PHY
who don't have a shared reset line, and can maybe solve the problem
within the driver.
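
A minimal sketch of that flag idea, again as a userspace model rather
than driver code. Everything here is an assumption made for
illustration: struct xway_priv, the ssd_errors field (standing in for
an error counter that a real driver would read over MDIO),
SSD_ERR_THRESHOLD and xway_needs_recovery() are hypothetical, and
locking between the callback and the polling thread is left out.

```c
/* Userspace model only: struct phy_device is a simplified stand-in and
 * all xway_* names and the threshold are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define SSD_ERR_THRESHOLD 16	/* hypothetical error-count limit */

struct xway_priv {
	bool im_fubar;		/* set when the PHY looks stuck */
};

struct phy_device {
	void *priv;		/* driver private data, as in phylib */
	unsigned int ssd_errors; /* model: stands in for a register read */
};

/* Model of a .read_status callback: only detects and flags the problem.
 * A real callback would also update link state, e.g. via
 * genphy_read_status(). */
static int xway_read_status(struct phy_device *phydev)
{
	struct xway_priv *priv = phydev->priv;

	assert(priv);

	if (phydev->ssd_errors >= SSD_ERR_THRESHOLD)
		priv->im_fubar = true;

	return 0;
}

/* Polled by the recovery thread: test and clear the flag. */
static bool xway_needs_recovery(struct phy_device *phydev)
{
	struct xway_priv *priv = phydev->priv;
	bool fubar = priv->im_fubar;

	priv->im_fubar = false;
	return fubar;
}
```

Keeping detection in the callback and the actual recovery elsewhere
means the flag can be consumed either by a board-specific thread that
owns a shared reset line or by the driver itself.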

       Andrew