lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 24 Nov 2021 14:04:41 +0200
From:   Vladimir Oltean <olteanv@...il.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Marek BehĂșn <kabel@...nel.org>,
        netdev@...r.kernel.org, Andrew Lunn <andrew@...n.ch>,
        Rob Herring <robh+dt@...nel.org>, devicetree@...r.kernel.org,
        Jakub Kicinski <kuba@...nel.org>,
        Sean Anderson <sean.anderson@...o.com>, davem@...emloft.net
Subject: Re: [PATCH net-next v2 4/8] net: phylink: update
 supported_interfaces with modes from fwnode

On Wed, Nov 24, 2021 at 11:04:37AM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 24, 2021 at 12:54:18AM +0200, Vladimir Oltean wrote:
> > This implies that when you bring up a board and write the device tree
> > for it, you know that PHY mode X works without ever testing it. What if
> > it doesn't work when you finally add support for it? Now you already
> > have one DT blob in circulation. That's why I'm saying that maybe it
> > could be better if we could think in terms that are a bit more physical
> > and easy to characterize.
> 
> However, it doesn't solve the problem. Let's take an example.
> 
> The 3310 supports a mode where it runs in XAUI/5GBASE-R/2500BASE-X/SGMII
> depending on the negotiated media parameters.
> 
> XAUI is four lanes of 3.125Gbaud.
> 5GBASE-R is one lane of 5.15625Gbaud.
> 
> Let's say you're using this, and test the 10G speed using XAUI,
> intending the other speeds to work. So you put in DT that you support
> four lanes and up to 5.15625Gbaud.

Yes, see, the blame's on you if you do that. You effectively declared
that the lane is able of sustaining a data rate higher than you've
actually had proof it does (5.156 vs 3.125).

The reason why I'm making this suggestion is because I think it lends
itself better to the way in which hardware manufacturers work.
A hobbyist like me has no choice than to test the highest data rate when
determining what frequency to declare in the DT (it's the same thing for
spi-max-frequency and other limilar DT properties, really), but hardware
people have simulations based on IBIS-AMI models, they can do SERDES
self-tests using PRBS patterns, lots of stuff to characterize what
frequency a lane is rated for, without actually speaking any Ethernet
protocol on it. In fact there are lots of people who can do this stuff
(which I know mostly nothing about) with precision without even knowing
how to even type a simple "ls" inside a Linux shell.

> Later, you discover that 5GBASE-R doesn't work because there's an
> electrical issue with the board. You now have DT in circulation
> which doesn't match the capabilities of the hardware.
> 
> How is this any different from the situation you describe above?
> To me, it seems to be exactly the same problem.

To err is human, of course. But one thing I think we learned from the
old implementation of phylink_validate is that it gets very tiring to
keep adding PHY modes, and we always seem to miss some. When that array
will be described in DT, it could be just a tad more painful to maintain.

> So, I don't think one can get away from these kinds of issues - where
> you create a board, do insufficient testing, publish a DT description,
> and then through further testing discover you have to restrict the
> hardware capabilities in DT. In fact, this is sadly an entirely normal
> process - problems always get found after boards have been sent out
> and initial DT has been created.
> 
> A good example is the 6th switch port on the original Clearfog boards.
> This was connected to an external PHY at address 0 on the MDIO bus
> behind the switch. However, the 88E6176 switch already has an internal
> PHY at address 0, so the external PHY can't be accessed. Consequently,
> port 6 is unreliable. That only came to light some time later, and
> resulted in the DT needing to be modified.

Sure, but this is a bit unrelated.

> There are always problems that need DT to be fixed - I don't think it's
> possible to get away from that by changing what or how something is
> described. In fact, I think trying to make that argument actually shows
> a misunderstanding of the process of hardware bringup.

Oh, how so?

> Just like software, hardware is buggy and takes time to be debugged,
> and that process continues after the first version of DT for a board
> has been produced, and there will always be changes required to DT.
> 
> I'm not saying that describing the maximum frequency and lanes doesn't
> have merit, I'm merely taking issue with the basis of your argument.
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ