netdev - Re: Ethtool: advance phy debug support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UcVs9SwiGjmXQcLVX7pSRwoQ2ZaWorXOc7Tm_FFi80pJA@mail.gmail.com>
Date: Tue, 18 Nov 2025 08:58:47 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Lee Trager <lee@...ger.us>, Maxime Chevallier <maxime.chevallier@...tlin.com>, 
	Susheela Doddagoudar <susheelavin@...il.com>, netdev@...r.kernel.org, mkubecek@...e.cz, 
	Hariprasad Kelam <hkelam@...vell.com>, Alexander Duyck <alexanderduyck@...com>
Subject: Re: Ethtool: advance phy debug support

On Tue, Nov 18, 2025 at 5:50 AM Andrew Lunn <andrew@...n.ch> wrote:
>
> > > As i said before, what is important is we have an architecture that
> > > allows for PRBS in different locations. You don't need to implement
> > > all those locations, just the plumbing you need for your use case. So
> > > MAC calling phylink, calling into the PCS driver. We might also need
> > > some enumeration of where the PRBSes are, and being able to select
> > > which one you want to use, e.g. you could have a PCS with PRBS, doing
> > > SGMII connecting to a Marvell PHY which also has PRBS.
> >
> > It seems to me like we would likely end up with two different setups.
> > For the SerDes PHYs they would likely end up with support for many more
> > test patterns than a standard Ethernet PHY would.
> >
> > I know I had been looking at section 45.2.1.168 - 45.2.1.174 of the
> > IEEE 802.3 spec as that would be the standard for a PMA/PMD interface,
> > or section 45.2.3.17 - 45.2.3.20 for the PCS interface, as to how to do
> > this sort of testing on Ethernet using a c45 PHY. I wonder if we
> > couldn't use those registers as a general guide for putting together
> > the interface to enable the PHY testing with the general idea being
> > that the APIs should translate to similar functionality as what is
> > exposed in the IEEE spec.
>
> It probably needs somebody to look at the different PRBS and see what
> is common and what is different. 802.3 is a good starting point.
> If you look around you can find some Marvell documents:
>
> https://www.mouser.com/pdfDocs/marvell-phys-transceivers-alaska-c-88x5113-datasheet-2018-07.pdf
> https://www.marvell.com/content/dam/marvell/en/public-collateral/phys-transceivers/marvell-phys-transceivers-alaska-m-88e21x0-datasheet.pdf
> https://www.marvell.com/content/dam/marvell/en/public-collateral/phys-transceivers/marvell-phys-transceivers-alaska-x-88x2222-datasheet.pdf
>
> And there are other vendors:
>
> https://www.ti.com/lit/ds/symlink/dp83tc811r-q1.pdf
>
> But we should also make use of the flexibility of netlink. We can
> probably get a core set of attributes, but maybe also allow each PRBS
> to make use of additional attributes?

The point I was trying to make is that Ethernet only uses a subset of
the available tests that most of these devices provide. There are only
a few tests recommended for the various media types. Most of the test
types are called out in the various sections and 45.2.1.170 lists many
of them off.

> > It isn't so much hard wired as limited based on the cable connected to
> > it. In addition the other end doesn't do any sort of autoneg.
>
> For PRBS, i doubt you want negotiation. Do you actually have a link
> partner? Or it is some test equipment? If you are tuning SERDES
> windows/eyes, you have to assume you are going to make the link worse,
> before it gets better, and so autoneg will fail. So i expect the
> general case of anybody using PRBS is going to want to use 'ethtool -s
> autoneg off' to force the system into a specific mode.

Right. With PRBS you already should know what the link partner is
configured for. It is usually a manual process on both sides. That is
why I mentioned the cable/module EEPROM. The cable will indicate the
number of lanes present and the recommended/maximum frequency and
modulation it is supposed to be used at. With that you can essentially
determine what the correct setup would be to test it as this is mostly
just a long duration cable test. The only imitation is if there is
something in between such as a PMA/PMD that is configured for 2 lanes
instead of 4.

> > As far as the testing itself, we aren't going to be linking anyway. So
> > the configured speed/duplex won't matter.
>
> I'm surprised about that. Again, general case, would a 1G 1000baseX
> allow/require different tuning to a 2500BaseX link? It is clocked at a
> different frequency? Duplex however is probably not an issue, does an
> SGMII SERDES running at 100Half even look different to a 100Full? The
> SERDES always runs at the same speed, 10/100 just require symbol
> duplication to full the stream up to 1G. And link modes > 1G don't
> have a duplex setting.

I think we have a different understanding of what "link" means. For
most modern setups we aren't sending the data over an individual lane.
In many cases the MAC has several virtual lanes it is using. It is
being multiplexed via MLD or RS-FEC encoding and passed that way. So
you cannot get "link" without passing the signal through those.

When we are doing a PRBS test we aren't using those and are just
sending the raw pattern across trying to check for noise on the wire.
The Tx side is essentially shouting into the abyss and hopefully the
other end is configured to verify the signal. Likewise the Rx side
gets configured, but it doesn't guarantee that there is a transmitter
on the other end to send to it. It is very much a manual setup
process. The PRBS testing is per individual lane and is in no-way
aggregating them together or splitting them up into virtual lanes. As
such in my point of view it isn't providing/getting a "link". To
properly set it all up you have to essentially kick off the Tx on both
sides, and then you can start collecting samples by enabling the Rx
which then has to get a signal lock on that lane, and then collect the
data. So the process, while similar to getting a "link", is very much
different from getting the link at the various BASE-R speed
capabilities that are verified by the test.

> > When we are testing either
> > the PCS or the PMA/PMD is essentially running the show and everything
> > above it is pretty much cut off. So the MAC isn't going to see a link
> > anyway. In the grand scheme of things it is basically just a matter of
> > setting up the lanes and frequency/modulation for those lanes.
>
> And the kernel API for that, at the top level is ksettings_set(). I
> agree the MAC is not sending packets etc, but it is the one
> configuring everything below it, via phylink/phylib or firmware. Is
> there really any difference between a real configuration and a PRBS
> configuration for testing a link mode?

The problem is the test config may not make sense to the MAC. As I
mentioned, PRBS testing is done per lane, not for the entire link. As
such we would be asking an individual lane or set of lanes to get
configured for a specific frequency/modulation and then to run a
specific pattern. Arguably we would probably want to keep the MAC out
of the PRBS testing setup for that reason.

> And then we need a second API to access whatever you want to tune,
> which i guess is vendor specific. As far as i remember, Lee's basic
> design did separate this into a different API after looking around at
> what different vendors provided.

Yes. Again most of these settings appear to be per-lane in both the IP
we have and the IEEE specification. For example it occurs to me that a
device could be running a 25G or 50G link over a single QSFP cable,
and still have testing enabled for the unused 2 or 3 lanes on the
cable potentially assuming the PMA/PMD is a 4 lane link and is only
using one or two lanes for the link.

> > This is one of the reasons why I was thinking of something like a
> > phydev being provided by the driver. Specifically it provides an
> > interface that can be inspected by a netdev via standard calls to
> > determine things like if the link is allowed to come up. In the case of
> > the phydev code it already had all the bits in place for PHY_CABLETEST
> > as a state.
>
> And this is why i talked about infrastructure, or core for PRBS,
> something which can deal with a netdev state transitions. A don't see
> a phydev as a good representation of a PRBS. We probably want a PRBS
> 'device' which can be embedded in a phydev, or a PCS, or a generic
> PHY, which registers itself to the PRBS core, and it is associated to
> a netdev.

One thing we may want to consider instead of having a PRBS device
might be to look at having something like a "lane" device. As it
currently stands, I think we are going to start running into issues as
we start fanning out the setups and adding more lanes or running
devices in parallel. The 50G-R2 setup was already a challenge and odds
are 800G(https://ethernettechnologyconsortium.org/wp-content/uploads/2021/10/Ethernet-Technology-Consortium_800G-Specification_r1.1.pdf)
is going to be a similar mess if not worse as it starts making much
more use of the lane muxing and multiple PCS devices.