Message-ID: <4b7f0ce90f17bd0168cbf3192a18b48cdabfa14b.camel@gmail.com>
Date: Mon, 17 Nov 2025 16:52:35 -0800
From: Alexander H Duyck <alexander.duyck@...il.com>
To: Andrew Lunn <andrew@...n.ch>, Lee Trager <lee@...ger.us>
Cc: Maxime Chevallier <maxime.chevallier@...tlin.com>, Susheela Doddagoudar
<susheelavin@...il.com>, netdev@...r.kernel.org, mkubecek@...e.cz,
Hariprasad Kelam <hkelam@...vell.com>, Alexander Duyck
<alexanderduyck@...com>
Subject: Re: Ethtool: advance phy debug support
On Sun, 2025-11-16 at 00:27 +0100, Andrew Lunn wrote:
> > PRBS testing can be used as a signal integrity test between any two end
> > points, not just networking. For example we have CSRs to allow PRBS testing
> > on PCIE with fbnic. My thought was always to limit the scope to network use
> > case. The feedback I received at Netdev was we need to handle this
> > generically for any phy, thus the suggestion to do this on phy. That adds a
> > ton of complexity so I'd be supportive to narrow this down to just
> > networking and leverage ethtool.
>
> We need to be careful with terms here. We have PHYs driven by phylib,
> bitstreams to signals on twisted pairs, drivers/net/phy
>
> And we have generic PHYs, which might contain a SERDES, for PCIE,
> SATA, USB, /drivers/phy.
>
> Maxime's reference to comphy for Marvell is a generic PHY, and they do
> implement SATA, USB and networking.
>
> Having said that, I don't see why you should not narrow it down to
> networking, and ethtool. It might well be Marvell MAC drivers could
> call into the generic PHY, and the API needed for that should be
> reusable for anybody wanting to do testing of PCIE via a PRBS within a
> generic PHY.
>
> As I said before, what is important is we have an architecture that
> allows for PRBS in different locations. You don't need to implement
> all those locations, just the plumbing you need for your use case. So
> MAC calling phylink, calling into the PCS driver. We might also need
> some enumeration of where the PRBSes are, and being able to select
> which one you want to use, e.g. you could have a PCS with PRBS, doing
> SGMII connecting to a Marvell PHY which also has PRBS.
It seems to me like we would likely end up with two different setups.
The SerDes PHYs would likely end up supporting many more test patterns
than a standard Ethernet PHY would.
I had been looking at sections 45.2.1.168 - 45.2.1.174 of the IEEE
802.3 spec, as that would be the standard for a PMA/PMD interface, or
sections 45.2.3.17 - 45.2.3.20 for the PCS interface, as to how to do
this sort of testing on Ethernet using a c45 PHY. I wonder if we
couldn't use those registers as a general guide for putting the
interface together, the general idea being that the APIs should
translate to functionality similar to what is exposed in the IEEE
spec.
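
Just to make that concrete, here is a rough sketch of what the
pattern/test-point selection could look like. This is purely
hypothetical, not existing ethtool UAPI; the names are made up and
only loosely mirror the Clause 45 test pattern registers referenced
above:

/* Hypothetical sketch only -- not existing ethtool UAPI.  Pattern and
 * test point selection loosely mirroring the IEEE 802.3 Clause 45 test
 * pattern registers.
 */
#include <linux/types.h>

enum prbs_test_point {
	PRBS_TEST_POINT_PMA_PMD,	/* cf. 45.2.1.168 - 45.2.1.174 */
	PRBS_TEST_POINT_PCS,		/* cf. 45.2.3.17 - 45.2.3.20 */
};

enum prbs_test_pattern {
	PRBS_PATTERN_PRBS9,
	PRBS_PATTERN_PRBS13,
	PRBS_PATTERN_PRBS31,
	PRBS_PATTERN_SQUARE_WAVE,
};

struct prbs_test_request {
	enum prbs_test_point	point;
	enum prbs_test_pattern	pattern;
	bool			enable_generator;	/* Tx pattern generator */
	bool			enable_checker;		/* Rx pattern checker */
};
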
> > > That actually seems odd to me. I assume you need to set the link mode
> > > you want. Having it default to 10/Half is probably not what you
> > > want. You want to use ethtool_ksettings_set to force the MAC and PCS
> > > into a specific link mode. Most MAC drivers don't do anything if that
> > > call is made when the interface is admin down. And if you look at how
> > > most MAC drivers are structured, they don't bind to phylink/phylib
> > > until open() is called. So when admin down, you don't even have a
> > > PCS/PHY. And some designs have multiple PCSes, and you select the one
> > > you need based on the link mode, set by ethtool_ksettings_set or
> > > autoneg. And if admin down, the phylink will turn the SFP laser off.
> >
> > fbnic does not currently support autoneg.
>
> autoneg does not really come into this. Yes, ksettings_set can be used
> to configure what autoneg offers to the link partner. But if you call
> ksettings_set with the autoneg parameter set to off, it is used to
> directly set the link mode. So this is going to be the generic way you
> set the link to the correct mode before starting the test.
>
> fbnic is actually very odd in that the link mode is hard wired at
> production time. I don't know of any other device that does
> that. Because fbnic is odd, while designing this, you probably want to
> ignore it, consider 'normal' devices making use of the normal
> APIs. Maybe go buy a board using stmmac and the XPCS_PCS driver, so
> you have a normal system to work on? And then make sure the oddball
> fbnic can be somehow coerced to do the right thing like normal
> devices.
It isn't so much hard wired as limited based on the cable connected to
it. In addition, the other end doesn't do any sort of autoneg. In
theory we should be able to resolve much of that once we get the SFP
framework updated so that it can actually handle QSFP and read the
CMIS and SFF-8636 EEPROMs. With that we can at least determine what
the media supports and just run with it. Arguably much of the
weirdness is due to the 50R2 implementation, as that seems to be where
everything diverges from the norm and becomes a parallel 25G setup
without much support for autoneg.
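
For what it's worth, once we can read the module EEPROM, telling the
two apart is cheap: the SFF-8024 identifier byte at offset 0 already
tells you which management spec governs the rest of the pages. A
trivial sketch, with the helper and enum names made up:

/* Minimal sketch: pick the management spec from the SFF-8024
 * identifier byte at offset 0 of the module EEPROM.  Helper and enum
 * names are made up for illustration.
 */
#include <linux/types.h>

enum module_mgmt_spec {
	MODULE_MGMT_UNKNOWN,
	MODULE_MGMT_SFF8472,	/* SFP/SFP+/SFP28 */
	MODULE_MGMT_SFF8636,	/* QSFP+/QSFP28 */
	MODULE_MGMT_CMIS,	/* QSFP-DD, OSFP */
};

static enum module_mgmt_spec module_mgmt_spec(u8 sff8024_id)
{
	switch (sff8024_id) {
	case 0x03:		/* SFP/SFP+ */
		return MODULE_MGMT_SFF8472;
	case 0x0d:		/* QSFP+ */
	case 0x11:		/* QSFP28 */
		return MODULE_MGMT_SFF8636;
	case 0x18:		/* QSFP-DD */
	case 0x19:		/* OSFP */
		return MODULE_MGMT_CMIS;
	default:
		return MODULE_MGMT_UNKNOWN;
	}
}
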
As far as the testing itself goes, we aren't going to be linking
anyway, so the configured speed/duplex won't matter. When we are
testing, either the PCS or the PMA/PMD is essentially running the show
and everything above it is pretty much cut off, so the MAC isn't going
to see a link anyway. In the grand scheme of things it is basically
just a matter of setting up the lanes and the frequency/modulation for
those lanes.
The way I see it, we need to be able to determine the number of lanes,
the frequency, and then the test pattern we need to operate with. One
thing to keep in mind is that the PMA/PMD interface provides tuning
variables for equalization based on whether we are running with
NRZ/PAM2 (45.2.1.112) or PAM4 (45.2.1.135).
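
To make that concrete, a hypothetical per-lane configuration along
those lines might look like the struct below. None of this is an
existing API; the names are made up, and the pattern field reuses the
hypothetical enum from the earlier sketch:

/* Hypothetical sketch only -- not an existing API.  Captures the knobs
 * discussed above: which lanes, what rate/modulation they run at, and
 * which pattern, with the NRZ/PAM2 vs PAM4 split surfacing the
 * per-modulation equalization tuning the PMA/PMD registers expose.
 */
enum prbs_modulation {
	PRBS_MOD_NRZ,			/* NRZ/PAM2, cf. 45.2.1.112 */
	PRBS_MOD_PAM4,			/* cf. 45.2.1.135 */
};

struct prbs_lane_config {
	u32			lane_mask;	/* which lanes take part */
	u32			lane_rate_mbd;	/* per-lane symbol rate, MBd */
	enum prbs_modulation	modulation;
	enum prbs_test_pattern	pattern;	/* from the earlier sketch */
	bool			enable_generator;
	bool			enable_checker;
};
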
> > > > When I spoke with test engineers internally in Meta I could not come up with
> > > > a time period and over night testing came up as a requirement. I decided to
> > > > just let the user start and stop testing with no time requirement. If
> > > > firmware loses the host heartbeat it automatically disables PRBS testing.
> > > O.K. So I would probably go for a blocking netlink call, and when ^C
> > > is used, it exits PRBS and allows normal traffic. You then need to
> > > think about RTNL, which you cannot hold for hours.
> > RTNL() is only held when starting testing; it's released once testing has
> > begun. We could set a flag on the netdev to say PRBS testing is running,
> > don't do anything else with this device until the flag is reset.
>
> It's the ^C bit which makes it interesting. The idea is used in other
> places in the stack. mrouted(1) and the kernel side for multicast
> routing do something similar. So long as the user space daemon holds
> the socket open, the kernel maintains the multicast routing
> cache. Once the socket is closed, because the daemon has died/exited,
> the kernel flushes the cache. But this is an old BSD sockets
> behaviour, not netlink sockets. I've no idea if you can do the same
> with netlink, i.e. get a notification when a process closes such a socket.
>
> Andrew
This is one of the reasons why I was thinking of something like a
phydev being provided by the driver. Specifically, it provides an
interface that can be inspected by a netdev via standard calls to
determine things like whether the link is allowed to come up. In the
case of the phydev code, it already has all the bits in place for
PHY_CABLETEST as a state.
Assuming we have something like that buried somewhere in the PHY,
either as a part of the PMA/PMD or the PCS drivers, we could then look
at having a state there that essentially communicates "this device is
testing, so bringing the link up is blocked". Then it would just be a
matter of making sure you can pop the device into and out of that
state while holding the RTNL lock, without having to hold it for the
full duration.
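
As a rough sketch of what I mean, and purely hypothetical
(PHY_PRBS_TEST is not an existing phylib state, and none of the
helpers or driver ops below exist), something along the lines of how
PHY_CABLETEST is handled today, with the lock only held long enough to
flip the state rather than for the whole test:

/* Hypothetical sketch only: PHY_PRBS_TEST, the prbs_test_* driver ops
 * and these helpers do not exist in phylib today.  The config struct
 * is the hypothetical one from the earlier sketch.
 */
static int phy_prbs_test_start(struct phy_device *phydev,
			       const struct prbs_lane_config *cfg)
{
	int err;

	mutex_lock(&phydev->lock);

	/* Only start from a quiesced/ready state, like cable test does. */
	if (phydev->state != PHY_READY && phydev->state != PHY_UP &&
	    phydev->state != PHY_NOLINK && phydev->state != PHY_RUNNING) {
		mutex_unlock(&phydev->lock);
		return -EBUSY;
	}

	/* Hand the lanes over to the PMA/PMD or PCS pattern generator. */
	err = phydev->drv->prbs_test_start(phydev, cfg);
	if (!err)
		phydev->state = PHY_PRBS_TEST;	/* blocks link from coming up */

	mutex_unlock(&phydev->lock);
	return err;
}

static int phy_prbs_test_stop(struct phy_device *phydev)
{
	int err;

	mutex_lock(&phydev->lock);
	err = phydev->drv->prbs_test_stop(phydev);
	phydev->state = PHY_UP;		/* let the state machine take over again */
	mutex_unlock(&phydev->lock);

	phy_trigger_machine(phydev);
	return err;
}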