Message-ID: <CAKgT0UcxyBCZw3TV97EAXi1KQLCX=O+=4ATzR3xyLYFySQG1Sw@mail.gmail.com>
Date: Mon, 3 Nov 2025 16:26:50 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: netdev@...r.kernel.org, kuba@...nel.org, kernel-team@...a.com, 
	andrew+netdev@...n.ch, hkallweit1@...il.com, linux@...linux.org.uk, 
	pabeni@...hat.com, davem@...emloft.net
Subject: Re: [net-next PATCH v2 09/11] fbnic: Add SW shim for MDIO interface
 to PMA/PMD and PCS

On Mon, Nov 3, 2025 at 1:49 PM Andrew Lunn <andrew@...n.ch> wrote:
>
> On Mon, Nov 03, 2025 at 12:18:38PM -0800, Alexander Duyck wrote:
> > On Mon, Nov 3, 2025 at 10:59 AM Andrew Lunn <andrew@...n.ch> wrote:
> > >
> > > > The interface will consist of 2 PHYs each consisting of a PMA/PMD and a PCS
> > > > located at addresses 0 and 1.
> > >
> > > I'm missing a bit of architecture here.
> > >
> > > At least for speeds up to 10G, we have the MAC enumerate what it can
> > > do, the PCS enumerates its capabilities, and we read the EEPROM of the
> > > SFP to find out what it supports. From that, we can figure out the
> > > subset of link modes which are supported, and configure the MAC and
> > > PCS as required.
> >
> > The hardware we have is divisible with multiple entities running it
> > parallel. It can be used as a single instance, or multiple. With our
> > hardware we have 2 MACs that are sharing a single QSFP connection, but
> > the hardware can in theory have 4 MACs sharing a QSFP-DD connection.
> > The basic limitation is that underneath each MAC we can support at
> > most 2 lanes of traffic, so just the Base-R/R2 modes. Effectively what
> > we would end up with is the SFP PHY having to be chained behind the
> > internal PHY if there is one. In the case of the CR/KR setups though
> > we are usually just running straight from point-to-point with a few
> > meter direct attach cable or internal backplane connection.
>
> We need Russell to confirm, but i would expect the SFP driver will
> enumerate the capabilities of the SFP and include all the -1, -2 and
> -4 link modes. phylink will then call the pcs_validate, passing this
> list of link modes. The PCS knows it only supports 1 or 2 lanes, so it
> will remove all the -4 modes from the list. phylink will also pass the
> list to the MAC driver, and it can remove any it does not support.

In the drivers the limiting would be done based on the interface at
the PCS level. The way I added the 25G, 50G, and 100G features was
based on the interface type, so the interface type is what puts a
limit on the number of lanes supported.

In the actual hardware, though, the limiting factor is that the
PMA/PMD is only 2 lanes. The PCS actually supports 4 lanes and is
just subdivided so that only 2 of them are used per MAC.
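
As a rough sketch of the kind of limiting I mean in .pcs_validate
(the fbnic_* name and the exact set of modes cleared here are only
illustrative, not the actual driver code):

#include <linux/ethtool.h>
#include <linux/linkmode.h>
#include <linux/phy.h>
#include <linux/phylink.h>

/* Strip anything that needs more than 2 lanes regardless of what the
 * SFP advertised; the interface type phylink passes in is what really
 * bounds us beyond that.
 */
static int fbnic_pcs_validate(struct phylink_pcs *pcs,
			      unsigned long *supported,
			      const struct phylink_link_state *state)
{
	linkmode_clear_bit(ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT, supported);
	linkmode_clear_bit(ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT, supported);
	linkmode_clear_bit(ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT, supported);
	linkmode_clear_bit(ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT, supported);

	/* A single-lane interface such as 25GBASE-R can't carry the -2
	 * modes either, so clear those as well in that case.
	 */
	if (state->interface == PHY_INTERFACE_MODE_25GBASER) {
		linkmode_clear_bit(ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT, supported);
		linkmode_clear_bit(ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT, supported);
	}

	return 0;
}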

> It also sounds like you need to ask the firmware about
> provisioning. Does this instance have access to 1 or 2 lanes? That
> could be done in either the PCS or the MAC? The .validate can then
> remove even more link modes.

The way the hardware is set up, we always have 2 physical lanes.
Whether we use one or two depends on which mode we want to run in
software and what our link partner supports. For example, on one of
our test setups we just have a QSFP-DD loopback plug installed, and
we can configure whatever we want, link up, and talk to ourselves.

That actually presents a number of challenges for us, as the SFP
driver currently doesn't understand CMIS, and again we are stuck
emulating the I2C via the driver since it is hidden behind the FW
interface. This is one of the reasons we currently end up with the
FW telling us what the expected link mode/AUI is.
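
To make that concrete, we basically just map whatever AUI the FW
reports onto a phy_interface_t for phylink. A sketch, where the
FBNIC_FW_AUI_* values are invented stand-ins rather than the real FW
ABI:

#include <linux/phy.h>

/* Invented enum standing in for whatever the FW actually reports. */
enum fbnic_fw_aui {
	FBNIC_FW_AUI_25GAUI,	/* one lane */
	FBNIC_FW_AUI_50GAUI_2,	/* two lanes */
};

static phy_interface_t fbnic_fw_aui_to_interface(enum fbnic_fw_aui aui)
{
	switch (aui) {
	case FBNIC_FW_AUI_25GAUI:
		return PHY_INTERFACE_MODE_25GBASER;
	case FBNIC_FW_AUI_50GAUI_2:
		/* The 2-lane AUIs map to the interface types added
		 * elsewhere in this series.
		 */
	default:
		return PHY_INTERFACE_MODE_NA;
	}
}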

> > To
> > support that we will need to have access to 2 PCS instances as the IP
> > is divisible to support either 1 or 2 lanes through a single instance.
>
> Another architecture question.... Should phylink know there are two
> PCS instances? Or should it see just one? 802.3 defines registers for
> lanes 0-3, sometimes 0-7, sometimes 0-9, and even 0-19. So a single
> PCS should be enough for 2 lanes, or 4 lanes.

I'm thinking the driver needs to know about one, but it needs access
to the registers for both in order to configure the multi-lane setup.
The issue is that the IP was made so that the vendor registers for
both lanes need to be configured identically for the 2-lane modes in
order to make the device work. That is why I thought I would go ahead
and enable both lanes for now, while only connecting one of them to
the driver.

If we had multiple MACs, both of the PCS lanes could have been used
in parallel for the 1-lane setups; however, since we only have one
MAC it ends up running both lanes. Since that was the case, I thought
I would stick to what would likely have been the layout if we had
multiple MACs, which was to expose both lanes as separate PHYs but
only map the device on the first one. That said, if need be I could
look at just remapping the PCS for the second lane as
MDIO_MMD_VEND1/2. I would just have to relocate the RSFEC registers
for the second lane and the PCS vendor registers to that device.
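
For the "configured identically" part, the shim more or less has to
mirror every vendor register write across the two lane addresses.
Roughly (just a sketch; addresses 0 and 1 are the two PHYs described
in the patch):

#include <linux/mdio.h>
#include <linux/phy.h>

/* Mirror a vendor register write across both PCS lane instances so
 * the 2-lane modes see identical configuration. Lane 0 sits at
 * address 0 on the shim bus, lane 1 at address 1.
 */
static int fbnic_pcs_write_both_lanes(struct mii_bus *bus, int devad,
				      u32 regnum, u16 val)
{
	int err;

	err = mdiobus_c45_write(bus, 0, devad, regnum, val);
	if (err)
		return err;

	return mdiobus_c45_write(bus, 1, devad, regnum, val);
}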

> > Then underneath that is an internal PCS PMA which I plan to merge in
> > with the PMA/PMD I am representing here as the RSFEC registers are
> > supposed to be a part of the PMA. Again with 2 lanes supported I need
> > to access two instances of it for the R2 modes. Then underneath that
> > we have the PMD which is configurable on a per-lane basis.
>
> There is already some support for pma configuration in pcs-xpcs. See
> pcs-xpcs-nxp.c.

I was dealing with different IP from different vendors, so I didn't
want to throw the PMD code in with the PCS code. I suppose I could do
so, though, if that is what you are suggesting; I would essentially
just be up-leveling it to the PMA interface.

As it stood, I was considering adding an MMD 8 interface to
represent the PMA on the PCS, since that would probably be closer to
what we actually have going on: the PCS/FEC/PMA block on top is
talking to the PMA/PMD, which is then doing the equalization and
training before we send the data over the Direct Attach Copper cable.
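
In shim terms that would just be another case in the devad dispatch,
something like the following sketch (fbnic_read_pma() and friends are
placeholders for however we actually reach those blocks behind the
FW):

#include <linux/mdio.h>
#include <linux/phy.h>

/* Placeholders for the real register access behind the FW interface. */
int fbnic_read_pma(void *priv, int addr, int regnum);
int fbnic_read_pcs(void *priv, int addr, int regnum);
int fbnic_read_inner_pma(void *priv, int addr, int regnum);

static int fbnic_mdio_read_c45(struct mii_bus *bus, int addr, int devad,
			       int regnum)
{
	switch (devad) {
	case MDIO_MMD_PMAPMD:
		return fbnic_read_pma(bus->priv, addr, regnum);
	case MDIO_MMD_PCS:
		return fbnic_read_pcs(bus->priv, addr, regnum);
	case 8:	/* separated PMA, i.e. the MMD 8 idea above */
		return fbnic_read_inner_pma(bus->priv, addr, regnum);
	default:
		return 0xffff;	/* nothing at this MMD */
	}
}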

I had chosen the phydev route as it already had a good way of handling
this. I was thinking the phydev could essentially become a fractional
QSFP bus as it will likely be some time before we could even get
support for a standard QSFP bus and CMIS upstream.

> > The issue is that the firmware is managing the PMD underneath us. As a
> > result we don't have full control of the link. One issue we are
> > running into is that the FW will start training when it first gets a
> > signal and it doesn't block the signal from getting to the PCS. The
> > PCS will see the signal and immediately report the link as "up" if the
> > quality is good enough. This results in us suddenly seeing the link
> > flapping for about 2-3 seconds while the training is happening. So to
> > prevent that from happening we are adding the phydev representing the
> > PMD to delay the link up by the needed 4 seconds to prevent the link
> > flap noise.
>
> So it seems like you need to extend dw_xpcs_compat with a .get_state
> callback. You can then have your own implementation which adds this 4
> second delay, before chaining into xpcs_get_state() to return the true
> state.

It seems like I might need to plug something into the spot where
xpcs_resolve_pma is called. The code as it stands now just takes the
interface and uses that to determine the speed after checking the PCS
for the link. I wonder if I should look at having this actually
consult the PMA, if there is one, and determine the speed based on
that.
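
Whatever spot it plugs into, the hold-off itself would be pretty
simple; something along these lines, called from the get_state path
before reporting to phylink (the struct and helper are illustrative
only, and the 4 seconds is the figure from above):

#include <linux/jiffies.h>
#include <linux/phylink.h>

struct fbnic_pmd_state {
	unsigned long signal_seen;	/* jiffies when signal appeared */
	bool holding_off;
};

/* Mask the PCS link indication until FW-driven PMD training has had
 * its ~4 seconds, so phylink never sees the 2-3 seconds of flapping.
 */
static void fbnic_pmd_filter_link(struct fbnic_pmd_state *pmd,
				  struct phylink_link_state *state)
{
	if (!state->link) {
		pmd->holding_off = false;
		return;
	}

	if (!pmd->holding_off) {
		pmd->holding_off = true;
		pmd->signal_seen = jiffies;
	}

	if (time_before(jiffies, pmd->signal_seen + 4 * HZ))
		state->link = false;
}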
