Message-ID: <c7c7aee2-5fda-4b66-a337-afb028791f9c@lunn.ch>
Date: Wed, 23 Apr 2025 00:26:12 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org,
linux@...linux.org.uk, hkallweit1@...il.com, davem@...emloft.net,
pabeni@...hat.com
Subject: Re: [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap

On Tue, Apr 22, 2025 at 02:29:48PM -0700, Alexander Duyck wrote:
> On Tue, Apr 22, 2025 at 9:50 AM Andrew Lunn <andrew@...n.ch> wrote:
> >
> > > > The whole concept of a multi-host NIC is new to me. So I at least need
> > > > to get up to speed with it. I've no idea if Russell has come across it
> > > > before, since it is not a SoC concept.
> > > >
> > > > I don't really want to agree to anything until I have that concept
> > > > understood. That is part of why I asked about a standard. It is a
> > > > dense document answering a lot of questions. Without a standard, I
> > > > need to ask a lot of questions.
> > >
> > > Don't hesitate to ask the questions, your last reply contains no
> > > question marks :)
> >
> > O.K. Let's start with the basics. I assume the NIC has a PCIe connector
> > something like a 4.0 x4? Each of the four hosts in the system
> > contributes one PCIe lane. So from the host side it looks like a 4.0 x1
> > NIC?
>
> More like 5.0 x16 split into four 5.0 x4 NICs.

O.K. Same thing, different scale.

> > There are not four host MACs connected to a 5-port switch. Rather, each
> > host gets its own subset of queues, DMA engines etc., for one shared
> > MAC. Below the MAC you have all the usual PCS, SFP cage, GPIOs, I2C
> > bus, and blinky LEDs. Plus you have the BMC connected via an RMII-like
> > interface.
>
> Yeah, that is the setup so far. Basically we are using one QSFP cable
> and slicing it up. So instead of having a 100CR4 connection we might
> have 2x50CR2 operating on the same cable, or 4x25CR.

But for 2x50CR2 you have two MACs? And for 4x25CR, four MACs?

Or are there always four MACs, each with its own queues, so you need
to place frames into the correct queue, and with 2x50CR2 you also
need to load balance across those two queues?

I guess the queuing does not matter much to phylink, but how do you
represent multiple PCS lanes to phylink? Up until now, one netdev has
had one PCS lane. It now has 1, 2, or 4 lanes. None of the
phylink_pcs_ops have a lane indicator.
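
To make that concrete, I can see two rough directions, sketched below
with made-up names rather than as a proposal: either the MAC driver
keeps one phylink_pcs instance per lane (which still leaves the
question of which one mac_select_pcs() should return for a 2 or 4
lane link), or the ops themselves grow a lane argument:

	/* Hypothetical per-lane instances kept by the MAC driver */
	struct mh_nic_port {
		struct phylink_pcs lane_pcs[4];	/* one per 25G lane */
		unsigned int num_lanes;		/* 1, 2 or 4 in use */
	};

	/* Or a hypothetical lane parameter added to each op, e.g. */
	void (*pcs_get_state)(struct phylink_pcs *pcs, unsigned int lane,
			      struct phylink_link_state *state);

Neither exists today, so how do you see the lanes being represented?
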
> > NC-SI, with Linux controlling the hardware, implies you need to be
> > able to hand off control of the GPIOs, I2C, PCS to Linux. But with
> > multi-host, it makes no sense for all four hosts to be trying to control
> > the GPIOs, I2C, PCS, or perform SFP firmware upgrades. So it seems more
> > likely to me, one host gets put in charge of everything below the
> > queues, from the MAC on down. The others just know there is link,
> > nothing more.
>
> Things are a bit simpler than that. With the direct-attach we don't
> need to take any action on the SFP. Essentially the I2C and GPIOs are
> all shared. As such we can read the QSFP state, but cannot modify it
> directly. We aren't writing to the I2C at all, other than the bank/page
> select, which is all handled as part of the read call.

That might work for direct-attach, but what about the general case? We
need to ensure whatever we add supports the general case.

The current SFP code expects a Linux I2C bus. Given how SFPs are
broken, it does 16-byte reads at most. When it needs to read more
than 16 bytes, I expect it will set the page once, read it back to
ensure the SFP actually implements the page, and then do multiple I2C
reads to pull all the data it wants from that page. I don't see how
this is going to work when the I2C bus is shared.
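
To spell out the access pattern I mean, here is a rough sketch with
placeholder helpers, not the real sfp.c functions; byte 127 is the
page select register per SFF-8636, and 16 bytes is the worst case
block size:

	u8 page = 3, readback;

	module_write(0x50, 127, &page, 1);	/* select the page */
	module_read(0x50, 127, &readback, 1);	/* did it stick? */
	if (readback != page)
		return -EOPNOTSUPP;		/* page not implemented */

	for (int off = 128; off < 256; off += 16)
		module_read(0x50, off, buf + off - 128, 16);

Nothing stops another host, or the BMC, rewriting byte 127 between the
select and any one of those reads, so the reader can silently get data
from the wrong page without ever knowing it.
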
> > This actually circles back to the discussion about fixed-link. The one
> > host in control of all the lower hardware has the complete
> > picture. The other 3 maybe just need a fixed link. They don't get to
> > see what is going on below the MAC, and as a result there is no
> > ethtool support to change anything, and so no conflicting
> > configuration? And since they cannot control any of that, they cannot
> > put the link down. So 3/4 of the problem is solved.
>
> Yeah, this is why I was headed down that path for a bit. However, our
> links are independent, with the only shared bits being the PMD and the
> SFP module.

Yours might be, but what is the general case?
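
If the answer for the hosts which are not in charge really is just a
fixed link, the sketch is simple, roughly the following, assuming
phylink_set_fixed_link() is the right tool, which is itself part of
the question; priv->phylink is a made-up driver field and 25G is just
the example speed for the 4x25CR split:

	/* Sketch only: pin the "dumb" host's link state to 25G/full */
	struct phylink_link_state state = {
		.speed	= SPEED_25000,
		.duplex	= DUPLEX_FULL,
	};
	int err = phylink_set_fixed_link(priv->phylink, &state);

But as soon as one of those hosts also needs visibility into the
shared SFP or its own PCS lanes, a plain fixed link is not enough, and
that is the general case I want to understand.
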
Andrew