Message-ID: <e3305a73-6a18-409b-a782-a89702e43a80@lunn.ch>
Date: Tue, 22 Apr 2025 15:49:02 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Alexander Duyck <alexander.duyck@...il.com>, netdev@...r.kernel.org,
	linux@...linux.org.uk, hkallweit1@...il.com, davem@...emloft.net,
	pabeni@...hat.com
Subject: Re: [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap

On Mon, Apr 21, 2025 at 06:21:43PM -0700, Jakub Kicinski wrote:
> On Mon, 21 Apr 2025 09:50:25 -0700 Alexander Duyck wrote:
> > On Mon, Apr 21, 2025 at 8:51 AM Alexander Duyck wrote:
> > > On Sun, Apr 20, 2025 at 2:58 PM Andrew Lunn <andrew@...n.ch> wrote:  
> > > > > 2. Expectations for our 25G+ interfaces to behave like multi-host NICs
> > > > > that are sharing a link via firmware. Specifically that
> > > > > loading/unloading the driver or ifconfig up/down on the host interface
> > > > > should not cause the link to bounce and/or drop packets for any other
> > > > > connections, which in this case includes the BMC.  
> > > >
> > > > For this, it would be nice to point to some standard which describes
> > > > this, so we have a generic, vendor-agnostic description of how this
> > > > is supposed to work.
> > >
> > > The problem here is that this is a bit of a "wild west" in terms
> > > of the spec setup. From what I can tell, OCP 3.0 defines how to
> > > set up the PCIe bifurcation but doesn't explain what the expected
> > > behavior is for the shared ports. One thing we might look into
> > > would be the handling for VEPA (Virtual Ethernet Port Aggregator)
> > > or VEB (Virtual Ethernet Bridging), as that wouldn't be too far
> > > off from what inspired most of the logic in the hardware.
> > > Essentially the only difference is that instead of supporting
> > > VFs, most of these NICs support multiple PFs.
> > 
> > So looking at 802.1Q-2022 section 40, I wonder if we don't need to
> > essentially define ourselves as an edge relay, since our setup is
> > pretty close to what is depicted in figure 40-1. In our case an
> > S-channel essentially represents 2 SerDes lanes on a QSFP cable,
> > with the switch playing the role of the EVB bridge.
> > 
> > Anyway, I think that is probably the spec we need to dig into if we
> > are looking at how the link is being shared and such. I'll try to
> > do some more reading myself to get caught up on all this, as the
> > last time I read through this it was called VEB instead of EVB.. :-/
> 
> Interesting. My gut feeling is that even if we make Linux and the NIC
> behave nicely according to 802.1Q, we'll also need to make some changes
> on the BMC side. And there we may encounter pushback as the status quo
> works quite trivially for devices with PHY control in FW.

As I see it, we have two things stacked on top of each other. We have
what is standardised for NC-SI, DSP0222. That gives a basis, and then
there is vendor stuff on top for multi-host, which is stricter.

Linux should have generic support for DSP0222. I've seen vendors hack
around with WoL to make it work. It would be nice to replace that hack
with a standardised method to tell phylink to enable support for
DSP0222, such as additional ops, or a flag. phylink can then separate
admin down from carrier down when needed.
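
To make that concrete, here is a rough sketch of how such a flag could
look from a MAC driver's point of view. The flag name, the behaviour
attached to it, and the foo_mac driver are all made up for
illustration; only phylink_create() and the existing phylink_config
fields are real. Treat it as a sketch of the idea, not a proposal for
the actual API.

#include <linux/err.h>
#include <linux/fwnode.h>
#include <linux/netdevice.h>
#include <linux/phylink.h>

struct foo_mac {
	struct net_device *netdev;
	struct fwnode_handle *fwnode;
	phy_interface_t phy_mode;
	struct phylink_config phylink_config;
	struct phylink *phylink;
};

static int foo_mac_phylink_setup(struct foo_mac *priv,
				 const struct phylink_mac_ops *mac_ops)
{
	priv->phylink_config.dev = &priv->netdev->dev;
	priv->phylink_config.type = PHYLINK_NETDEV;

	/* Hypothetical flag, does not exist in phylink today: ask
	 * phylink to treat "ip link set dev ... down" as admin down
	 * only, keeping the PHY/PCS, and hence the BMC's NC-SI
	 * (DSP0222) side-band path, up.
	 */
	priv->phylink_config.keep_link_for_bmc = true;

	priv->phylink = phylink_create(&priv->phylink_config,
				       priv->fwnode, priv->phy_mode,
				       mac_ops);
	if (IS_ERR(priv->phylink))
		return PTR_ERR(priv->phylink);

	return 0;
}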

Then we have vendor stuff on top. 

> BTW Saeed posted a devlink param to "keep link up" recently:
> https://lore.kernel.org/all/20250414195959.1375031-11-saeed@kernel.org/
> Intel has ethtool priv flags to the same effect, in their 40G and 100G
> drivers, but with reverse polarity:
> https://docs.kernel.org/networking/device_drivers/ethernet/intel/i40e.html#setting-the-link-down-on-close-private-flag
> These are all for this exact use case. In the past Ido added module
> power policy, which is the only truly generic configurable, and one we
> should probably build upon:
> https://docs.kernel.org/networking/ethtool-netlink.html#c.ethtool_module_power_mode_policy
> I'm not sure if this is expected to include PCS or if it's just
> telling the module to keep the laser on..
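
For reference, that module power mode policy reaches drivers through
two ethtool_ops hooks. The sketch below is written from memory of that
API, so the exact struct and field names may be slightly off; the foo_
driver and the hard-coded values are purely illustrative, and it only
shows the shape of the hook, not where any BMC-related logic would
live.

#include <linux/ethtool.h>
#include <linux/netdevice.h>

static int
foo_get_module_power_mode(struct net_device *dev,
			  struct ethtool_module_power_mode_params *params,
			  struct netlink_ext_ack *extack)
{
	/* Report the configured policy and the current operational
	 * mode; a real driver would read these back from firmware.
	 */
	params->policy = ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO;
	params->mode = ETHTOOL_MODULE_POWER_MODE_HIGH;
	return 0;
}

static int
foo_set_module_power_mode(struct net_device *dev,
			  const struct ethtool_module_power_mode_params *params,
			  struct netlink_ext_ack *extack)
{
	/* A real driver would program firmware here. The relevant
	 * point for this thread is that a "high" policy keeps the
	 * module powered independently of the host's admin state.
	 */
	return 0;
}

static const struct ethtool_ops foo_ethtool_ops = {
	.get_module_power_mode = foo_get_module_power_mode,
	.set_module_power_mode = foo_set_module_power_mode,
};

From user space that maps to something like
"ethtool --set-module <dev> power-mode-policy high|auto", if I
remember the option name correctly.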

Ideally, we want to define something vendor agnostic. And I would
prefer we talk about the high-level concept, sharing the NIC with a
BMC and multiple hosts, rather than the low-level "keep link up".

The whole concept of a multi-host NIC is new to me, so I at least need
to get up to speed with it. I've no idea if Russell has come across it
before, since it is not a SoC concept.

I don't really want to agree to anything until I have that concept
understood. That is part of why I asked about a standard: a standard
is a dense document answering a lot of questions. Without one, I need
to ask a lot of questions.

I also think there is a lot more to it than just keeping the laser on.
For NC-SI, DSP0222 probably does cover a big chunk of the problem, but
for multi-host my gut is telling me there is more to it.

Let me do some research and thinking about multi-host.

	Andrew
