Message-ID: <06490a1a-427c-4e35-b9c3-154a0c88ed60@lunn.ch>
Date: Sun, 20 Apr 2025 23:58:52 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: netdev@...r.kernel.org, linux@...linux.org.uk, hkallweit1@...il.com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Subject: Re: [net-next PATCH 0/2] net: phylink: Fix issue w/ BMC link flap

> > The actual link settings are controlled by the host NC driver when
> > it is operational. When the host NC driver is operational, link
> > settings specified by the MC using the Set Link command may be
> > overwritten by the host NC driver. The link settings are not
> > restored by the NC if the host NC driver becomes non
> > operational.
> >
> > There is a very clear indication that the host is in control, or the
> > host is not in control. So one obvious question to me is, should
> > phylink have ops into the MAC driver to say it is taking over control,
> > and relinquishing control? The Linux model is that when the interface
> > is admin down, you can use ethtool to preconfigure things, but they
> > don't take effect until the link is admin up. So with admin down, we
> > have a host NC driver, but it is not operational, hence the Network
> > Controller is in control of the link at the Management Controller's
> > behest. It is only with admin up that phylink takes control of the
> > Network Controller, and it releases it with admin down. Having these
> > ops would also help with suspend/resume. Suspend does not change the
> > admin up/down status, but the host clearly needs to hand over control
> > of the media to the Network Controller, and take it back again on
> > resume.
>
> Yes, this more-or-less describes the current setup in fbnic. The only
> piece that is probably missing would be the heartbeat we maintain so
> that the NIC doesn't revoke access due to the OS/driver potentially
> being hung.

That probably goes against the last sentence i quoted above. I do
however understand why you would want it. Can the host driver know if
the Network Controller has taken back control? Or does the heartbeat
also act as a watchdog, so the host does not need to care because it
is about to experience a BMC-induced reboot?

> The other thing involved in this that you didn't mention
> is that the MC is also managing the Rx filter configuration. So when
> we take ownership it is both the Rx Filters and MAC/PCS/PHY that we
> are taking ownership of.

That does not seem consistent with the standard. The Set Link command
i quoted above makes it clear that when the host driver is active, it
is in control of the media. However, the Set VLAN Filter, Enable VLAN,
Set MAC Address and Enable Broadcast Filter commands say nothing about
differences between the host driver being operational or not. The
standard just seems to assume the Management Controller and the host
share the resources, and try not to stomp over each other. Does fbnic
not follow the standard in this respect? However, from a phylink
perspective, i don't think this matters, since phylink is not involved
with any of this.

> The current pattern in fbnic is for us to do most of this on the tail
> end of __fbnic_open and unwind it near the start of fbnic_stop.
> Essentially the pattern is xmit_ownership, init heartbeat, init PTP,
> start phylink, configure Rx filters. In the case of close it is the
> reverse with us tearing down the filters, disabling phylink, disabling
> PTP, and then releasing ownership.
>
> > Also, if we have these ops, we know that admin down/suspend does not
> > mean media down. The presence of these ops triggers different state
> > transitions in the phylink state machine so that it simply hands off
> > control of the media, but otherwise leaves it alone.
> >
> > With this in place, i think we can avoid all the unbalanced state?
>
> As I understand it right now the main issue is that Phylink assumes
> that it has to take the link down in order to perform a major
> configuration in phylink_start/phylink_resume.

Well, as i said, my reading of the standard is that the host can make
disruptive media changes, so you have to be able to live with
disruptive media changes. If you have to live with it, the path of
least resistance is just to accept it.
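To make the hand-over idea from the quoted text concrete, here is a
rough userspace model, not kernel code. Every name in it (sketch_port,
sketch_handover_ops, take_control, release_control, sketch_start,
sketch_stop) is made up for illustration; nothing like this exists in
phylink or fbnic today. It only assumes what the thread describes: with
hand-over ops present, start/stop pass ownership of the media and leave
the link alone; without them, start forces the link down before the
major configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in phylink or fbnic. */

/* Who currently drives the media (MAC/PCS/PHY). */
enum sketch_owner { SKETCH_OWNER_NC, SKETCH_OWNER_HOST };

struct sketch_port {
	enum sketch_owner owner;
	bool link_up;            /* media state as seen by the link partner */
	unsigned int link_flaps; /* number of up -> down transitions */
};

/* Hypothetical hand-over ops a MAC driver could offer. */
struct sketch_handover_ops {
	void (*take_control)(struct sketch_port *port);
	void (*release_control)(struct sketch_port *port);
};

static void sketch_set_link(struct sketch_port *port, bool up)
{
	if (port->link_up && !up)
		port->link_flaps++;
	port->link_up = up;
}

static void sketch_take_control(struct sketch_port *port)
{
	port->owner = SKETCH_OWNER_HOST;
}

static void sketch_release_control(struct sketch_port *port)
{
	port->owner = SKETCH_OWNER_NC;
}

static const struct sketch_handover_ops sketch_ops = {
	.take_control = sketch_take_control,
	.release_control = sketch_release_control,
};

/* Models admin up / resume. */
static void sketch_start(struct sketch_port *port,
			 const struct sketch_handover_ops *ops)
{
	if (ops && ops->take_control) {
		/* Hand-over: inherit the media in whatever state it is in. */
		ops->take_control(port);
		return;
	}
	/*
	 * No ops: force the link down before the major configuration.
	 * The model then simply assumes the link comes back up.
	 */
	sketch_set_link(port, false);
	port->owner = SKETCH_OWNER_HOST;
	sketch_set_link(port, true);
}

/* Models admin down / suspend. */
static void sketch_stop(struct sketch_port *port,
			const struct sketch_handover_ops *ops)
{
	if (ops && ops->release_control) {
		/* Hand-over: give the media back, leave the link alone. */
		assert(port->owner == SKETCH_OWNER_HOST);
		ops->release_control(port);
		return;
	}
	sketch_set_link(port, false);
}
```

Starting and stopping a port whose link is already up leaves
link_flaps at 0 when sketch_ops is passed, and counts one flap per
start or stop when NULL is passed instead.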
> The requirement that the BMC not lose link comes more out of the
> multi-host setups that have been in place in the data center
> environment for the last decade or so where there was only one link
> but multiple systems all sharing that link, including the BMC. So it
> is not strictly a BMC requirement, but more of a multi-host
> requirement.

Is this actually standardised somewhere? I see there is a draft of an
update to the NC-SI Specification, but i don't think the section about
controlling the link has changed. Also, the standard talks about how
you connect one Management Controller to multiple Network
Controllers. There is nothing about multiple Management Controllers
connected to one Network Controller. Or am i missing something, like
one Management Controller controlling all the hosts connected to the
Network Controller?

> > So, can we ignore the weeds for the moment, and think about the big
> > picture?
>
> So big picture wise we really have 2 issues:
> 1. The BMC handling doesn't currently exist, so we need to extend
> handling/hand-off for link up before we start, and link up after we
> stop.

Agreed, and that fits with DSP0222.

> 2. Expectations for our 25G+ interfaces to behave like multi-host NICs
> that are sharing a link via firmware. Specifically that
> loading/unloading the driver or ifconfig up/down on the host interface
> should not cause the link to bounce and/or drop packets for any other
> connections, which in this case includes the BMC.

For this, it would be nice to point to some standard which describes
this, so we have a generic, vendor-agnostic description of how this
is supposed to work.

Andrew