lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0UcgZpE4CMDmnR2V2GTz3tyd+atU-mgMqiHesZVXN8F_+g@mail.gmail.com>
Date: Wed, 16 Apr 2025 12:03:05 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: netdev@...r.kernel.org, andrew@...n.ch, hkallweit1@...il.com, 
	davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Subject: Re: [net-next PATCH 2/2] net: phylink: Fix issues with link balancing
 w/ BMC present

On Wed, Apr 16, 2025 at 9:01 AM Russell King (Oracle)
<linux@...linux.org.uk> wrote:
>
> On Wed, Apr 16, 2025 at 08:29:00AM -0700, Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@...com>
> >
> > This change is meant to address the fact that there are link imbalances
> > introduced when using phylink on a system with a BMC. Specifically there
> > are two issues.
> >
> > The first issue is that if we lose link after the first call to
> > phylink_start but before it gets to the phylink_resolve we will end up with
> > the phylink interface assuming the link was always down and not calling
> > phylink_link_down resulting in a stuck interface.
>
> That is intentional.
>
> phylink strictly orders .mac_link_down and .mac_link_up, and starts from
> an initial position that the link _will_ be considered to be down. So,
> it is intentional that .mac_link_down will _never_ be called after
> phylink_start().

Well the issue is that with a BMC present the link may be up before we
even start using phylink. So if the link is lost while we are going
through phylink_start we will end up in an in-between state where the
link is physically down, but the MAC is still configured as though the
link is up. This will be problematic as the MAC should essentially be
discarding frames for transmit if the link is down to avoid blocking
internal Tx FIFOs.

Everything for our driver has to be a light touch as bringing down the
BMC link for any reason other than the physical link being lost is
essentially a criteria for failure as the BMC is the most essential
link on the system. So, for example the code I have in the driver for
handling a major config is currently checking in mac_prepare to verify
if we already have link based on the requested interface mode and FEC
settings and if we do the mac_config and pcs_config steps read an
internal state flag and do nothing, then we just go through to
mac_finish where we write the necessary bits to make sure the PCS/PMA
and PMA/PMD connections are enabled which essentially becomes a no-op
if the link is already enabled.

> > The second issue is that when a BMC is present we are currently forcing the
> > link down. This results in us bouncing the link for a fraction of a second
> > and that will result in dropped packets for the BMC.
>
> ... but you don't explain how that happens.

It was right there in the patch. It was the lines I removed:
@@ -2596,16 +2600,6 @@ void phylink_resume(struct phylink *pl)
        if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) {
                /* Wake-on-Lan enabled, MAC handling */

-               /* Call mac_link_down() so we keep the overall state balanced.
-                * Do this under the state_mutex lock for consistency. This
-                * will cause a "Link Down" message to be printed during
-                * resume, which is harmless - the true link state will be
-                * printed when we run a resolve.
-                */
-               mutex_lock(&pl->state_mutex);
-               phylink_link_down(pl);
-               mutex_unlock(&pl->state_mutex);
-
                /* Re-apply the link parameters so that all the settings get
                 * restored to the MAC.
                 */

>From a BMC perspective this forcing of the link down even if it is for
a fraction of a second is unacceptable as it can break up ssh sessions
to the BMC, especially if somebody is doing a bunch of configuration
changes on the NIC as it results in dropped frames. When compared to
the firmware based NIC approaches such as Broadcom and Nvidia/Mellanox
it is a huge negative as the BMC link is static with those NICs and
doesn't bounce no matter if the interface is being configured up/down,
the driver proved/removed ect.

The issue, even with your recent patch, is that it will still force
the link down if the link was previously up. That is the piece I need
to avoid to prevent the BMC from losing link. Ideally what I need is
to have a check of the current link state and then sync back up rather
than force the phylink state on the MAC and then clean things up after
the fact.

> > The third issue is just an extra "Link Down" message that is seen when
> > calling phylink_resume. This is addressed by identifying that the link
> > isn't balanced and just not displaying the down message in such a case.
>
> Hmm, this one is an error, but is not as simple as "don't print the
> message" as it results in a violation of the rule I mentioned above.
> We need phylink_suspend() to record the state of the link at that
> point, and avoid calling phylink_link_down() if the link was down
> prior to suspend.

Looking at your fix it doesn't resolve my core issue. All it does is
address the "third issue".

In my case the link is up and needs to stay up when I resume. The fact
that we are forcing the link down causes our BMC to drop connection
which makes it difficult to remotely manage the interface as changes
to the interface such as a simple ifconfig down; ifconfig up are
enough to cause the BMC to drop packets and potentially the ssh
session if we start losing enough of them.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ