netdev - Re: [PATCH net] net: phylink: fix suspend/resume with WoL enabled and link down

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uf2a48D7O_OSFV8W7j3DJjn_patFbjRbvktazt9UTKoLQ@mail.gmail.com>
Date: Thu, 17 Apr 2025 10:06:45 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: "Russell King (Oracle)" <linux@...linux.org.uk>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Joakim Zhang <qiangqing.zhang@....com>, netdev@...r.kernel.org, 
	Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net] net: phylink: fix suspend/resume with WoL enabled and
 link down

On Thu, Apr 17, 2025 at 8:23 AM Russell King (Oracle)
<linux@...linux.org.uk> wrote:
>
> On Thu, Apr 17, 2025 at 03:35:56PM +0100, Russell King (Oracle) wrote:
> > On Thu, Apr 17, 2025 at 07:30:05AM -0700, Alexander H Duyck wrote:
> > > This is the only spot where we weren't setting netif_carrier_on/off and
> > > old_link_state together. I suspect you could just carry old_link_state
> > > without needing to add a new argument. Basically you would just need to
> > > drop the "else" portion of this statement.
> > >
> > > In the grand scheme of things with the exception of this one spot
> > > old_link_state is essentially the actual MAC/PCS link state whereas
> > > netif_carrier_off is the administrative state.
> >
> > Sorry to say, but you have that wrong. Neither are the administrative
> > state.
>
> To add to this (sorry, I rushed that reply), old_link_state is used when
> we aren't bound to a network device, otherwise the netif carrier state
> is used. Changes in the netif carrier state provoke netlink messages to
> userspace to inform userspace of changes to the link state.
>
> Moreover, it controls whether the network stack queues packets for
> transmission to the driver, and also whether the transmit watchdog is
> allowed to time out. It _probably_ also lets the packet schedulers
> know that the link state has changed.
>
> So, the netif carrier state is not "administrative".

I have always sort of assumed that netif_carrier_on/off was the
logical AND of the administrative state of the NIC and the state of
the actual MAC/PCS/PHY. That is why I refer to it as an administrative
state. You end up with ifconfig up/down toggling netif carrier even if
the underlying link state hasn't changed. This has been the case for
many of the high end NICs for a while now, especially for the
multi-host ones, as the firmware won't change the actual physical
state of the link. It leaves it in place while the NIC port on the
host is no longer active.

> It isn't strictly necessary for old_link_state to remain in sync with
> the netif carrier state, but it is desirable to avoid errors - but
> obviously the netif carrier state only exists when we're bound to a
> network device.

>From what I can tell this is the only spot where the two diverge,
prior to this patch. The general thought I was having was to update
the netif_carrier state in the suspend, and then old_link_state in the
resume. I realize now the concern is that setting the
phylink_disable_state is essentially the same as setting the link
down. So really we have 3 states we are tracking,
netif_carrier_ok/old_link_state, phylink_disable_state, and if we want
the link state to change while disabled. So we do need one additional
piece of state in the event that there isn't a netdev in order to
handle the case where "pl->phylink_disable_state &&
!!pl->phylink_disable_state == pl->old_link_state".

> > > > -         /* Call mac_link_down() so we keep the overall state balanced.
> > > > -          * Do this under the state_mutex lock for consistency. This
> > > > -          * will cause a "Link Down" message to be printed during
> > > > -          * resume, which is harmless - the true link state will be
> > > > -          * printed when we run a resolve.
> > > > -          */
> > > > -         mutex_lock(&pl->state_mutex);
> > > > -         phylink_link_down(pl);
> > > > -         mutex_unlock(&pl->state_mutex);
> > > > +         if (pl->suspend_link_up) {
> > > > +                 /* Call mac_link_down() so we keep the overall state
> > > > +                  * balanced. Do this under the state_mutex lock for
> > > > +                  * consistency. This will cause a "Link Down" message
> > > > +                  * to be printed during resume, which is harmless -
> > > > +                  * the true link state will be printed when we run a
> > > > +                  * resolve.
> > > > +                  */
> > > > +                 mutex_lock(&pl->state_mutex);
> > > > +                 phylink_link_down(pl);
> > > > +                 mutex_unlock(&pl->state_mutex);
> > > > +         }
> > >
> > > You should be able to do all of this with just old_link_state. The only
> > > thing that would have to change is that you would need to set
> > > old_link_state to false after the if statement.
> >
> > Nope.
>
> And still nope - what we need to know here is what was the link state
> _before_ we called netif_carrier_off() or set old_link_state to false.
>
> I somehow suspect that you don't understand what all this code is trying
> to do, so let me explain it.
>
> In the suspend function, when WoL is enabled, and we're using MAC-based
> WoL, we need the link to the MAC to remain up, so it can receive packets
> to check whether they are the appropriate wake packet. However, we don't
> want packets to be queued for transmission either.

That is the same as what we are looking for. We aren't queueing any
packets for transmission ourselves. We just want to leave the MAC
enabled, however we also want to leave it enabled when we resume.

> So, we have the case where we want to avoid notifying the MAC that the
> link has gone down, but we also want to call netif_carrier_off() to stop
> the network stack queueing packets.
>
> To achieve this, several things work in unison:
>
> - we take the state mutex to prevent the resolver from running while we
>   fiddle with various state.
> - we disable the resolver (which, if that's the only thing that happens,
>   will provoke the resolver to take the link down.)
> - we lie to the resolver about the link state, by calling
>   netif_carrier_off() and/or setting old_link_state to false. This
>   means the resolver believes the link is _already_ down, and thus
>   because of the strict ordering I've previously mentioned, will *not*
>   call mac_link_down().
> - we release the lock.
>
> There is now no way that the resolver will call either mac_link_up() or
> mac_link_down() - which is what we want here.

The part that I think I missed here was that if we set
phylink_disable_state we then set link_state.link to false in
phylink_resolve. I wonder if we couldn't just have a flag that sets
cur_link_state to false in the "if(pl->phylink_disable_state)" case
and simplify this to indicate we won't force the link down"

> When we resume, we need to unwind this, but taking care that the network
> layers may need to be reprogrammed. That is done by "completing" the
> link-down that was done in the suspend function by calling
> mac_link_down() (which should only have been done if the link wasn't
> already down - hence depends on state _prior_ to the modifications that
> phylink_suspend() made.)
>
> It can then re-setup the MAC/PCS by calling the config functions, and
> then release the resolver to then make a decision about the state of
> the link.
>
> With the fix I posted, this guarantees that that the contract I talked
> about previously is maintained.
>
> I'm sorry that this doesn't "fit" your special case, but this is what
> all users so far expect from phylink.

I wasn't trying to "fit" it so much as understand it. So this worked
before because you were clearing either netif_carrier_off or
old_link_state and the resolve function followed the same logic. The
piece I missed is that setting the bit in phylink_disable_state is
forcing the link to resolve to false which in turn would call
mac_link_down if the old state is considered to be up and a netdev
isn't present.

> Now, how about explaining what's going on in fbnic_mac_link_up_asic()
> and fbnic_mac_link_down_asic(), and what in there triggers the BMC
> to press the panic stations do-not-press red buttin?

So fbnic_mac_link_up_asic doesn't trigger any issues. The issues are:

1. In fbnic_mac_link_down_asic we assume that if we are being told
that the link is down by phylink that it really means the link is down
and we need to shut off the MAC and flush any packets that are in the
Tx FIFO. The issue isn't so much the call itself, it is the fact that
we are being called in phylink_resume to rebalance the link that will
be going from UP->UP. The rebalancing is introducing an extra down
step. This could be resolved by adding an extra check to the line in
phylink_resume that is calling the mac_link_down so that it doesn't
get triggered and stall the link. In my test code I am now calling it
"pl->rolling_start".

2. In phylink_start/phylink_resume since we are coming at this from a
"rolling start" due to the BMC we have the MAC up and the
netif_carrier in the "off" state. As a result if we end up losing the
link between mac_prepare and pcs_get_state the MAC gets into a bad
state where netif_carrier is off, the MAC is stuck in the "up" state,
and we have stale packets clogging up the Tx FIFO.

As far as the BMC it isn't so much wanting to hit the big red button
as our platform team. Basically losing of packets is very problematic
for them, think about ssh sessions that suddenly lock up during boot,
and they can demonstrate that none of the other vendors that have the
MAC/PCS/PHY buried in their firmware have this issue. I was really
hoping to avoid going that route as the whole point of this was to
keep the code open source and visible.

> If you wish to have confirmation of what netif_carrier does, then I'm
> sure your co-maintainer and core Linux networking maintainer, Jakub,
> would be happy to help.

I'm well aware of what netif_carrier does, no need to be patronizing.
I have been doing this for over 16 years. The difference is for me the
1G stuff was 16 years ago, most of the stuff I deal with now is 25G or
higher and multi-host which is a different world with different
requirements. This is why we are having our communication issues in
understanding where each other is coming from.