lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aAEc3HZbSZTiWB8s@shell.armlinux.org.uk>
Date: Thu, 17 Apr 2025 16:23:08 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Alexander H Duyck <alexander.duyck@...il.com>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Joakim Zhang <qiangqing.zhang@....com>, netdev@...r.kernel.org,
	Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net] net: phylink: fix suspend/resume with WoL enabled
 and link down

On Thu, Apr 17, 2025 at 03:35:56PM +0100, Russell King (Oracle) wrote:
> On Thu, Apr 17, 2025 at 07:30:05AM -0700, Alexander H Duyck wrote:
> > This is the only spot where we weren't setting netif_carrier_on/off and
> > old_link_state together. I suspect you could just carry old_link_state
> > without needing to add a new argument. Basically you would just need to
> > drop the "else" portion of this statement.
> > 
> > In the grand scheme of things with the exception of this one spot
> > old_link_state is essentially the actual MAC/PCS link state whereas
> > netif_carrier_off is the administrative state.
> 
> Sorry to say, but you have that wrong. Neither are the administrative
> state.

To add to this (sorry, I rushed that reply), old_link_state is used when
we aren't bound to a network device, otherwise the netif carrier state
is used. Changes in the netif carrier state provoke netlink messages to
userspace to inform userspace of changes to the link state.

Moreover, it controls whether the network stack queues packets for
transmission to the driver, and also whether the transmit watchdog is
allowed to time out. It _probably_ also lets the packet schedulers
know that the link state has changed.

So, the netif carrier state is not "administrative".

It isn't strictly necessary for old_link_state to remain in sync with
the netif carrier state, but it is desirable to avoid errors - but
obviously the netif carrier state only exists when we're bound to a
network device.

> > > -		/* Call mac_link_down() so we keep the overall state balanced.
> > > -		 * Do this under the state_mutex lock for consistency. This
> > > -		 * will cause a "Link Down" message to be printed during
> > > -		 * resume, which is harmless - the true link state will be
> > > -		 * printed when we run a resolve.
> > > -		 */
> > > -		mutex_lock(&pl->state_mutex);
> > > -		phylink_link_down(pl);
> > > -		mutex_unlock(&pl->state_mutex);
> > > +		if (pl->suspend_link_up) {
> > > +			/* Call mac_link_down() so we keep the overall state
> > > +			 * balanced. Do this under the state_mutex lock for
> > > +			 * consistency. This will cause a "Link Down" message
> > > +			 * to be printed during resume, which is harmless -
> > > +			 * the true link state will be printed when we run a
> > > +			 * resolve.
> > > +			 */
> > > +			mutex_lock(&pl->state_mutex);
> > > +			phylink_link_down(pl);
> > > +			mutex_unlock(&pl->state_mutex);
> > > +		}
> > 
> > You should be able to do all of this with just old_link_state. The only
> > thing that would have to change is that you would need to set
> > old_link_state to false after the if statement.
> 
> Nope.

And still nope - what we need to know here is what was the link state
_before_ we called netif_carrier_off() or set old_link_state to false.

I somehow suspect that you don't understand what all this code is trying
to do, so let me explain it.

In the suspend function, when WoL is enabled, and we're using MAC-based
WoL, we need the link to the MAC to remain up, so it can receive packets
to check whether they are the appropriate wake packet. However, we don't
want packets to be queued for transmission either.

So, we have the case where we want to avoid notifying the MAC that the
link has gone down, but we also want to call netif_carrier_off() to stop
the network stack queueing packets.

To achieve this, several things work in unison:

- we take the state mutex to prevent the resolver from running while we
  fiddle with various state.
- we disable the resolver (which, if that's the only thing that happens,
  will provoke the resolver to take the link down.)
- we lie to the resolver about the link state, by calling
  netif_carrier_off() and/or setting old_link_state to false. This
  means the resolver believes the link is _already_ down, and thus
  because of the strict ordering I've previously mentioned, will *not*
  call mac_link_down().
- we release the lock.

There is now no way that the resolver will call either mac_link_up() or
mac_link_down() - which is what we want here.

When we resume, we need to unwind this, but taking care that the network
layers may need to be reprogrammed. That is done by "completing" the
link-down that was done in the suspend function by calling
mac_link_down() (which should only have been done if the link wasn't
already down - hence depends on state _prior_ to the modifications that
phylink_suspend() made.)

It can then re-setup the MAC/PCS by calling the config functions, and
then release the resolver to then make a decision about the state of
the link.

With the fix I posted, this guarantees that that the contract I talked
about previously is maintained.

I'm sorry that this doesn't "fit" your special case, but this is what
all users so far expect from phylink.

Now, how about explaining what's going on in fbnic_mac_link_up_asic()
and fbnic_mac_link_down_asic(), and what in there triggers the BMC
to press the panic stations do-not-press red buttin?

If you wish to have confirmation of what netif_carrier does, then I'm
sure your co-maintainer and core Linux networking maintainer, Jakub,
would be happy to help.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ