lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aAANe/qMWIRY2K5l@shell.armlinux.org.uk>
Date: Wed, 16 Apr 2025 21:05:15 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: netdev@...r.kernel.org, andrew@...n.ch, hkallweit1@...il.com,
	davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Subject: Re: [net-next PATCH 2/2] net: phylink: Fix issues with link
 balancing w/ BMC present

On Wed, Apr 16, 2025 at 08:19:38PM +0100, Russell King (Oracle) wrote:
> So, when a Linux network driver probes, it starts out in administrative
> state *DOWN*. When the administrator configures the network driver,
> .ndo_open is called, which is expected to configure the network adapter.
> 
> Part of that process is to call phylink_start() as one of the last
> steps, which detects whether the link is up or not. If the link is up,
> then phylink will call netif_carrier_on() and .mac_link_up(). This
> tells the core networking layers that the network interface is now
> ready to start sending packets, and it will begin queueing packets for
> the network driver to process - not before.
> 
> Prior to .ndo_open being called, the networking layers do not expect
> traffic from the network device no matter what physical state the
> media link is in. If .ndo_open fails, the same applies - no traffic is
> expected to be passed to the core network layers from the network
> layers because as far as the network stack is concerned, the interface
> is still administratively down.
> 
> Thus, the fact that your BMC thinks that the link is up is irrelevant.
> 
> So, start off in a state that packet activity is suspended even if the
> link is up at probe time. Only start packet activity (reception and
> transmission) once .mac_link_up() has been called. Stop that activity
> when .mac_link_down() is subsequently called.
> 
> There have been lots of NICs out there where the physical link doesn't
> follow the adminstrative state of the network interface. This is not a
> problem. It may be desirable that it does, but a desire is not the same
> as "it must".

Let me be crystal clear on this.

Phylink has a contract with all existing users. That contract is:

Initial state: link down.

Driver calls phylink_start() in its .ndo_open method.

Phylink does configuration of the PHY and link according to the
chosen link parameters by calling into the MAC, PCS, and phylib as
appropriate.

If the link is then discovered to be up (it might have been already
up before phylink_start() was called), phylink will call the various
components such as PCS and MAC to inform them that the link is now up.
This will mean calling the .mac_link_up() method. Otherwise (if the
link is discovered to be down when the interface is brought up) no
call to either .mac_link_up() nor .mac_link_down() will be made.

If the link _subsequently_ goes down, then phylink deals with that
and calls .mac_link_down() - only if .mac_link_up() was previously
called (that's one of the bugs you discovered, that on resume it
gets called anyway. I've submitted a fix for that contract breach,
which only affects a very small number of drivers - stmmac, ucc_geth
and your fbnic out of 22 total ethernet users plus however many DSA
users we have.)

Only if .mac_link_down() has been called, if the link subsequently
comes back up, then the same process happens as before resulting in
.mac_link_up() being called.

If the interface is taken down, then .mac_link_down() will be called
if and only if .mac_link_up() had been called.

The ordering of .mac_link_up() / .mac_link_down() is a strict
contract term with phylink users.

The reason for this contract: phylink users may have ordering
requirements.

For example, on mac_link_down(), they may wait for packet activity to
stop, and then place the MAC in reset. If called without a previous
.mac_link_up call, the wait stage may time out due to the MAC being
in reset. (Ocelot may suffer with this.)

Another example is fs_enet which also relies on this strict ordering
as described above.

There could be others - there are some that call into firmware on
calls to .mac_link_up() / .mac_link_down() and misordering those
depends on what the firmware is doing, which we have no visibility
of.

As I stated, this is the contract that phylink gave to users, and
the contract still stands, and can't be broken to behave differently
(e.g. calling .mac_link_down() after phylink_start() without an
intervening call to .mac_link_up()) otherwise existing users will
break. Bugs that go against that contract will be fixed, but the
contract will not be intentionally broken.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ