Date:   Fri, 20 May 2022 15:02:56 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     netdev@...r.kernel.org, linux@...linux.org.uk, olteanv@...il.com,
        hkallweit1@...il.com, f.fainelli@...il.com, saeedm@...dia.com,
        michael.chan@...adcom.com
Subject: Re: [RFC net-next] net: track locally triggered link loss

On Fri, 20 May 2022 20:48:20 +0200 Andrew Lunn wrote:
> > I was looking at bnxt because it's relatively standard for DC NICs and
> > doesn't have 10M lines of code... then again I could be misinterpreting
> > the code, I haven't tested this theory:
> > 
> > In bnxt_set_pauseparam() for example the driver will send a request to
> > the FW which will result in the link coming down and back up with
> > different settings (e.g. when pause autoneg was changed). Since the
> > driver doesn't call netif_carrier_off() explicitly as part of sending
> > the FW message, but the link down gets reported through the usual
> > interrupt (as if someone yanked the cable out), we need to wrap the FW
> > call with the __LINK_STATE_NOCARRIER_LOCAL
> 
> I'm not sure this is a good example. If the PHY is doing an autoneg,
> the link really is down for around a second. The link peer will also
> see the link go down and come back up. So this seems like a legitimate
> time to set the carrier off and then back on again.

In the commit message I differentiated between link flap types (1),
(2), (3a) and (3b). What we're talking about here is (1) vs (2), and,
from the POV of the remote end, (3a) vs (3b).

For a system which wants to monitor link quality on the local end -
i.e. whether physical hardware has to be replaced - differentiating
between (1) and (2) doesn't really matter; they are both non-events.

I admitted somewhere in the commit message that with just the "locally
triggered" vs "non-locally triggered" distinction we still can't tell
(1) from (2), and therefore won't be able to count true "link went down
because of signal integrity" events. Telling (1) from (2) should be
easier with phylib than with FW-managed PHYs. I left it for future work
because:

 - in DC environments PHYs are never managed by Linux AFAIK, sadly,
   and unless I convince vendors to do the conversions I'm likely going
   to get the counting between (1) and (2) wrong, not having access to
   FW or any docs;

 - switches don't flap links much, while NIC reconfig can easily produce
   spikes of 5+ carrier changes in close succession, so even without
   telling (1) from (2) we can significantly improve the signal-to-noise
   of the monitoring.

I'm happy to add the (1) vs (2) API though, if it's useful; what I'm
explaining above is more why I don't feel it's useful for my case.
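
To make the bnxt example further up a bit more concrete, a rough sketch
of the shape I mean (untested, not against the real driver; the
netif_carrier_local_change_*() and foo_fw_set_pause() helpers are made
up for illustration, only __LINK_STATE_NOCARRIER_LOCAL comes from the
RFC):

/* Sketch only: bracket the FW reconfig call so the carrier down that
 * comes back via the async event / interrupt path is attributed to
 * local admin action rather than to the cable or the peer.
 */
static int foo_set_pauseparam(struct net_device *dev,
			      struct ethtool_pauseparam *epause)
{
	int err;

	/* From here until the matching end, any carrier off seen by
	 * the stack counts as locally triggered.
	 */
	netif_carrier_local_change_start(dev);		/* hypothetical */

	/* FW applies the new pause settings; the link drops and
	 * renegotiates as a side effect, reported via the usual event.
	 */
	err = foo_fw_set_pause(dev, epause);		/* hypothetical */

	netif_carrier_local_change_end(dev);		/* hypothetical */
	return err;
}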

> > > The driver has a few netif_carrier_off() calls changed to
> > > netif_carrier_admin_off(). It is then unclear looking at the code
> > > which of the calls to netif_carrier_on() match the off.  
> > 
> > Right, for bnxt again the carrier_off in bnxt_tx_disable() would become
> > an admin_carrier_off, since it's basically part of closing the netdev.  
> 
> > > Maybe include a driver which makes use of phylib, which should be
> > > doing control of the carrier based on the actual link status.  
> > 
> > For phylib I was thinking of modifying phy_stop()... but I can't
> > grep out where carrier_off gets called. I'll take a closer look.  
> 
> If the driver is calling phy_stop() the link will go down. So again, I
> would say setting the carrier off is correct. If the driver calls
> phy_start() an autoneg is likely to happen and 1 second later the
> link will come up.
> 
> Maybe I'm not understanding what you are trying to count here. If the
> MAC driver needs to stop the MAC in order to reallocate buffers with
> different MTU, or more rings etc., then I can understand not wanting to
> count that as a carrier off, because the carrier does not actually go
> off. But if it is in fact marking the carrier off, it sounds like a
> MAC driver bug, or a firmware bug.

Well, either way the carrier is set to off, because of all the
netif_carrier_ok() calls throughout the stack. I'm afraid to change
the semantics of that.

What I want to count is, _in_addition_ to whether the link went down or
not, whether the link down was due to local administrative action.
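
In rough pseudo-kernel-C the accounting would look something like this
(sketch only, not the actual patch; netif_carrier_admin_off() is the
name used earlier in the thread and carrier_down_local is a
hypothetical counter field):

/* Keep today's carrier semantics (netif_carrier_ok() still goes false)
 * and additionally remember that this particular down was
 * administrative.
 */
void netif_carrier_admin_off(struct net_device *dev)
{
	/* Same as today: the link really is down as far as the stack
	 * is concerned.
	 */
	netif_carrier_off(dev);

	/* New: account the down as locally triggered so monitoring can
	 * subtract it out.  Hypothetical counter, per the RFC idea.
	 */
	atomic_inc(&dev->carrier_down_local);
}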

Then user space can do:

	remote_flaps = carrier_down - carrier_down_local
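
e.g. a minimal userspace sketch on top of the existing
carrier_down_count sysfs attribute (the carrier_down_local attribute
name is hypothetical, pending this RFC):

/* Compute "remote" flaps from the existing carrier_down_count plus a
 * hypothetical locally-triggered counter.
 */
#include <stdio.h>

static long read_net_stat(const char *ifname, const char *stat)
{
	char path[128];
	long val;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/net/%s/%s", ifname, stat);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	long down = read_net_stat("eth0", "carrier_down_count");
	/* hypothetical attribute, not in mainline */
	long local = read_net_stat("eth0", "carrier_down_local");

	if (down < 0 || local < 0)
		return 1;
	printf("remote flaps: %ld\n", down - local);
	return 0;
}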
