netdev - Re: [PATCH net-next 2/2] net: enetc: count the tc-taprio window drops

Message-ID: <20220511175208.1be804ac@kernel.org>
Date:   Wed, 11 May 2022 17:52:08 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Vladimir Oltean <vladimir.oltean@....com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Michael Walle <michael@...le.cc>,
        Xiaoliang Yang <xiaoliang.yang_1@....com>,
        Po Liu <po.liu@....com>
Subject: Re: [PATCH net-next 2/2] net: enetc: count the tc-taprio window
 drops

On Thu, 12 May 2022 00:20:18 +0000 Vladimir Oltean wrote:
> > We can argue semantics but there doesn't need to be a "standards body"
> > to add a structured stat in ethtool [1]. When next gen of enetc comes
> > out you'll likely try to use the same stat name or reuse the entire
> > driver. So you are already defining uAPI for your users, it's only
> > a question of scope at which the uAPI is defined.  
> 
> The trouble with over-standardization is that with a different driver
> that would use this ad-hoc structure for parts of it, you never know if
> a counter is 0 because it's 0 or because it's not implemented.
> As unstructured as the plain ethtool -S might be, at least if you see a
> counter there, you can expect that it actually counts something.

That's solved with the netlink ethtool stats. What's not reported by the
driver is not reported to user space. Grep for ETHTOOL_STAT_NOT_SET.
Maybe not beautiful, but it works.
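
To illustrate the "not set" sentinel idea, here is a minimal user-space
sketch. It is not the kernel code: the DEMO_* names, the counter names and
both functions are hypothetical, and the all-ones sentinel value is only
modeled on ETHTOOL_STAT_NOT_SET. The core pre-fills every counter with the
sentinel, the driver overwrites what it implements, and only overwritten
counters get reported.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Sentinel modeled on the kernel's ETHTOOL_STAT_NOT_SET (assumed all ones). */
#define DEMO_STAT_NOT_SET (~0ULL)
#define DEMO_NUM_STATS 3

/* Hypothetical counter names, for illustration only. */
static const char *const demo_stat_names[DEMO_NUM_STATS] = {
	"tx_frames", "tx_window_drops", "link_loss",
};

/* Core side: mark every counter "not implemented" before asking the driver. */
void demo_stats_init(uint64_t stats[DEMO_NUM_STATS])
{
	for (int i = 0; i < DEMO_NUM_STATS; i++)
		stats[i] = DEMO_STAT_NOT_SET;
}

/* Print only the counters the driver filled in; return how many were printed. */
int demo_stats_report(const uint64_t stats[DEMO_NUM_STATS], FILE *out)
{
	int reported = 0;

	assert(out != NULL);
	for (int i = 0; i < DEMO_NUM_STATS; i++) {
		if (stats[i] == DEMO_STAT_NOT_SET)
			continue;	/* driver never wrote this one: skip it */
		fprintf(out, "%s: %" PRIu64 "\n", demo_stat_names[i], stats[i]);
		reported++;
	}
	return reported;
}
```

With this scheme a counter that the driver sets to 0 is still printed, so a
reported 0 means "counted, nothing happened" and a missing line means "not
implemented".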

> > What I'm not sure of is what to attach that statistic to. You have it
> > per ring and we famously don't have per ring APIs, so whatever, let
> > me apply as is and move on :)  
> 
> It would probably have to be per traffic class, since the media
> reservation gates are per traffic class (TX rings have a configurable
> mapping with traffic classes). Although an aggregate counter would also
> be plausible. Who knows?

Well, users sometimes know what they want but the days when the kernel
was written by its users are long gone. Or maybe that's just a perfect
example of the "good old days" fallacy :)

> I haven't seen this specific counter being reported by the LS1028A
> switch, for example (I'll have to check what increments on blocked
> transmission overruns).
> 
> > [1] Coincidentally I plan to add a "real link loss" statistic there
> > because AFAICR IEEE doesn't have a stat for it, and carrier_changes
> > count software events so it's meaningless to teams trying to track
> > cable issues.  
> 
> I didn't quite get what's wrong with the carrier_changes sysfs
> counter, and how "real link loss" would be implemented
> differently/more usefully? At least with phylib/phylink users,
> netif_carrier_on() + netif_carrier_off() are called exactly on
> phydev->phy_link_change() events. Are there other callers of
> netif_carrier_*() that pollute this counter and make it useless for
> reliable debugging?

Yup, drivers will do a netif_carrier_off() to stop Tx and prevent
the timeout watchdog from kicking in while they are doing some form 
of reconfig (ethtool -L / -G etc.).

I guess we can add a special API for taking things down without
bumping the counter. Since drivers I work with already report an
ethtool -S stat from the device for "PHY really went down", my first
instinct was a better ethtool stat...