netdev - Re: [PATCH net-next 2/2] net: enetc: count the tc-taprio window drops

Message-ID: <20220512002017.qxhyc5vautnrakni@skbuf>
Date:   Thu, 12 May 2022 00:20:18 +0000
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Jakub Kicinski <kuba@...nel.org>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Paolo Abeni <pabeni@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Claudiu Manoil <claudiu.manoil@....com>,
        Vinicius Costa Gomes <vinicius.gomes@...el.com>,
        Michael Walle <michael@...le.cc>,
        Xiaoliang Yang <xiaoliang.yang_1@....com>,
        Po Liu <po.liu@....com>
Subject: Re: [PATCH net-next 2/2] net: enetc: count the tc-taprio window drops

On Wed, May 11, 2022 at 04:36:55PM -0700, Jakub Kicinski wrote:
> On Wed, 11 May 2022 23:17:46 +0000 Vladimir Oltean wrote:
> > On Wed, May 11, 2022 at 04:13:46PM -0700, Jakub Kicinski wrote:
> > > On Wed, 11 May 2022 22:57:46 +0000 Vladimir Oltean wrote:
> > > > The only entry that is a counter in the Scheduled Traffic MIB is TransmissionOverrun,
> > > > but that isn't what this is. Instead, this would be a TransmissionOverrunAvoidedByDropping,
> > > > for which there appears to be no standardization.
> > >
> > > TransmissionOversized? There's no standardization in terms of IEEE but
> > > the semantics seem pretty clear right? The packet is longer than the
> > > entire window so it can never go out?
> >
> > Yes, so what are you saying? Become the ad-hoc standards body for
> > scheduled traffic?
>
> We can argue semantics but there doesn't need to be a "standards body"
> to add a structured stat in ethtool [1]. When next gen of enetc comes
> out you'll likely try to use the same stat name or reuse the entire
> driver. So you are already defining uAPI for your users, it's only
> a question of scope at which the uAPI is defined.

The trouble with over-standardization is that when a different driver
fills in only parts of this ad-hoc structure, you never know whether a
counter reads 0 because nothing was counted or because it isn't
implemented. As unstructured as plain ethtool -S might be, at least if
you see a counter there, you can expect that it actually counts something.
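
To make the "0 versus not implemented" ambiguity concrete, here is a
minimal userspace sketch of one way a structured stats block could tell
the two cases apart: pre-fill every field with a sentinel and let each
driver overwrite only what it really counts. All names (STAT_NOT_SET,
struct taprio_stats, drv_a_get_stats) are hypothetical and chosen for
illustration only; this is not the enetc driver or the ethtool core.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel: a value no real counter is expected to reach. */
#define STAT_NOT_SET UINT64_MAX

/* Hypothetical structured stats block shared by several drivers. */
struct taprio_stats {
	uint64_t window_drops;
	uint64_t tx_overruns;
};

/* The core pre-fills every field before calling into the driver. */
static void stats_init(struct taprio_stats *s)
{
	s->window_drops = STAT_NOT_SET;
	s->tx_overruns = STAT_NOT_SET;
}

static bool stat_is_set(uint64_t v)
{
	return v != STAT_NOT_SET;
}

/* Hypothetical driver whose hardware only counts window drops. */
static void drv_a_get_stats(struct taprio_stats *s)
{
	s->window_drops = 0;	/* implemented, and genuinely zero */
	/* tx_overruns is left untouched: not implemented */
}

/* Format one stat so that "0" and "not implemented" differ. */
static int stat_format(char *buf, size_t len, const char *name, uint64_t v)
{
	if (!stat_is_set(v))
		return snprintf(buf, len, "%s: not implemented", name);
	return snprintf(buf, len, "%s: %llu", name, (unsigned long long)v);
}
```

With this convention a reader of the structured output can tell a
counter that is truly zero from one that the driver never fills in, at
the cost of every consumer having to check for the sentinel.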

> What I'm not sure of is what to attach that statistic to. You have it
> per ring and we famously don't have per ring APIs, so whatever, let
> me apply as is and move on :)

It would probably have to be per traffic class, since the media
reservation gates are per traffic class (TX rings have a configurable
mapping with traffic classes). Although an aggregate counter would also
be plausible. Who knows? I haven't seen this specific counter being
reported by the LS1028A switch, for example (I'll have to check what
increments on blocked transmission overruns).
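
As a rough sketch of the two reporting options mentioned above, the
per-ring counts could be folded either into per-traffic-class totals
(following the ring-to-TC mapping) or into a single aggregate. The
struct layout, field names and NUM_TC below are assumptions made for
this example, not the enetc driver's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TC 8	/* assumed number of traffic classes */

/* Hypothetical per-ring state: its TC mapping and its drop counter. */
struct tx_ring {
	unsigned int tc;	/* traffic class this ring is mapped to */
	uint64_t win_drop;	/* frames dropped for not fitting the window */
};

/* Fold per-ring counters into per-traffic-class counters. */
static void win_drop_per_tc(const struct tx_ring *rings, size_t n,
			    uint64_t per_tc[NUM_TC])
{
	size_t i;

	for (i = 0; i < NUM_TC; i++)
		per_tc[i] = 0;

	for (i = 0; i < n; i++) {
		/* Ignore rings with an out-of-range mapping. */
		if (rings[i].tc < NUM_TC)
			per_tc[rings[i].tc] += rings[i].win_drop;
	}
}

/* Alternative: one aggregate counter across all rings. */
static uint64_t win_drop_total(const struct tx_ring *rings, size_t n)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += rings[i].win_drop;

	return total;
}
```

Several rings mapped to the same traffic class simply add up in the
per-TC view, which matches the gates being per traffic class rather
than per ring.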

> [1] Coincidentally I plan to add a "real link loss" statistic there
> because AFAICR IEEE doesn't have a stat for it, and carrier_changes
> count software events so it's meaningless to teams trying to track
> cable issues.

I didn't quite get what's wrong with the carrier_changes sysfs counter,
and how "real link loss" would be implemented differently/more usefully?
At least with phylib/phylink users, netif_carrier_on() + netif_carrier_off()
are called exactly on phydev->phy_link_change() events.
Are there other callers of netif_carrier_*() that pollute this counter
and make it useless for reliable debugging?