Message-ID: <20200824121143.3b233788@kicinski-fedora-PC1C0HJN>
Date: Mon, 24 Aug 2020 12:11:43 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Ido Schimmel <idosch@...sch.org>
Cc: David Ahern <dsahern@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
netdev@...r.kernel.org, davem@...emloft.net, jiri@...dia.com,
amcohen@...dia.com, danieller@...dia.com, mlxsw@...dia.com,
roopa@...dia.com, andrew@...n.ch, vivien.didelot@...il.com,
tariqt@...dia.com, ayal@...dia.com, mkubecek@...e.cz,
Ido Schimmel <idosch@...dia.com>
Subject: Re: [RFC PATCH net-next 0/6] devlink: Add device metric support

On Sun, 23 Aug 2020 10:04:34 +0300 Ido Schimmel wrote:
> > You seem to focus on less relevant points. I primarily care about the
> > statistics being defined and identified by Linux, not every vendor for
> > themselves.
>
> Trying to understand how we can move this forward. The issue is with the
> specific VXLAN metrics, but you generally agree with the need for the
> framework? See my two other examples: Cache counters and algorithmic
> TCAM counters.

Yes, we will likely need a way to report design-specific performance
counters no matter what. That said I would prefer to pave the way for
exposing standardized stats first, so the reviewers (e.g. myself) have
a clear place to point folks to.

My last attempt was to just try to standardize the strings for the
per-netdev TLS offload stats (those are in addition to the /proc stats),
and document them in Documentation/. It turned out to have quite a
high review overhead, and the convergence is not satisfactory.

The only strong use I have right now is FEC stats, and I'm planning to
add IEEE-based counters to devlink ports. The scoping of MAC/PHY
counters to dl-port is, I hope, reasonable, although it remains to be
seen what phy folks think about it.

As I previously said - I think that protocol stats are best exported
from the protocol driver, otherwise the API may need to grow parallel
hierarchies. E.g. semantics of per-queue NIC counters get confusing
unless they are reported with the information about the queues - sadly
no API for that exists. In particular the lifetime of objects is hard
to match with the lifetime of statistics. Similar thing with low
granularity counters related to traffic classification.

Long story short, it's a complicated topic, IDK how much of it I can
expect you to tackle. At the minimum I'd like it if we had a clear
separation between Linux/standard stats that drivers should share,
and justifiably implementation specific values.

The DEVLINK_..GENERIC identifiers or trying to standardize on strings
are not working for me as a reviewer, and as an infrastructure engineer.
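
[Editorial sketch, not part of the original message.] One way to picture
the separation asked for above, between stats that Linux defines for all
drivers and values that are justifiably implementation specific, is a
fixed structure for the former and driver-named pairs for the latter.
This is only an illustration under stated assumptions: every name below
(struct std_fec_stats, struct impl_stat, struct port_stats,
port_stats_find_impl, and the counter names) is hypothetical and does not
correspond to any existing kernel or devlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: FEC counters defined once, centrally, with the same
 * meaning for every driver. Field names are illustrative only. */
struct std_fec_stats {
	uint64_t corrected_blocks;
	uint64_t uncorrectable_blocks;
};

/* Hypothetical: a design-specific counter, named by the driver. */
struct impl_stat {
	const char *name;
	uint64_t value;
};

/* Hypothetical per-port container keeping the two kinds apart. */
struct port_stats {
	struct std_fec_stats fec;      /* shared, centrally defined */
	const struct impl_stat *impl;  /* implementation specific */
	size_t n_impl;
};

/* Look up an implementation-specific counter by name.
 * Returns 0 and stores the value in *out on success, -1 if not found. */
int port_stats_find_impl(const struct port_stats *ps, const char *name,
			 uint64_t *out)
{
	size_t i;

	for (i = 0; i < ps->n_impl; i++) {
		if (strcmp(ps->impl[i].name, name) == 0) {
			*out = ps->impl[i].value;
			return 0;
		}
	}
	return -1;
}
```

In this sketch a consumer reads ps.fec.corrected_blocks directly, with no
string matching, while design-specific values such as cache counters are
only reachable through the name lookup. Keeping the shared counters as
typed fields rather than generic string identifiers is the design choice
being illustrated.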