[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200821103021.GA331448@shredder>
Date: Fri, 21 Aug 2020 13:30:21 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: David Ahern <dsahern@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
netdev@...r.kernel.org, davem@...emloft.net, jiri@...dia.com,
amcohen@...dia.com, danieller@...dia.com, mlxsw@...dia.com,
roopa@...dia.com, andrew@...n.ch, vivien.didelot@...il.com,
tariqt@...dia.com, ayal@...dia.com, mkubecek@...e.cz,
Ido Schimmel <idosch@...dia.com>
Subject: Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> > On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > > I don't have a great way forward in mind, sadly. All I can think of is
> > > that we should try to create more well defined interfaces and steer
> > > away from free-form ones.
> >
> > There is a lot of value in free-form too.
>
> On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> > It's a question of interface, not the value of exposed data.
>
> > > Example, here if the stats are vxlan decap/encap/error - we should
> > > expose that from the vxlan module. That way vxlan module defines one
> > > set of stats for everyone.
> > >
> > > In general unless we attach stats to the object they relate to, we will
> > > end up building parallel structures for exposing statistics from the
> > > drivers. I posted a set once which was implementing hierarchical stats,
> > > but I've abandoned it for this reason.
> > > > [...]
> > >
> > > IDK. I just don't feel like this is going to fly, see how many names
> > > people invented for the CRC error statistic in ethtool -S, even tho
> > > there is a standard stat for that! And users are actually parsing the
> > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > place for NIC stats and (b) some drivers forget to report in the
> > > standard place.
> > >
> > > The cover letter says this set replaces the bad debugfs with a good,
> > > standard API. It may look good and standard for _vendors_ because they
> > > will know where to dump their counters, but it makes very little
> > > difference for _users_. If I have to parse names for every vendor I use,
> > > I can as well add a per-vendor debugfs path to my script.
> > >
> > > The bar for implementation-specific driver stats has to be high.
> >
> > My take away from this is you do not like the names - the strings side
> > of it.
> >
> > Do you object to the netlink API? The netlink API via devlink?
> >
> > 'perf' has json files to describe and document counters
> > (tools/perf/pmu-events). Would something like that be acceptable as a
> > form of in-tree documentation of counters? (vs Documentation/networking
> > or URLs like
> > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
>
> Please refer to what I said twice now about the definition of the stats
> exposed here belonging with the VxLAN code, not the driver.
Please refer to the changelog:
"The Spectrum ASICs have a single hardware VTEP that is able to perform
VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
counters of this VTEP via the multiple VXLAN netdevs that are using it
would be both inaccurate and confusing for users.
Instead, expose the counters of the VTEP via devlink-metric. Note that
Spectrum-1 supports a different set of counters compared to newer ASICs
in the Spectrum family."
Hardware implementations will rarely fit 1:1 to the nice and discrete
software implementations that they try to accelerate. The purpose of
this API is exposing metrics specific to these hardware implementations.
This results in better visibility which can be leveraged for faster
debugging and more thorough testing.
The reason I came up with this interface is not the specific VXLAN
metrics that bother you, but a new platform we are working on. It uses
the ASIC as a cache that refers lookups to an external device in case of
cache misses. It is completely transparent to user space (you get better
scale), but the driver is very much aware of this stuff as it needs to
insert objects (e.g., routes) in a way that will minimize cache misses.
Just checking that ping works is hardly enough. We must be able to read
the cache counters to ensure we do not see cache misses when we do not
expect them.
As another example, consider the algorithmic TCAM implementation we have
in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
the driver needs to jump through multiple hops in order to program them
in a way that will result in a better scale and reduced latency. We
currently do not have an interface through which we can expose metrics
related to this specific implementation.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf
>
> > > Okay, fair. I just think that in datacenter deployments we are way
> > > closer to the SDK model than people may want to admit.
> >
> > I do not agree with that; the SDK model means you *must* use vendor code
> > to make something work. Your argument here is about labels for stats and
> > an understanding of their meaning.
>
> Sure, no "must" for passing packets, but you "must" use vendor tooling
> to operate a fleet.
>
> Since everybody already has vendor tools what value does this API add?
We don't have any "vendor tools" to get this information. Our team is
doing everything it possibly can in order to move away from such an
approach.
> I still need per vendor logic. Let's try to build APIs which will
> actually make user's life easier, which users will want to switch to.
Developers are also users and they should be able to read whatever
information they need from the device in order to help them do their
work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
better visibility into the software data path. Commonality is not a
reason to be blind as a bat when looking into the hardware data path.
Powered by blists - more mailing lists