[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ea4b0892-a087-4931-bc3a-319255d85038@intel.com>
Date: Wed, 28 Aug 2024 17:06:17 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: Tony Nguyen <anthony.l.nguyen@...el.com>, <davem@...emloft.net>,
<pabeni@...hat.com>, <edumazet@...gle.com>, <netdev@...r.kernel.org>,
<przemyslaw.kitszel@...el.com>, <joshua.a.hay@...el.com>,
<michal.kubiak@...el.com>, <nex.sw.ncis.osdt.itp.upstreaming@...el.com>
Subject: Re: [PATCH net-next v2 2/9] libeth: add common queue stats
From: Jakub Kicinski <kuba@...nel.org>
Date: Tue, 27 Aug 2024 11:29:33 -0700
> On Tue, 27 Aug 2024 17:31:55 +0200 Alexander Lobakin wrote:
>>>> But it's up to the vendors right, I can't force them to use this code or
>>>> just switch their driver to use it :D
>>>
>>> It shouldn't be up to interpretation whether the library makes code
>>> better. If it doesn't -- what's the point of the library. If it does
>>> -- the other vendors better have a solid reason to push back.
>>
>> Potential reasons to push back (by "we" I mean some vendor X):
>>
>> * we want our names for Ethtool stats, not yours ("txq_X" instead of
>> "sqX" and so on), we have scripts which already parse our names blah
>
> If the stat is standardized in a library why is it dumped via ethtool -S
So do you mean that I need to make these stats generic Netlink instead
of adding Ethtool stats?
[...]
>> FYI I already nack inside Intel any new drivers since I was promised
>> that each next generation will be based on top of idpf.
>
> I don't mind new drivers, how they get written is a bigger problem,
> but that's a different discussion.
>
>>>> I'll be happy to remove that for basic Rx/Tx queues (and leave only
>>>> those which don't exist yet in the NL stats) and when you introduce
>>>> more fields to NL stats, removing more from ethtool -S in this
>>>> library will be easy.
>>>
>>> I don't scale to remembering 1 easy thing for each driver we have.
>>
>> Introducing a new field is adding 1 line with its name to the macro
>> since everything else gets expanded from these macros anyway.
>
> Are you saying that people on your team struggle to add a statistic?
For sure no. I just let the preprocessor do mechanical things instead of
manual retyping stuff.
> #1 correctness, #2 readability. LoC only matters indirectly under #2.
#1 -- manual retyping is more error-prone?
#2 -- gcc -E ?
Anyway, just side notes, I get what you're saying and partially agree.
>
>>>> But let's say what should we do with XDP Tx
>>>> queues? They're invisible to rtnl as they are past real_num_tx_queues.
>>>
>>> They go to ethtool -S today. It should be relatively easy to start
>>> reporting them. I didn't add them because I don't have a clear use
>>> case at the moment.
>>
>> The same as for regular Tx: debugging, imbalance etc.
>
> I'm not trying to imply they are useless. I just that I, subjectively,
> don't have a clear use case in Meta's production env.
>
>>> save space on all the stats I'm not implementing.
>>
>> The stats I introduced here are supported by most, if not every, modern
>> NIC drivers. Not supporting header split or HW GRO will save you 16
>> bytes on the queue struct which I don't think is a game changer.
>
> You don't understand. I built some infra over the last 3 years.
> You didn't bother reading it. Now you pop our own thing to the side,
> extending ethtool -S which is _unusable_ in a multi-vendor, production
> environment.
I read everything at the time you introduced it. I remember Ethernet
standard stats, rmon, per-queue generic stats etc. I respect it and I
like it.
So just let me repeat my question so that all misunderstandings are
gone: did I get it correctly that instead of adding Ethtool stats, I
need to add new fields to the per-queue Netlink stats? I clearly don't
have any issues with that and I'll be happy to drop Ethtool stats from
the lib at all.
(except XDP stats, they still go to ethtool -S for now? Or should I try
making them generic as well?)
>
>>>> * implementing NL stats in drivers, not here; not exporting NL stats
>>>> to ethtool -S
>>>>
>>>> A driver wants to export a field which is missing in the lib? It's a
>>>> couple lines to add it. Another driver doesn't support this field and
>>>> you want it to still be 0xff there? Already noted and I'm already
>>>> implementing a different model.
>>>
>>> I think it will be very useful if you could step back and put on paper
>>> what your goals are with this work, exactly.
>>
>> My goals:
>>
>> * reduce boilerplate code in drivers: declaring stats structures,
>> Ethtool stats names, all these collecting, aggregating etc etc, you see
>> in the last commit of the series how many LoCs get deleted from idpf,
>> +/- the same amount would be removed from any other driver
>
> 21 files changed, 1634 insertions(+), 1002 deletions(-)
Did you notice my "in the last commit"?
8 changed files with 232 additions and 691 deletions.
>
>> * reduce the time people debug and fix bugs in stats since it will be
>> just in one place, not in each driver
>
> Examples ?
Eeeh I remember there were commits as between the drivers, the logic to
count some stats was inconsistent? But I don't have any links, so this
is not an argument. But I also fixed bugs a couple time which were the
same in several Intel drivers, that can always happen.
>
>> * have more consistent names in ethtool -S
>
> Forget it, better infra exists. Your hack to make stat count
> "consistent" doesn't work either. Unless you assume only one process
> can call ethtool -S at a time.
>
>> * have more consistent stats sets in drivers since even within Intel
>> drivers it's a bit of a mess which stats are exported etc.
>
> Consistently undocumented :( qstats exist and are documented.
>
>> Most of your pushback here sounds like if I would try to introduce this
>> in the core code, but I don't do that here. This infra saves a lot of
>> locs
>
> s/saves/may save/ ?
Series for more drivers are in progress, they really remove way more
than add.
>
>> and time when used in the Intel drivers and it would be totally
>> fine for me if some pieces of the lib goes into the core, but these
>> stats don't.
>> I didn't declare anywhere that everyone must use it or that it's core
>> code, do you want me to change this MODULE_DESCRIPTION()?
>
> I deeply dislike the macro templating. Complex local systems of this
> sort make it really painful to do cross-driver changes. It's fine for
> you because you wrote this, but speaking from experience (mlx5) it makes
> modifying the driver for an outside much harder.
>
> If you think the core infra is lacking, please improve it instead of
> inventing your own, buggy one.
Don't get me wrong, I did it via Ethtool not because I don't like your
infra or so, strongly the opposite. Maybe it was just something like "I
did it thousand times so I automatically did this that way for the
1001th time", or just because I wanted to deduplicate the Intel code and
there, all the stats are in the Ethtool only, dunno =\
Thanks,
Olek
Powered by blists - more mailing lists