Message-ID: <ZV0zXBmINtopBvLQ@x130>
Date: Tue, 21 Nov 2023 14:46:52 -0800
From: Saeed Mahameed <saeed@...nel.org>
To: David Ahern <dsahern@...nel.org>
Cc: Saeed Mahameed <saeedm@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jason Gunthorpe <jgg@...dia.com>,
Leon Romanovsky <leonro@...dia.com>,
Jiri Pirko <jiri@...dia.com>, Leonid Bloch <lbloch@...dia.com>,
Itay Avraham <itayavr@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl
On 21 Nov 14:18, David Ahern wrote:
>On 11/21/23 1:04 PM, Saeed Mahameed wrote:
>> On 21 Nov 12:44, Jakub Kicinski wrote:
>>> On Mon, 20 Nov 2023 23:06:19 -0800 Saeed Mahameed wrote:
>>>> high frequency diagnostic counters
>>>
>>> So is it a debug driver or not a debug driver?
>>>
>>
>> High frequency _diagnostic_ counters are a very useful tool for
>> debugging a high performance chip. So yes this is for diagnostics/debug.
>>
>>> Because I'm pretty sure some people want to have access to high freq
>>> counters in production, across their fleet. What's worse David Ahern
>>> has been pitching a way of exposing device counters which would be
>>> common across netdev.
>.
>
>For context on the `what's worse ...` comment for those who have not
>seen the netconf slides:
>https://netdev.bots.linux.dev/netconf/2023/david.pdf
>
>and I am having a hard time parsing Kuba's intent with that comment here
>(knowing you did not like the pitch I made at netconf :-))
>
>
>>
>> This is not netdev, this driver is to support ConnectX chips and SoCs
>> with any stack, netdev/rdma/vdpa/virtio and internal chip units and
>> acceleration engines, add to that ARM core diagnostics in case of
>> Blue-Field DPUs.
>> I am not looking for counting netdev ethernet packets in this driver.
>>
>> I am also pretty sure David will also want an interface to access other
>> than netdev counters, to get more visibility on how a specific chip is
>> behaving.
>
>yes, and h/w counters were part of the proposal. One thought is to
>leverage userspace registered memory with the device vs mapping bar
>space, but we have not moved beyond a theoretical discussion at this point.
>
>>
>>> Definite nack on this patch.
>>
>> Based on what ?
>
>It's a generic interface argument?
>
For this driver, the diagnostic counters are only a small part of the debug
utilities the driver provides, so it is not fair to nak this patch based
on one use case. We need this driver to also dump other things, such as
core dumps, FW contexts, internal objects, register dumps, resource dumps,
etc.
This patch's original purpose was to allow core dumps. Since a core dump can
go up to 2MB of memory, without this patch we won't have core dump ability,
which is more important for debugging than diagnostic counters.
You can find more here:
https://github.com/saeedtx/mlx5ctl#mlx5ctl-userspace-linux-debug-utilities-for-mlx5-connectx-devices
For diagnostic counters, we can continue the discussion about having a generic
interface. I am all for it, but it is irrelevant to this submission.