Message-ID: <ZV0zXBmINtopBvLQ@x130>
Date:   Tue, 21 Nov 2023 14:46:52 -0800
From:   Saeed Mahameed <saeed@...nel.org>
To:     David Ahern <dsahern@...nel.org>
Cc:     Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jason Gunthorpe <jgg@...dia.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Jiri Pirko <jiri@...dia.com>, Leonid Bloch <lbloch@...dia.com>,
        Itay Avraham <itayavr@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

On 21 Nov 14:18, David Ahern wrote:
>On 11/21/23 1:04 PM, Saeed Mahameed wrote:
>> On 21 Nov 12:44, Jakub Kicinski wrote:
>>> On Mon, 20 Nov 2023 23:06:19 -0800 Saeed Mahameed wrote:
>>>> high frequency diagnostic counters
>>>
>>> So is it a debug driver or not a debug driver?
>>>
>>
>> High frequency _diagnostic_ counters are a very useful tool for
>> debugging a high performance chip. So yes this is for diagnostics/debug.
>>
>>> Because I'm pretty sure some people want to have access to high freq
>>> counters in production, across their fleet. What's worse David Ahern
>>> has been pitching a way of exposing device counters which would be
>>> common across netdev.
>.
>
>For context on the `what's worse ...` comment for those who have not
>seen the netconf slides:
>https://netdev.bots.linux.dev/netconf/2023/david.pdf
>
>and I am having a hard time parsing Kuba's intent with that comment here
>(knowing you did not like the pitch I made at netconf :-))
>
>
>>
>> This is not netdev, this driver is to support ConnectX chips and SoCs
>> with any stack, netdev/rdma/vdpa/virtio and internal chip units and
>> acceleration engines, add to that ARM core diagnostics in case of
>> Blue-Field DPUs.
>> I am not looking for counting netdev ethernet packets in this driver.
>>
>> I am also pretty sure David will also want an interface to access other
>> than netdev counters, to get more visibility on how a specific chip is
>> behaving.
>
>yes, and h/w counters were part of the proposal. One thought is to
>leverage userspace registered memory with the device vs mapping bar
>space, but we have not moved beyond a theoretical discussion at this point.
>
>>
>>> Definite nack on this patch.
>>
>> Based on what ?
>
>It's a generic interface argument?
>

For this driver, diagnostic counters are only a small part of the debug
utilities it provides, so it is not fair to nack this patch based on one
use case. We need this driver to also dump other things, like core dumps,
FW contexts, internal objects, register dumps, resource dumps, etc.

The original purpose of this patch was to allow core dumps. A core dump can
go up to 2MB of memory, so without this patch we won't have core dump
ability, which is more important for debugging than diagnostic counters.

You can find more here:
https://github.com/saeedtx/mlx5ctl#mlx5ctl-userspace-linux-debug-utilities-for-mlx5-connectx-devices

For diagnostic counters, we can continue the discussion about having a
generic interface. I am all for it, but it's irrelevant to this submission.
