Message-ID: <20231121234630.GJ6083@nvidia.com>
Date:   Tue, 21 Nov 2023 19:46:30 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Saeed Mahameed <saeed@...nel.org>, Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Leon Romanovsky <leonro@...dia.com>,
        Jiri Pirko <jiri@...dia.com>, Leonid Bloch <lbloch@...dia.com>,
        Itay Avraham <itayavr@...dia.com>,
        linux-kernel@...r.kernel.org, Saeed Mahameed <saeedm@...dia.com>
Subject: Re: [PATCH V3 5/5] misc: mlx5ctl: Add umem reg/unreg ioctl

On Tue, Nov 21, 2023 at 12:44:56PM -0800, Jakub Kicinski wrote:
> On Mon, 20 Nov 2023 23:06:19 -0800 Saeed Mahameed wrote:
> > high frequency diagnostic counters
> 
> So is it a debug driver or not a debug driver?

In the part you decided not to quote, Saeed explained how the main
purpose of the generic DMA to userspace mechanism is to transfer FW
trace, FW memory copies and other large data dumps.

The thing with generic stuff is you can use it for lots of things if
you are so inclined. Saeed gave many examples. I think you took it in
the wrong way as I am not aware of any plan for actual high speed
netdev relevant counters in a performance monitor application. It
isn't that kind of "high speed".

The main interest is in micro-architectural debugging
information, the kind that is opaque unless you can reference the RTL
to understand what it means. It is "high speed" in the sense that
executing a FW command per register/counter would be offensively slow
compared to executing a FW command to bulk DMA a cluster of
micro-architecture registers/etc in the device.

The design is so generic because it is a debug interface that we want
to be always available and always fully functional. Mellanox ships new
FW and new chips at a rapid rate, and we do not want to be changing the
kernel driver every time we do anything. That will never get
backported into production kernels across all our supported customers
fast enough. Debug features that a field support engineer cannot
access simply do not exist.

Debugs are challenging. mlx5 is the most popular datacenter NIC in the
world. We have so many insane problems, you wouldn't believe it. I just
spent 8 months leading a debug that turned out to be a qemu defect
(thanks Paolo for all the help!!). This debug data and flexibility is
critical to making these hugely complex systems work.

Jason
