Date: Wed, 7 Feb 2024 21:03:35 -0800
From: Saeed Mahameed <saeed@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Arnd Bergmann <arnd@...db.de>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Leon Romanovsky <leonro@...dia.com>,
	Jason Gunthorpe <jgg@...dia.com>, Jiri Pirko <jiri@...dia.com>,
	Leonid Bloch <lbloch@...dia.com>, Itay Avraham <itayavr@...dia.com>,
	Saeed Mahameed <saeedm@...dia.com>,
	David Ahern <dsahern@...nel.org>,
	Aron Silverton <aron.silverton@...cle.com>,
	Christoph Hellwig <hch@...radead.org>,
	andrew.gospodarek@...adcom.com, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: [PATCH V4 0/5] mlx5 ConnectX control misc driver

On 07 Feb 07:03, Jakub Kicinski wrote:
>On Tue,  6 Feb 2024 23:24:30 -0800 Saeed Mahameed wrote:
>> From: Saeed Mahameed <saeedm@...dia.com>
>>
>> Recap from V3 discussion:
>> =========================
>>
>> LWN has published an article on this series aptly summarizing the debate.
>> LINK: https://lwn.net/Articles/955001/
>>
>> We continue to think that mlx5ctl is reasonable and aligned with the
>> greater kernel community values. People have pointed to the HW RAID
>> miscdevices as a good analog. The MD developers did not get to block HW
>> RAID configuration on the basis that it undermines their work on the
>> software RAID stack. Further, while there is a superficial similarity to
>> the DRM/accel debate, that was grounded in a real concern that DRM values
>> on open source would be bypassed. That argument does not hold up here as
>> this does come with open source userspace and the functionality mlx5ctl
>> enables on lockdown has always been available to ConnectX users through
>> the non-lockdown PCI sysfs. netdev has been doing just fine despite the
>> long standing presence of this tooling and we have continued to work with
>> Jakub on building common APIs when appropriate. mlx5 already implements
>> a wide range of the netdev common interfaces, many of which were pushed
>> forward by our staff - the DPLL configuration netlink being a recent
>> example.
>
>I appreciate Jiri's contributions (and you hired Maciej off of Intel at
>some point) but don't make it sound like nVidia lead DPLL, or pushed for
>a common interface :| Intel posted SyncE support. I asked them make it
>a standalone API family:
>

I will let the stats speak for themselves:
$ git shortlog -sne --no-merges net/devlink
and, prior to commit f05bd8ebeb69 ("devlink: move code to a dedicated
directory"):
$ git shortlog -sne --no-merges net/core/devlink.c

More than 70% of the commits are authored by more than 10 different
individuals from Mellanox/Nvidia.
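
For anyone who wants to check a figure like this, here is a rough sketch of
how a per-company share could be computed from `git shortlog -sne` output.
It assumes affiliation can be approximated by the author's email domain,
which is only a heuristic (people change employers and use personal
addresses). The function names, the sample text and the `.example` domains
are all hypothetical and are not taken from the devlink history.

```python
import re
from collections import Counter

# One `git shortlog -sne` line looks like: "   123\tSome Name <user@host>"
LINE_RE = re.compile(r"^\s*(\d+)\s+(.*?)\s+<([^>]*)>\s*$")


def commits_by_domain(shortlog_text):
    """Sum commit counts per email domain from `git shortlog -sne` text."""
    counts = Counter()
    for line in shortlog_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or unexpected lines
        commits, _name, email = match.groups()
        domain = email.rsplit("@", 1)[-1].lower()
        counts[domain] += int(commits)
    return counts


def share(counts, domains):
    """Fraction of all commits whose author domain is in `domains`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[d] for d in domains) / total


# Hypothetical sample input, NOT real data from the kernel tree.
SAMPLE = (
    "     6\tAlice Example <alice@vendor-a.example>\n"
    "     2\tBob Example <bob@vendor-b.example>\n"
)

if __name__ == "__main__":
    by_domain = commits_by_domain(SAMPLE)
    print(f"{share(by_domain, {'vendor-a.example'}):.0%}")
```

To run it against a real tree, the output of the shortlog commands above
would be fed in instead of SAMPLE, with the set of domains chosen by
whoever is doing the counting.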

OK, you don't like the DPLL example; here is a list of some central
devlink features we did push into the standard devlink API:

  - subfunction API and devlink infrastructure
  - shared buffer API
  - port function and rate API
  - health

>https://lore.kernel.org/netdev/20210830162909.110753ec@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/
>
>Vadim from Meta joined in and helped a lot based on the OCP time card.
>Then after some delaying and weird noises y'all started participating.
>

I remember those discussions, and I agree it is very weird when it
takes 3 vendors and 2 years to get a simple devlink API for a single bit
flip accepted.

>> mlx5 ConnectX control misc driver:
>> ==================================
>>
>> The ConnectX HW family supported by the mlx5 drivers uses an architecture
>> where a FW component executes "mailbox RPCs" issued by the driver to make
>> changes to the device. This results in a complex debugging environment
>> where the FW component has information and low level configuration that
>> needs to be accessed to userspace for debugging purposes.
>
>You don't explain anywhere why addressing the challenges of using
>debugfs in secure environments isn't the way to go.
>

I already answered this question at length in v3, here:
https://lore.kernel.org/all/ZWZFm2qqhV1wKKCV@x130/

>Also you keep saying debugging purposes but the driver is called
>"control misc driver", you need to iron out your narrative just
>a bit more.
>
>> Historically a userspace program was used that accessed the PCI register
>> and config space directly through /sys/bus/pci/.../XXX and could operate
>> these debugging interfaces in parallel with the running driver.
>> This approach is incompatible with secure boot and kernel lockdown so this
>> driver provides a secure and restricted interface to that.
>
>[snip]
>
>>     i) mstreg
>>       The mlxreg utility allows users to obtain information regarding
>>       supported access registers, such as their fields
>
>So the access mstreg gives over this interface is read only? That's
>what your description sounds like, but given your complaints about
>"inability to add knobs" and "control" in the name of the driver that
>must be false.
>

Yes, this is enforced by the mlx5ctl driver and FW using the special
debug uid.

>> Other usecases with umem:
>>   - dynamic HW and FW trace monitoring
>>   - high frequency diagnostic counters sampling
>
>Ah yes, the high frequency counters. Something that is definitely
>impossible to implement in a generic way. You were literally in the
>room at netconf when David Ahern described his proposal for this.
>

I was in the room and I support David's idea; I like it a lot. But I
don't believe we have any concrete proposal, and we don't have any use
case for it in netdev for now. Our use case for this is currently RDMA
and HPC specific.

Also, similar to devlink, we will be the first to jump in and implement
the new API once it is defined. But this doesn't mean I need to throw
away the whole driver just because one single use case will be
implemented in netdev one day, and I am sure the netdev implementation
won't satisfy all the use cases of high frequency counters.

Also keep in mind that high frequency counters are a very small part of
the debug and access capabilities the mlx5ctl interface offers.

>Anyway, I don't want to waste any more time arguing with you.
>My opinion is that the kernel community is under no obligation to carry
>your proprietary gateway interfaces. I may be wrong, but I'm entitled
>to my opinion.
>

Thanks, I appreciate your honesty, but I must disagree with your Nack. We
provided enough arguments for why we believe this approach is the right
way to go, and it is clear from the responses on V3 and from the LWN
article that we have community support for this open source project.

>Please do me the basic courtesy of carrying my nack on these patches:
>
>Nacked-by: Jakub Kicinski <kuba@...nel.org>
>
>and CC netdev, so I don't have to respond again :|

Ack.

Thanks,
Saeed.

