Message-ID: <20200921115239.GC8409@ziepe.ca>
Date: Mon, 21 Sep 2020 08:52:39 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Oded Gabbay <oded.gabbay@...il.com>,
Leon Romanovsky <leon@...nel.org>,
Gal Pressman <galpress@...zon.com>,
Jakub Kicinski <kuba@...nel.org>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, SW_Drivers <SW_Drivers@...ana.ai>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
linux-rdma@...r.kernel.org
Subject: Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver

On Sun, Sep 20, 2020 at 10:47:02AM +0200, Greg Kroah-Hartman wrote:
> > If not, what open source userspace are you going to ask them to
> > present to merge the kernel side into misc?
>
> I don't think that they have a userspace api to their rdma feature from
> what I understand, but I could be totally wrong as I do not know their
> hardware at all, so I'll let them answer this question.

I thought Oded was pretty clear: the goal of this series is to expose
their RDMA HW to userspace. This problem space requires co-mingling
networking and compute at extremely high speed/low overhead. This is
all done in userspace.

We are specifically talking about this in
include/uapi/misc/habanalabs.h:

/*
 * NIC
 *
 * This IOCTL allows the user to manage and configure the device's NIC ports.
 * The following operations are available:
 * - Create a completion queue
 * - Destroy a completion queue
 * - Wait on completion queue
 * - Poll a completion queue
 * - Update consumed completion queue entries
 * - Set a work queue
 * - Unset a work queue
 *
 * For all operations, the user should provide a pointer to an input structure
 * with the context parameters. Some of the operations also require a pointer to
 * driver regarding how many of the available CQEs were actually
 * processed/consumed. Only then the driver will override them with newer
 * entries.
 * The set WQ operation should provide the device virtual address of the WQ with
 * a matching size for the number of WQs and entries per WQ.
 *
 */
#define HL_IOCTL_NIC _IOWR('H', 0x07, struct hl_nic_args)

Which is ibv_create_qp, ibv_create_cq, ibv_poll_cq, etc, etc

Habana has repeatedly described their HW as having multiple 100G RoCE
ports. RoCE is one of the common industry standards that ibverbs
unambiguously is responsible for.

I would be much less annoyed if they were not actively marketing their
product as RoCE RDMA.

Sure there is some argument that their RoCE isn't spec compliant, but
I don't think it excuses the basic principle of our subsystem:

  RDMA HW needs to demonstrate some basic functionality using the
  standard open source userspace software stack.

I don't like this idea of backdooring a bunch of proprietary closed
source RDMA userspace through drivers/misc, and if you don't have a
clear idea how to get something equal for drivers/misc you should not
accept the HL_IOCTL_NIC.

Plus RoCE is complicated; there is a bunch of interaction with netdev
and rules related to that which really need to be respected.

> For anything that _has_ to have a userspace RDMA interface, sure ibverbs
> are the one we are stuck with, but I didn't think that was the issue
> here at all, which is why I wrote the above comments.

I think you should look at patches #8 through #11:

https://lore.kernel.org/lkml/20200915171022.10561-9-oded.gabbay@gmail.com/

Jason