[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200918121621.GQ8409@ziepe.ca>
Date: Fri, 18 Sep 2020 09:16:21 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Oded Gabbay <oded.gabbay@...il.com>
Cc: Gal Pressman <galpress@...zon.com>,
Jakub Kicinski <kuba@...nel.org>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, SW_Drivers <SW_Drivers@...ana.ai>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
linux-rdma@...r.kernel.org
Subject: Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
On Fri, Sep 18, 2020 at 02:59:28PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 2:56 PM Jason Gunthorpe <jgg@...pe.ca> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > >> infrastructure for communication between multiple accelerators. Same
> > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > >> library or to connect to the rdma infrastructure in the kernel.
> > > >
> > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > you can't add random device offloads as IOCTL to nedevs.
> > > >
> > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > into netdev.
> > > >
> > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > all. I'm sure this can too.
> > >
> > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > was suggested to go through the vfio subsystem instead.
> > >
> > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > is what's the bar to get accepted to the RDMA subsystem.
> > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> >
> > That is more or less where we ended up, yes.
> >
> > I'm most worried about this lack of PD and MR.
> >
> > Kernel must provide security for apps doing user DMA, PD and MR do
> > this. If the device doesn't have PD/MR then it is hard to see how a WQ
> > could ever be exposed directly to userspace, regardless of subsystem.
>
> Hi Jason,
> What you say here is very true and we handle that with different
> mechanisms. I will start working on a dedicated patch-set of the RDMA
> code in the next few weeks with MUCH MORE details in the commit
> messages. That will explain exactly how we expose stuff and protect.
>
> For example, regarding isolating between applications, we only support
> a single application opening our file descriptor.
Then the driver has a special PD create that requires the misc file
descriptor to authorize RDMA access to the resources in that security
context.
> Another example is that the submission of WQ is done through our QMAN
> mechanism and is NOT mapped to userspace (due to the restrictions you
> mentioned above and other restrictions).
Sure, other RDMA drivers also require a kernel ioctl for command
execution.
In this model the MR can be a software construct, again representing a
security authorization:
- A 'full process' MR, in which case the kernel command excution
handles dma map and pinning at command execution time
- A 'normal' MR, in which case the DMA list is pre-created and the
command execution just re-uses this data
The general requirement for RDMA is the same as DRM, you must provide
enough code in rdma-core to show how the device works, and minimally
test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.
So you'll want to arrange something where the default MR and PD
mechanisms do something workable on this device, like auto-open the
misc FD when building the PD, and support the 'normal' MR flow for
command execution.
Jason
Powered by blists - more mailing lists