[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200918132645.GS8409@ziepe.ca>
Date: Fri, 18 Sep 2020 10:26:45 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Oded Gabbay <oded.gabbay@...il.com>
Cc: izur@...ana.ai, Gal Pressman <galpress@...zon.com>,
Jakub Kicinski <kuba@...nel.org>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, SW_Drivers <SW_Drivers@...ana.ai>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"David S. Miller" <davem@...emloft.net>,
Andrew Lunn <andrew@...n.ch>,
Florian Fainelli <f.fainelli@...il.com>,
linux-rdma@...r.kernel.org
Subject: Re: [PATCH v3 00/14] Adding GAUDI NIC code to habanalabs driver
On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> The problem with MR is that the API doesn't let us return a new VA. It
> forces us to use the original VA that the Host OS allocated.
If using the common MR API you'd have to assign a unique linear range
in the single device address map and record both the IOVA and the MMU
VA in the kernel struct.
Then when submitting work using that MR lkey the kernel will adjust
the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
forwarding to HW.
EFA doesn't support rkeys, so they are not required to be emulated. It
would have to create rkeys using some guadidv_reg_mr_rkey()
It is important to understand that the usual way we support these
non-RDMA devices is to insist that they use SW to construct a minimal
standards based RDMA API, and then allow the device to have a 'dv' API
to access a faster, highly device specific, SW bypass path.
So for instance you might have some guadidv_post_work(qp) that doesn't
use lkeys and works directly on the MMU_VA. A guadidv_get_mmu_va(mr)
would return the required HW VA from the kernel.
Usually the higher level communication library (UCX, MPI, etc) forms
the dv primitives into something application usable.
> we do if that VA is in the range of our HBM addresses ? The device
> won't be able to distinguish between them. The transaction that is
> generated by an engine inside our device will go to the HBM instead of
> going to the PCI controller and then to the host.
>
> That's the crust of the problem and why we didn't use MR.
No, the problem with the device is that it doesn't have a lkey/rkey,
so it is stuck with a single translation domain. RoCE compliant
devices are required to have multiple translation domains - each
lkey/rkey specifies a unique translation.
The MR concept is a region of process VA mapped into the device for
device access, and this device *clearly* has that.
Jason
Powered by blists - more mailing lists