Message-ID: <f07700d5-211f-d091-2b0b-fbaf03c4a959@amd.com>
Date:   Fri, 6 Jan 2017 11:56:30 -0500
From:   Serguei Sagalovitch <serguei.sagalovitch@....com>
To:     Jerome Glisse <j.glisse@...il.com>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>
CC:     Jerome Glisse <jglisse@...hat.com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
        "'linux-rdma@...r.kernel.org'" <linux-rdma@...r.kernel.org>,
        "'linux-nvdimm@...ts.01.org'" <linux-nvdimm@...1.01.org>,
        "'Linux-media@...r.kernel.org'" <Linux-media@...r.kernel.org>,
        "'dri-devel@...ts.freedesktop.org'" <dri-devel@...ts.freedesktop.org>,
        "'linux-pci@...r.kernel.org'" <linux-pci@...r.kernel.org>,
        "Kuehling, Felix" <Felix.Kuehling@....com>,
        "Blinzer, Paul" <Paul.Blinzer@....com>,
        "Koenig, Christian" <Christian.Koenig@....com>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        "Sander, Ben" <ben.sander@....com>, <hch@...radead.org>,
        <david1.zhou@....com>, <qiang.yu@....com>
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On 2017-01-05 08:58 PM, Jerome Glisse wrote:
> On Thu, Jan 05, 2017 at 05:30:34PM -0700, Jason Gunthorpe wrote:
>> On Thu, Jan 05, 2017 at 06:23:52PM -0500, Jerome Glisse wrote:
>>
>>>> I still don't understand what you're driving at - you've said in both
>>>> cases a user VMA exists.
>>> In the former case no, there is no VMA directly, but if you want one then
>>> a device can provide one. But such a VMA is useless as CPU access is not
>>> expected.
>> I disagree that it is useless, the VMA is going to be necessary to support
>> upcoming things like CAPI, you need it to support O_DIRECT from the
>> filesystem, DPDK, etc. This is why I am opposed to any model that is
>> not VMA based for setting up RDMA - that is short-sighted and does
>> not seem to reflect where the industry is going.
>>
>> So focus on having a VMA backed by actual physical memory that covers
>> your GPU objects, and ask how we wire up the '__user *' to the DMA
>> API in the best way so the DMA API still has enough information to
>> set up IOMMUs and whatnot.
> I am talking about two different things. Existing hardware and APIs where you
> _do not_ have a VMA and you do not need one. This is just existing stuff.
I do not understand why you assume that existing APIs don't need one.
I would say that a lot of __existing__ user-level APIs and their kernel
support (especially outside of the graphics domain) assume that we have
a VMA and deal with __user * pointers.
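To give a concrete (purely illustrative) sketch of what I mean - both of
these existing user-level paths just hand the kernel a plain pointer and
rely on pages behind a VMA being pinned for them:

/* Illustrative user-space fragment; error handling omitted. */
#include <fcntl.h>
#include <unistd.h>
#include <infiniband/verbs.h>

void existing_apis_take_plain_pointers(struct ibv_pd *pd, int fd_odirect,
                                       void *buf, size_t len)
{
        /* RDMA: register a plain pointer; the kernel pins its pages. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                       IBV_ACCESS_LOCAL_WRITE |
                                       IBV_ACCESS_REMOTE_READ);

        /* O_DIRECT file I/O: same story, the pointer ends up in
         * get_user_pages() inside the block layer (buf must be
         * suitably aligned for O_DIRECT). */
        pread(fd_odirect, buf, len, 0);

        ibv_dereg_mr(mr);
}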
> Some closed drivers provide functionality on top of this design. The question
> is do we want to do the same? If yes, and you insist on having a VMA, we
> could provide one, but this does not apply and is useless for where we
> are going with new hardware.
>
> With new hardware you just use malloc or mmap to allocate memory and then
> you use it directly with the device. The device driver can migrate any part of
> the process address space to device memory. In this scheme you have your
> usual VMAs but there is nothing special about them.
Assuming that the whole device memory is CPU accessible (and that does look
like the direction we are going):
- You forgot about the use case where we want or need to allocate memory
directly on the device (why migrate anything if it is not needed?).
- We may want to use the CPU to access such memory on the device to avoid
any unnecessary migration back.
- We may have more device memory than system memory.
E.g. if you have 12 GPUs with 64 GB each, that already gives us ~0.7 TB,
not counting NVDIMM cards, which could also be used as memory
storage for other devices to access.
- We may also want/need to share GPU memory between different
processes.
> Now when you try to do get_user_pages() on any page that is inside the
> device it will fail, because we do not allow any device memory to be pinned.
> There are various reasons for that and they are not going away in any hw
> in the planning (so for the next few years).
>
> Still we do want to support peer-to-peer mapping. The plan is to only do so
> with ODP-capable hardware. We still need to solve the IOMMU issue and
> it needs special handling inside the RDMA device. The way it works is
> that RDMA asks for a GPU page, the GPU checks if it has room inside its PCI
> BAR to map this page for the device, and this can fail. If it succeeds then
> you need the IOMMU to let the RDMA device access the GPU PCI BAR.
>
> So here we have two orthogonal problems. The first one is how to make two
> drivers talk to each other to set up a mapping that allows peer to peer,
> and the second is about the IOMMU.
>
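For the first of those two problems I would expect something roughly along
these lines - a sketch only, with names invented purely for illustration
(nothing like this exists in the kernel today): the exporting (GPU) driver
offers a hook that tries to expose a page through its PCI BAR and hands
back a bus address, plus a matching hook to tear that mapping down again.

/* Hypothetical sketch only -- none of these symbols exist. */
struct p2p_provider_ops {
        /*
         * Try to expose @page through the exporter's PCI BAR so a peer
         * device can reach it; fills in a bus address on success and
         * may fail, e.g. when the BAR window is exhausted.
         */
        int (*map_peer_page)(struct device *exporter, struct page *page,
                             dma_addr_t *bus_addr);
        /* Release the BAR window once the importing device is done. */
        void (*unmap_peer_page)(struct device *exporter,
                                dma_addr_t bus_addr);
};

The IOMMU side would then still have to be handled separately on the
importer's end.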
I think that there is a third problem: a lot of existing user-level APIs
(MPI, IB Verbs, file I/O, etc.) deal with pointers to buffers.
Ideally we would support use cases where those buffers are
located in device memory, avoiding any unnecessary migration /
double-buffering.
Currently a lot of kernel infrastructure assumes that this is a user
pointer and calls get_user_pages() to build a scatter/gather list. What is
your opinion on how it should be changed to deal with cases where the
"buffer" is in device memory?
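For context, the pattern I mean is roughly the following. This is only a
sketch (get_user_pages()'s exact signature has shifted across kernel
versions, and error handling is omitted), but it shows where a pointer
into device memory breaks the assumption today:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

/* Rough sketch of the usual "user pointer -> DMA" path. */
static int map_user_buffer(struct device *dev, unsigned long uaddr,
                           size_t len, struct sg_table *sgt)
{
        unsigned int npages = DIV_ROUND_UP(offset_in_page(uaddr) + len,
                                           PAGE_SIZE);
        struct page **pages = kcalloc(npages, sizeof(*pages), GFP_KERNEL);
        int pinned;

        /* Fails today if uaddr points at (unpinnable) device memory. */
        pinned = get_user_pages_fast(uaddr, npages, 1 /* write */, pages);

        /* Build a scatter/gather table from the pinned pages ... */
        sg_alloc_table_from_pages(sgt, pages, pinned,
                                  offset_in_page(uaddr), len, GFP_KERNEL);

        /* ... and hand it to the DMA API, which also sets up the IOMMU. */
        return dma_map_sg(dev, sgt->sgl, sgt->nents, DMA_BIDIRECTIONAL);
}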


