Message-ID: <20161125161627.GA20703@redhat.com>
Date: Fri, 25 Nov 2016 11:16:28 -0500
From: Jerome Glisse <jglisse@...hat.com>
To: Haggai Eran <haggaie@...lanox.com>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, John Hubbard <jhubbard@...dia.com>,
Feras Daoud <ferasda@...lanox.com>,
Ilya Lesokhin <ilyal@...lanox.com>,
Liran Liss <liranl@...lanox.com>
Subject: Re: [HMM v13 00/18] HMM (Heterogeneous Memory Management) v13
On Wed, Nov 23, 2016 at 11:16:04AM +0200, Haggai Eran wrote:
> On 11/18/2016 8:18 PM, Jérôme Glisse wrote:
> > Cliff notes: HMM offers two things (each standing on its own). First,
> > it allows device memory to be used transparently inside any process
> > without any modification to the process's program code. Second, it
> > allows a process address space to be mirrored on a device.
> >
> > The change since v12 is the use of struct page for device memory even
> > if the device memory is not accessible by the CPU (because of
> > limitations imposed by the bus between the CPU and the device).
> >
> > Using struct page means that there are minimal changes to core mm
> > code. HMM builds on top of ZONE_DEVICE to provide struct page, and it
> > adds new features to ZONE_DEVICE. The first 7 patches implement
> > those changes.
> >
> > The rest of the patchset is divided into 3 features that can each be
> > used independently of one another. First is process address space
> > mirroring (patches 9 to 13), which allows snapshotting the CPU page
> > table and keeping the device page table synchronized with the CPU one.
> >
> > Second is a new memory migration helper which allows migration of
> > a range of virtual addresses of a process. This memory migration
> > also allows devices to use their own DMA engine to perform the copy
> > between the source memory and destination memory. This can be
> > useful in many use cases even outside the HMM context.
> >
> > The third part of the patchset (patches 17-18) is a set of helpers to
> > register a ZONE_DEVICE node and manage it. It is meant as a
> > convenience so that device drivers do not each have to reimplement
> > the same boilerplate code over and over.
> >
> >
> > I am hoping that this can now be considered for inclusion upstream.
> > The bottom line is that without HMM we cannot support some of the new
> > hardware features on x86 PCIe. I do believe we need some solution
> > to support those features or we won't be able to use such hardware
> > with standards like C++17, OpenCL 3.0 and others.
> >
> > I have been working with NVidia to bring up this feature on their
> > Pascal GPUs. There is real hardware that you can buy today that
> > could benefit from HMM. We also intend to leverage this inside the
> > open source nouveau driver.
>
>
> Hi,
>
> I think the way this new version of the patchset uses ZONE_DEVICE looks
> promising and makes the patchset a little simpler than the previous
> versions.
>
> The mirroring code seems like it could be used to simplify the on-demand
> paging code in the mlx5 driver and the RDMA subsystem. It currently uses
> mmu notifiers directly.
>
Yes, I plan to spawn a patchset to show how to use HMM to replace some of
the ODP code. I am waiting for this patchset to go upstream first before
doing that.

> I'm also curious whether it can be used to allow peer to peer access
> between devices. For instance, if one device calls hmm_vma_get_pfns on a
> process that has unaddressable memory mapped in, with some additional
> help from DMA-API, its driver can convert these pfns to bus addresses
> directed to another device's MMIO region and thus enable peer to peer
> access. Then by handling invalidations through HMM's mirroring callbacks
> it can safely handle cases where the peer migrates the page back to the
> CPU or frees it.

Yes, this is something I have worked on with NVidia. The idea is that when
you see an hmm_pfn_t with the device flag set, you can retrieve the struct
device from it. The issue now is figuring out how, from that, you can know
that this is a device with which you can interact. I would like a common,
device-agnostic solution, but I think as a first step you will need to rely
on some back-channel communication.
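
To make that a bit more concrete, here is a rough sketch of what the driver
side could look like. This is only an illustration: the helper names
(hmm_pfn_is_device(), hmm_pfn_to_page(), device_page_owner(), ...) and the
surrounding structures are placeholders, not the actual HMM API.

/*
 * Hypothetical sketch only -- helper names and structures are
 * placeholders, not part of the HMM patchset.
 */
static int my_mirror_map_one(struct my_mirror *mirror, unsigned long addr,
                             hmm_pfn_t pfn)
{
        struct page *page;
        struct device *owner;

        if (!hmm_pfn_is_device(pfn))
                /* Regular system memory, map it as usual. */
                return my_mirror_map_system(mirror, addr, pfn);

        page = hmm_pfn_to_page(pfn);
        owner = device_page_owner(page);

        /*
         * Back channel decision: can we do peer to peer with the device
         * that owns this memory?
         */
        if (!my_can_peer_with(owner))
                return -EPERM;

        return my_mirror_map_peer(mirror, addr, page, owner);
}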

Once you have set up a peer mapping to the GPU memory, its lifetime will be
tied to the CPU page table content, i.e. the CPU page table may be updated
either to remove the page (because of munmap/truncate ...) or because the
page is migrated to some other place. In both cases the device using the
peer mapping must stop using it and refault to update its page table with
the new page where the data is.
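
Roughly, that teardown would hang off the mirror invalidation path,
something like the sketch below. The callback name, its arguments and the
helpers are again placeholders for whatever the final mirror API looks like.

/* Hedged sketch -- callback name, arguments and helpers are placeholders. */
static void my_mirror_invalidate(struct hmm_mirror *mirror,
                                 unsigned long start, unsigned long end)
{
        struct my_device *mdev = my_mirror_to_device(mirror);

        /*
         * Drop every device page table entry in the range, including any
         * peer mapping to another device's memory.
         */
        my_device_unmap_range(mdev, start, end);

        /*
         * Do not rebuild anything here: the device will refault later and
         * a fresh snapshot will tell it where the data lives now (system
         * memory or another device page).
         */
}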

The issue with implementing the above lies in the order in which the
mmu_notifier callbacks are called. We want to tear down the peer mapping
only once we know that every device using it is done with it. If all the
devices involved use the HMM mirror API then this can be solved easily.
Otherwise it will need some changes to mmu_notifier.

Note that all of the above would rely on changes to the DMA-API to allow
mapping (through the IOMMU) PCI BAR addresses into a device's IOMMU
context. But this is an orthogonal issue.
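
For illustration only, something in the spirit of the recently added
dma_map_resource() could be a starting point; whether that interface is
enough for peer to peer, and how it interacts with the IOMMU context of the
initiating device, is exactly the open part.

/*
 * Hedged illustration only: assumes the peer BAR physical address is
 * already known through some driver specific channel.
 */
static dma_addr_t map_peer_bar(struct device *initiator,
                               phys_addr_t peer_bar, size_t size)
{
        /*
         * dma_map_resource() maps an MMIO region for DMA by "initiator",
         * going through the IOMMU when one is present. Whether this is
         * enough for peer to peer is part of the DMA-API question above.
         */
        return dma_map_resource(initiator, peer_bar, size,
                                DMA_BIDIRECTIONAL, 0);
}
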
Cheers,
Jérôme