linux-kernel - Re: [PATCH 00/15] HMM (Heterogeneous Memory Management) v24

Message-ID: <40b9d534-4809-0a84-27dc-5c3faee3f69c@huawei.com>
Date:   Fri, 21 Jul 2017 17:03:22 +0800
From:   Yisheng Xie <xieyisheng1@...wei.com>
To:     Jerome Glisse <jglisse@...hat.com>
CC:     <akpm@...ux-foundation.org>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, John Hubbard <jhubbard@...dia.com>,
        Dan Williams <dan.j.williams@...el.com>,
        David Nellans <dnellans@...dia.com>
Subject: Re: [PATCH 00/15] HMM (Heterogeneous Memory Management) v24

Hi Jerome,

On 2017/7/21 1:18, Jerome Glisse wrote:
> On Wed, Jul 19, 2017 at 07:48:08PM +0800, Yisheng Xie wrote:
>> Hi Jérôme
>>
>> On 2017/6/29 2:00, Jérôme Glisse wrote:
>>>
>>> Patchset is on top of git://git.cmpxchg.org/linux-mmotm.git so I
>>> test the same kernel as the kbuild system; git branch:
>>>
>>> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-v24
>>>
>>> Changes since v23 are code comment fixes, a simplified kernel configuration
>>> and improved allocation of new pages on migration to device memory (last
>>> patch in this patchset).
>>>
>>> Everything else is the same. Below is the long description of what HMM
>>> is about and why. At the end of this email I briefly describe each patch
>>> and suggest reviewers for each of them.
>>>
>>>
>>> Heterogeneous Memory Management (HMM) (description and justification)
>>>
>>> Today device drivers expose a dedicated memory allocation API through
>>> their device file, often relying on a combination of IOCTL and mmap calls.
>>> The device can only access and use memory allocated through this API. This
>>> effectively splits the program address space into objects allocated for
>>> and usable by the device, and other regular memory (malloc, mmap of a
>>> file, shared memory, ...) that is only accessible by the CPU (or, in a very
>>> limited way, by a device through pinned memory).
>>>
>>> Allowing different isolated components of a program to use a device thus
>>> requires duplicating the input data structures with the device memory
>>> allocator. This is reasonable for simple data structures (array, grid,
>>> image, ...) but gets extremely complex with advanced data structures
>>> (list, tree, graph, ...) that rely on a web of memory pointers. This is
>>> becoming a serious limitation on the kinds of workloads that can be
>>> offloaded to devices like GPUs.
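To make the pointer problem concrete, here is a small userspace sketch (not HMM code and not any driver's API). It assumes a hypothetical separate "device" buffer, modeled as a plain array, into which a host linked list must be duplicated. A CPU pointer means nothing inside a separate device allocation, so every link has to be rewritten, here as an index. A tree or graph would additionally need a pointer-to-index lookup table, which is where the complexity grows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side list node: links are CPU virtual addresses. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical node layout inside a separate device allocation. A host
 * pointer is meaningless there, so each link becomes an index into the
 * device buffer; DEV_NIL marks the end of the list. */
#define DEV_NIL UINT32_MAX
struct dev_node {
    int value;
    uint32_t next;
};

/* Duplicate a host list into `out` (standing in for memory obtained from
 * a driver-specific allocator), rewriting every link along the way.
 * Returns the number of nodes copied, or -1 if `out` is too small. */
static long flatten_list(const struct node *head,
                         struct dev_node *out, size_t cap)
{
    size_t n = 0;

    assert(cap == 0 || out != NULL);
    for (const struct node *p = head; p != NULL; p = p->next) {
        if (n == cap)
            return -1;
        out[n].value = p->value;
        /* Nodes are laid out in list order, so the successor lands in
         * slot n + 1 if there is one. */
        out[n].next = p->next ? (uint32_t)(n + 1) : DEV_NIL;
        n++;
    }
    return (long)n;
}
```

Every structure shared with the device needs a conversion like this (and usually another one back), which is the duplication cost described above.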
>>>
>>> New industry standards like C++, OpenCL or CUDA are pushing to remove this
>>> barrier. This requires a shared address space between the GPU device and
>>> the CPU so that the GPU can access any memory of a process (while still
>>> obeying memory protection such as read-only). This kind of feature is also
>>> appearing in various other operating systems.
>>>
>>> HMM is a set of helpers to facilitate several aspects of address space
>>> sharing and device memory management. Unlike existing sharing mechanisms
>>> that rely on pinning pages used by a device, HMM relies on mmu_notifier to
>>> propagate CPU page table updates to the device page table.
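The mirroring idea can be illustrated with a userspace model. This is not the kernel's mmu_notifier API; the struct and function names below are hypothetical. A single function plays the role of the notifier: it runs before each CPU page-table change and drops the device's copy of the affected entries, so the device re-faults and picks up the new translation instead of the CPU page being pinned.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8
#define PTE_NONE 0UL  /* no valid translation */

/* Hypothetical model: index = virtual page number, value = physical
 * frame number (PTE_NONE = not mapped). */
struct mirror {
    unsigned long cpu_pt[NPAGES];  /* authoritative CPU page table */
    unsigned long dev_pt[NPAGES];  /* device's mirrored copy */
};

/* Stand-in for a notifier callback: the device forgets the affected
 * translations before the CPU table changes. */
static void invalidate_range(struct mirror *m, size_t start, size_t end)
{
    for (size_t i = start; i < end; i++)
        m->dev_pt[i] = PTE_NONE;
}

/* Every CPU page-table update goes through the invalidation first, so
 * the device copy can never keep a stale translation. */
static void cpu_set_pte(struct mirror *m, size_t vpn, unsigned long pfn)
{
    assert(vpn < NPAGES);
    invalidate_range(m, vpn, vpn + 1);
    m->cpu_pt[vpn] = pfn;
}

/* Device page fault: refill the missing entry from the CPU table.
 * Returns the pfn, or PTE_NONE if the CPU has no mapping either. */
static unsigned long dev_fault(struct mirror *m, size_t vpn)
{
    assert(vpn < NPAGES);
    m->dev_pt[vpn] = m->cpu_pt[vpn];
    return m->dev_pt[vpn];
}
```

Correctness here comes from invalidation rather than pinning: the CPU side stays free to change or reclaim a mapping at any time, and the device pays a fault to refresh it.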
>>>
>>> Duplicating the CPU page table is only one aspect necessary for efficiently
>>> using a device like a GPU. GPU local memory has bandwidth in the
>>> terabytes/second range, but it is connected to main memory through a
>>> system bus like PCIe that is limited to 32 gigabytes/second (PCIe 4.0
>>> x16). Thus it is necessary to allow migration of process memory from
>>> main system memory to device memory. The issue is that on platforms that
>>> only have PCIe, device memory is not accessible by the CPU with the same
>>> properties as main memory (cache coherency, atomic operations, ...).
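A rough worked example of why migration matters, using the figures quoted above and assuming 1 TB/s (1000 GB/s) for device memory and 32 GB/s for the bus. It ignores latency, page-table and fault overhead, so it is only an idealized model: a device making several passes over a buffer pays the bus cost on every pass if the data stays in system memory, but only once if the data is migrated first.

```c
#include <assert.h>

/* Idealized transfer time in milliseconds (1 GB = 1e9 bytes). */
static double transfer_ms(double bytes, double gb_per_s)
{
    assert(gb_per_s > 0.0);
    return bytes / (gb_per_s * 1e9) * 1e3;
}

/* Data stays in system memory: every pass crosses the bus. */
static double remote_ms(double bytes, int passes, double bus_gbs)
{
    return passes * transfer_ms(bytes, bus_gbs);
}

/* Migrate once over the bus, then every pass reads local device memory. */
static double migrate_ms(double bytes, int passes, double bus_gbs,
                         double local_gbs)
{
    return transfer_ms(bytes, bus_gbs) + passes * transfer_ms(bytes, local_gbs);
}
```

For a 4 GB buffer read 10 times this gives about 1250 ms remote versus 165 ms with migration; for a single pass migration loses (125 ms versus 129 ms), so whether to migrate depends on how the data is used.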
>>>
>>> To allow migration from main memory to device memory, HMM provides a set
>>> of helpers to hotplug device memory as a new type of ZONE_DEVICE memory
>>> which is un-addressable by the CPU but still has a struct page representing
>>> it. This allows most of the core kernel logic that deals with a process's
>>> memory to stay oblivious to the peculiarities of device memory.
>>>
>>> When the page backing an address of a process is migrated to device
>>> memory, the CPU page table entry is set to a new, specific swap entry. CPU
>>> access to such an address triggers a migration back to system memory, just
>>> as if the page had been swapped out to disk.
>>> [...]
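A userspace model of that fault-and-migrate-back behaviour follows. The names are hypothetical and the real special swap entry encoding is not shown: a two-state enum stands in for "present" versus "device entry", and a caller-supplied spare page stands in for allocating a system page in the fault path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16  /* tiny "pages" keep the model small */

/* Hypothetical model of one CPU page-table entry. */
enum pte_kind { PTE_PRESENT, PTE_DEVICE_ENTRY };

struct pte {
    enum pte_kind kind;
    unsigned char *page;  /* system page if PRESENT, device page otherwise */
};

static int cpu_faults;  /* counts simulated CPU faults */

/* Migrate to device: copy the contents, then replace the entry with a
 * non-present one that records where the data went. */
static void migrate_to_device(struct pte *pte, unsigned char *dev_page)
{
    assert(pte->kind == PTE_PRESENT);
    memcpy(dev_page, pte->page, PAGE_SZ);
    pte->kind = PTE_DEVICE_ENTRY;
    pte->page = dev_page;
}

/* CPU access path. A device entry is not a usable mapping, so the access
 * "faults" and the data is first copied back into a system page, much
 * like swapping a page back in from disk. */
static unsigned char cpu_read(struct pte *pte, size_t off,
                              unsigned char *spare_sys_page)
{
    assert(off < PAGE_SZ);
    if (pte->kind == PTE_DEVICE_ENTRY) {
        cpu_faults++;
        memcpy(spare_sys_page, pte->page, PAGE_SZ);
        pte->kind = PTE_PRESENT;
        pte->page = spare_sys_page;
    }
    return pte->page[off];
}
```

Only the first CPU access after migration faults; later accesses hit the restored system page directly.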
>>> To allow efficient migration between device memory and main memory, a new
>>> migrate_vma() helper is added with this patchset. It allows leveraging the
>>> device DMA engine to perform the copy operation.
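The shape of such a range migration can be sketched in userspace as below. This is not the migrate_vma() signature from the patchset; the structure, callback type and names are hypothetical. The driver has already picked a destination for each page (or NULL to skip it), a pluggable copy hook stands in for the DMA engine, and a page's mapping is switched to its destination only once its copy has succeeded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16

/* Hypothetical copy hook: a driver would program its DMA engine here.
 * Returns 0 on success. */
typedef int (*copy_fn)(void *dst, const void *src, size_t len, void *priv);

/* One entry per page of the virtual range being migrated. */
struct range_migration {
    unsigned char **map;  /* page currently backing each address */
    unsigned char **dst;  /* destination chosen by the driver, or NULL */
    size_t npages;
};

/* Copy each selected page through the hook, then repoint its mapping.
 * Pages that were skipped or whose copy failed keep their old backing.
 * Returns the number of pages migrated. */
static size_t migrate_range(struct range_migration *m, copy_fn copy,
                            void *priv)
{
    size_t done = 0;

    assert(m != NULL && copy != NULL);
    for (size_t i = 0; i < m->npages; i++) {
        if (m->dst[i] == NULL)
            continue;
        if (copy(m->dst[i], m->map[i], PAGE_SZ, priv) != 0)
            continue;
        m->map[i] = m->dst[i];
        done++;
    }
    return done;
}

/* Stand-in "DMA engine": memcpy plus a running byte count in priv. */
static int memcpy_engine(void *dst, const void *src, size_t len, void *priv)
{
    size_t *bytes = priv;

    memcpy(dst, src, len);
    if (bytes != NULL)
        *bytes += len;
    return 0;
}
```

Routing every copy through one hook is what lets a driver hand the whole range to its DMA engine instead of having the CPU copy page by page.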
>>>
>>
>> Does this mean that when the CPU accesses an address of a process that has
>> been migrated to device memory, migrate_vma() should be called to migrate
>> a range of addresses back to the CPU? If so, I think this function should
>> be called somewhere in this patchset; however, I cannot find any call to
>> it in this patchset.
>>
>> Or am I missing anything?
> 
> There is a page_fault callback in struct dev_pagemap. The device driver
> will set that callback to a device driver function that itself might call
> migrate_vma(). It might call a different helper, though.
> 
> For instance, GPU drivers commonly use memory oversubscription, i.e. they
> evict device memory to system pages to make room for other stuff. If a
> page fault happens while there is already a system page for that memory,
> then the device driver might only need to hand over that page, with no
> need to migrate anything.
> 
> That is why you do not see a migrate_vma() call in this patchset. Calls
> to that function will be inside the individual device drivers.
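The decision described above can be sketched as a userspace model of a driver's fault callback. The per-page bookkeeping, enum and function names are hypothetical, and the copy branch uses memcpy() where a real driver might call migrate_vma() with its DMA engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16

/* Hypothetical per-page driver bookkeeping. */
struct dev_page_state {
    unsigned char *vram;      /* copy in device memory, if any */
    unsigned char *sys_copy;  /* system page kept after eviction, if any */
    int vram_valid;           /* nonzero if vram holds the newest data */
};

enum fault_action { FAULT_HANDED_OVER, FAULT_MIGRATED, FAULT_OOM };

/* Handle a CPU fault on a page the driver manages. `spare` is a freshly
 * allocated system page, or NULL if allocation failed. */
static enum fault_action driver_cpu_fault(struct dev_page_state *s,
                                          unsigned char *spare,
                                          unsigned char **out_page)
{
    if (!s->vram_valid && s->sys_copy != NULL) {
        /* Already evicted under oversubscription: hand over the existing
         * system page, nothing to copy. */
        *out_page = s->sys_copy;
        return FAULT_HANDED_OVER;
    }
    /* Otherwise the newest data must be in device memory. */
    assert(s->vram_valid && s->vram != NULL);
    if (spare == NULL)
        return FAULT_OOM;
    memcpy(spare, s->vram, PAGE_SZ);  /* the "migrate" branch */
    s->sys_copy = spare;
    s->vram_valid = 0;
    *out_page = spare;
    return FAULT_MIGRATED;
}
```

Only the second branch needs a copy at all, which matches the point that a migration helper is called from the driver when it is actually needed rather than from core code.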
> 

I get your point.

Without an open source driver, it is hard to get the whole view of this
solution. I hope to see your open source driver soon.

Thanks
Yisheng Xie