Message-ID: <BN6PR12MB1348BFA811E8A5539EBD9056E8C10@BN6PR12MB1348.namprd12.prod.outlook.com>
Date: Fri, 16 Jun 2017 17:55:52 +0000
From: "Bridgman, John" <John.Bridgman@....com>
To: Jerome Glisse <jglisse@...hat.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Dan Williams <dan.j.williams@...el.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
John Hubbard <jhubbard@...dia.com>,
"Sander, Ben" <ben.sander@....com>,
"Kuehling, Felix" <Felix.Kuehling@....com>
Subject: RE: [HMM 00/15] HMM (Heterogeneous Memory Management) v23
>-----Original Message-----
>From: Jerome Glisse [mailto:jglisse@...hat.com]
>Sent: Friday, June 16, 2017 10:48 AM
>To: Bridgman, John
>Cc: akpm@...ux-foundation.org; linux-kernel@...r.kernel.org; linux-
>mm@...ck.org; Dan Williams; Kirill A . Shutemov; John Hubbard; Sander, Ben;
>Kuehling, Felix
>Subject: Re: [HMM 00/15] HMM (Heterogeneous Memory Management) v23
>
>On Fri, Jun 16, 2017 at 07:22:05AM +0000, Bridgman, John wrote:
>> Hi Jerome,
>>
>> I'm just getting back to this; sorry for the late responses.
>>
>> Your description of HMM talks about blocking CPU accesses when a page
>> has been migrated to device memory, and you treat that as a "given" in
>> the HMM design. Other than BAR limits, coherency between CPU and
>> device caches, and performance on read-intensive CPU accesses to device
>> memory, are there any other reasons for this?
>
>Correct, that is the list of reasons for it. Note that HMM is more of a toolbox
>than one monolithic thing. For instance there is also the HMM-CDM patchset,
>which does allow GPU memory to be mapped to the CPU, but that relies on CAPI
>or CCIX to keep the same memory model guarantees.
>
>
>> The reason I'm asking is that we make fairly heavy use of large BAR
>> support which allows the CPU to directly access all of the device
>> memory on each of the GPUs, albeit without cache coherency, and there
>> are some cases where it appears that allowing CPU access to the page
>> in device memory would be more efficient than constantly migrating
>> back and forth.
>
>The thing is we are designing for arbitrary programs and we cannot make
>any assumption about what kind of instructions a program might run on such
>memory. So if a program tries to do atomics on it, IIRC it is undefined
>what is supposed to happen.
Thanks... I thought I was missing something from the list. Agreed that we
need to provide consistent behaviour, and we definitely care about atomics.
If we could get consistent behaviour with the page still in device memory,
are you aware of any other problems related to HMM itself?
>
>So if you want to keep such memory mapped to userspace, I would suggest
>doing it through a device-specific vma, and thus through an API-specific
>contract that is well understood by the developer.
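To sketch the shape of such a device-specific mapping (all names here are hypothetical, and a memfd stands in for the device node so the example actually runs): the driver exposes the memory via its own mmap, and the API contract, not the generic memory model, spells out what accesses are allowed (e.g. plain loads/stores only, no atomics, explicit flushes).

```c
/* Hypothetical sketch of a device-specific mapping contract.  A real
 * driver would back the mmap with BAR pages via its own vm_ops; here a
 * memfd stands in for the device node so the sketch is runnable. */
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static uint32_t roundtrip(void)
{
    /* Stand-in for open("/dev/<gpu>", O_RDWR) on a real driver. */
    int fd = memfd_create("fake-vram", 0);
    uint32_t v = 0;

    if (fd < 0)
        return 0;
    if (ftruncate(fd, 4096) == 0) {
        uint32_t *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 0xdeadbeef;   /* plain store, allowed by the contract */
            v = p[0];            /* plain load */
            munmap(p, 4096);
        }
    }
    close(fd);
    return v;
}
```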
>
>>
>> Migrating the page back and forth between device and system memory
>> appears at first glance to provide three benefits (albeit at a
>> cost):
>>
>> 1. BAR limit - this is kind of a no-brainer, in the sense that if
>> the CPU cannot access the VRAM then you have to migrate it
>>
>> 2. coherency - having the CPU fault when page is in device memory
>> or vice versa gives you an event which can be used to allow cache
>> flushing on one device before handing ownership (from a cache
>> perspective) to the other device - but at first glance you don't
>> actually have to move the page to get that benefit
>>
>> 3. performance - CPU writes to device memory can be pretty fast
>> since the transfers can be "fire and forget" but reads are always
>> going to be slow because of the round-trip nature... but the
>> tradeoff between access performance and migration overhead is
>> more of a heuristic thing than a black-and-white thing
>
>You are missing CPU atomic operations; AFAIK it is undefined how they
>behave on BAR/IO memory.
>
>
>> Do you see any HMM-related problems in principle with optionally
>> leaving a page in device memory while the CPU is accessing it, assuming
>> that only one CPU/device "owns" the page from a cache POV at any given
>> time?
>
>The problem I see is with breaking the assumptions the programmer has
>about the memory model. So say you have a program A that uses a library
>L, and that library is clever enough to use the GPU, and that GPU driver
>uses HMM. Now if L migrates some memory behind the program's back to
>perform some computation, you do not want to break any of the assumptions
>made by the programmer of A.
>
>So like I said above, if you want to keep a live mapping of some memory, I
>would do it through a device-specific API. The whole point of HMM is to make
>memory migration transparent without breaking any of the expectations you
>have about how memory access works from the CPU's point of view.
>
>Cheers,
>Jérôme