linux-kernel - Re: Interacting with coherent memory on external devices

Message-ID: <553A8E62.4060802@amd.com>
Date:	Fri, 24 Apr 2015 21:41:38 +0300
From:	Oded Gabbay <oded.gabbay@....com>
To:	Jerome Glisse <j.glisse@...il.com>,
	Christoph Lameter <cl@...ux.com>
CC:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	<paulmck@...ux.vnet.ibm.com>, <linux-kernel@...r.kernel.org>,
	<linux-mm@...ck.org>, <jglisse@...hat.com>, <mgorman@...e.de>,
	<aarcange@...hat.com>, <riel@...hat.com>, <airlied@...hat.com>,
	<aneesh.kumar@...ux.vnet.ibm.com>,
	Cameron Buschardt <cabuschardt@...dia.com>,
	Mark Hairgrove <mhairgrove@...dia.com>,
	"Geoffrey Gerfin" <ggerfin@...dia.com>,
	John McKenna <jmckenna@...dia.com>,
	<akpm@...ux-foundation.org>,
	"Bridgman, John" <John.Bridgman@....com>
Subject: Re: Interacting with coherent memory on external devices



On 04/23/2015 07:22 PM, Jerome Glisse wrote:
> On Thu, Apr 23, 2015 at 09:20:55AM -0500, Christoph Lameter wrote:
>> On Thu, 23 Apr 2015, Benjamin Herrenschmidt wrote:
>>
>>>> There are hooks in glibc where you can replace the memory
>>>> management of the apps if you want that.
>>>
>>> We don't control the app. Let's say we are doing a plugin for libfoo
>>> which accelerates "foo" using GPUs.
>>
>> There are numerous examples of malloc implementations that can be used for
>> apps without modifying the app.
>
> What about shared memory passed between processes? Or an mmapped file? Or
> a library that is loaded through dlopen and thus has no way to
> control any allocation that happened before it became active?
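Jerome's point is that a malloc replacement only ever sees allocations that go through malloc. A minimal Python sketch (Unix only, since it uses os.fork) of memory that never does: an anonymous shared mapping created directly with mmap and written by a child process after fork. In C the equivalent would be mmap() with MAP_SHARED | MAP_ANONYMOUS followed by fork(). The function name share_value_across_fork is hypothetical.

```python
import mmap
import os
import struct


def share_value_across_fork(value: int) -> int:
    """Child writes `value` into an anonymous shared page; parent reads it back.

    The page comes straight from mmap(2), so no malloc replacement or
    allocator hook ever sees it.
    """
    # fileno=-1 requests an anonymous mapping; flags default to MAP_SHARED on Unix.
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    pid = os.fork()
    if pid == 0:
        try:
            # Child: store a little-endian 64-bit integer at offset 0.
            struct.pack_into("<q", buf, 0, value)
        finally:
            # Exit immediately, skipping the parent's cleanup handlers.
            os._exit(0)
    os.waitpid(pid, 0)
    result = struct.unpack_from("<q", buf, 0)[0]
    buf.close()
    return result


if __name__ == "__main__":
    print(share_value_across_fork(42))
```

Both processes end up referencing the same physical page through an address range no allocator hook was ever asked about, which is the class of memory a special-allocator approach cannot cover.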
>
>>>
>>> Now some other app we have no control over uses libfoo. So pointers
>>> already allocated/mapped, possibly a long time ago, will hit libfoo (or
>>> the plugin) and we need GPUs to churn on the data.
>>
>> If the GPU would need to suspend one of its computation threads to wait on
>> a mapping to be established on demand or so, then it looks like the
>> performance of the parallel threads on a GPU will be significantly
>> compromised. You would want to do the transfer explicitly in some fashion
>> that meshes with the concurrent calculation in the GPU. You do not want
>> stalls while GPU number crunching is ongoing.
>
> You do not understand how GPUs work. GPUs have a pool of threads, and they
> always try to keep the pool as big as possible so that when a group of
> threads is waiting for some memory access, there are other threads ready
> to perform some operation. GPUs are about hiding memory latency; that's
> what they are good at. But they only achieve that when they have more
> threads in flight than compute units. The whole thread scheduling is done
> by hardware and barely controlled by the device driver.
>
> So no, having the GPU wait for a page fault is not as dramatic as you
> think. If you use GPUs as they are intended to be used, you might never
> even notice the page fault and still get close to the theoretical
> throughput of the GPU.
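The latency-hiding argument can be made concrete with a toy model. This sketch is an assumption for illustration only, not how any particular GPU schedules: one compute unit is shared by n hardware threads, each of which does `compute` cycles of arithmetic and then stalls for `latency` cycles on a memory access. The function name `utilization` and the cycle counts are hypothetical.

```python
def utilization(n_threads: int, compute: int, latency: int, cycles: int) -> float:
    """Fraction of cycles a single compute unit is busy in a toy model.

    Each thread runs `compute` cycles, then stalls `latency` cycles on a
    memory access. Whenever the unit is free, the lowest-numbered ready
    thread is picked.
    """
    work_left = [compute] * n_threads  # compute cycles left in the current burst
    ready_at = [0] * n_threads         # first cycle a stalled thread may run again
    current = None                     # thread occupying the unit, if any
    busy = 0
    for t in range(cycles):
        if current is None:
            current = next((i for i in range(n_threads) if ready_at[i] <= t), None)
        if current is not None:
            busy += 1
            work_left[current] -= 1
            if work_left[current] == 0:
                # Burst finished: the thread issues a memory access and stalls.
                work_left[current] = compute
                ready_at[current] = t + 1 + latency
                current = None
    return busy / cycles


if __name__ == "__main__":
    # Hypothetical numbers: 4 cycles of math per load, 12-cycle load latency.
    for n in (1, 2, 4, 8):
        print(n, utilization(n, compute=4, latency=12, cycles=1600))
```

With these numbers the script reports utilizations of 0.25, 0.5, 1.0 and 1.0 for 1, 2, 4 and 8 threads: once roughly 1 + latency/compute threads are in flight, the stalls are fully hidden. In this model a longer stall simply needs proportionally more threads in flight to hide it.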
>
>
>>
>>> The point I'm making is you are arguing against a usage model which has
>>> been repeatedly asked for by large numbers of customers (after all that's
>>> also why HMM exists).
>>
>> I am still not clear what the use case for this would be. Who is asking
>> for this?
>
> Everyone but you? OpenCL 2.0 specifically requests it and has several levels
> of support for a transparent address space. The lowest one is the one
> implemented today, in which the application needs to use a special memory
> allocator.
>
> The most advanced one implies integration with the kernel, in which any
> memory (mmapped file, shared memory or anonymous memory) can be used by
> the GPU and does not need to come from a special allocator.
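The levels Jerome describes correspond to the shared virtual memory (SVM) capability bits that an OpenCL 2.0 device reports. A minimal sketch, assuming the bit values from the OpenCL 2.0 cl.h headers; in a real program the bitfield would come from clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES, whereas here it is passed in as a plain integer. Mapping "lowest" to coarse-grained buffer SVM and "most advanced" to fine-grained system SVM is this sketch's interpretation, and the helper names svm_level and needs_special_allocator are hypothetical.

```python
# Bit values of cl_device_svm_capabilities as defined in the OpenCL 2.0 cl.h.
CL_DEVICE_SVM_COARSE_GRAIN_BUFFER = 1 << 0
CL_DEVICE_SVM_FINE_GRAIN_BUFFER = 1 << 1
CL_DEVICE_SVM_FINE_GRAIN_SYSTEM = 1 << 2
CL_DEVICE_SVM_ATOMICS = 1 << 3


def svm_level(caps: int) -> str:
    """Return the most capable SVM level advertised in the `caps` bitfield."""
    if caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM:
        return "fine-grained system"
    if caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER:
        return "fine-grained buffer"
    if caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER:
        return "coarse-grained buffer"
    return "none"


def needs_special_allocator(caps: int) -> bool:
    """True unless the device can use ordinary host memory (malloc, mmap, ...).

    Both buffer-SVM levels require memory obtained from clSVMAlloc; only
    fine-grained system SVM accepts any host pointer.
    """
    return not (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)


if __name__ == "__main__":
    for caps in (0b0001, 0b0011, 0b1111):
        print(bin(caps), svm_level(caps), needs_special_allocator(caps))
```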
>
> Everyone in the industry is moving toward the most advanced one. That
> is the raison d'être of HMM: to provide this functionality on hardware
> platforms that do not have things such as CAPI, i.e. x86/ARM.
>
> So the use case is every application using OpenCL or CUDA; pretty much
> everyone doing GPGPU wants this. I don't know how you can't see that.
> A shared address space is so much easier. Believe it or not, most coders
> do not have deep knowledge of how things work, and if you can remove
> the complexity of different memory allocators and different address
> spaces from them, they will be happy.
>
> Cheers,
> Jérôme
I second what Jerome said, and would add that one of the key features of
HSA is the ptr-is-a-ptr scheme, where applications do *not* need to
handle different address spaces. Instead, all memory is seen as a single
unified address space.

See slide 6 of the following presentation:
http://www.slideshare.net/hsafoundation/hsa-overview

Thanks,
	Oded