Message-ID: <20170612184413.GA5924@gmail.com>
Date: Mon, 12 Jun 2017 14:44:14 -0400
From: Jerome Glisse <j.glisse@...il.com>
To: "Wuzongyong (Cordius Wu, Euler Dept)" <wuzongyong1@...wei.com>
Cc: "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"oded.gabbay@....com" <oded.gabbay@....com>,
"Wanzongshun (Vincent)" <wanzongshun@...wei.com>
Subject: Re: What differences and relations between SVM, HSA, HMM and Unified
Memory?
On Sat, Jun 10, 2017 at 04:06:28AM +0000, Wuzongyong (Cordius Wu, Euler Dept) wrote:
> Hi,
>
> Could someone explain the differences and relations between SVM
> (Shared Virtual Memory, by Intel), HSA (Heterogeneous System
> Architecture, by AMD), HMM (Heterogeneous Memory Management, by
> Glisse) and UM (Unified Memory, by NVIDIA)? Can one of them
> substitute for another?
>
> As I understand it, these aim to solve the same problem: sharing
> pointers between the CPU and GPU (implemented with ATS/PASID/PRI/IOMMU
> support). So far, SVM and HSA can only be used by integrated GPUs.
> Also, Intel states that the root ports do not have the required
> TLP prefix support, so SVM can't be used by discrete devices. Could
> someone tell me what the required TLP prefix means specifically?
>
> With HMM, we can use an allocator like malloc to manage host and
> device memory. Does this mean that there is no need to use SVM and
> HSA with HMM, or is HMM the basis on which SVM and HSA implement
> fine-grained system SVM as defined in the OpenCL spec?

So the aim of all these technologies is to share an address space
between a device and the CPU. There are three ways to do it:
A) All in hardware, like CAPI or CCIX, where device memory is cache
coherent from the CPU's point of view and system memory is also
accessible by the device in a way that is cache coherent with the
CPU. So cache coherency goes both ways: from the CPU to device
memory and from the device to system memory.
B) Partially in hardware, with ATS/PASID (the same technology behind
both HSA and SVM). This is a one-way solution: the device has
cache-coherent access to system memory, but not the other way
around. Moreover, the CPU page table is shared with the device, so
you do not need to program the IOMMU. Here you cannot use device
memory transparently, at least not without software help like HMM.
C) All in software. Here the device can access system memory with
cache coherency, but it does not share the CPU page table. Each
device has its own page table, so you need to keep it synchronized
with the CPU page table.

HMM provides helpers that address all three solutions:
A) For the all-hardware solution, HMM provides new helpers for
migrating process memory to device memory.
B) For the partial hardware solution, you can mix in HMM to again
provide helpers for migration to device memory. This assumes your
device can mix and match its local device page table with
ATS/PASID regions.
C) For the full software solution, you use all the features of HMM:
everything is done in software and HMM does the heavy lifting on
behalf of the device driver.

In all of the above we are talking about fine-grained system SVM as
defined in the OpenCL specification. So you can malloc() memory and
use it directly from the GPU.

Hope this clarifies things.
Cheers,
Jérôme