linux-kernel - Re: Enabling peer to peer device transactions for PCIe devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 25 Nov 2016 12:16:30 -0500
From:   Serguei Sagalovitch <serguei.sagalovitch@....com>
To:     Christian König <christian.koenig@....com>,
        "Jason Gunthorpe" <jgunthorpe@...idianresearch.com>,
        Logan Gunthorpe <logang@...tatee.com>
CC:     Dan Williams <dan.j.williams@...el.com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...1.01.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "Kuehling, Felix" <Felix.Kuehling@....com>,
        "Bridgman, John" <John.Bridgman@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "Sander, Ben" <ben.sander@....com>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        "Blinzer, Paul" <Paul.Blinzer@....com>,
        "Linux-media@...r.kernel.org" <Linux-media@...r.kernel.org>,
        Haggai Eran <haggaie@...lanox.com>
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On 2016-11-25 08:22 AM, Christian König wrote:
>
>> Serguei, what is your plan in GPU land for migration? Ie if I have a
>> CPU mapped page and the GPU moves it to VRAM, it becomes non-cachable
>> - do you still allow the CPU to access it? Or do you swap it back to
>> cachable memory if the CPU touches it?
>
> Depends on the policy in command, but currently it's the other way 
> around most of the time.
>
> E.g. we allocate memory in VRAM, the CPU writes to it WC and avoids 
> reading because that is slow, the GPU in turn can access it with full 
> speed.
>
> When we run out of VRAM we move those allocations to system memory and 
> update both the CPU as well as the GPU page tables.
>
> So that move is transparent for both userspace as well as shaders 
> running on the GPU.
I would like to add more in relation to  CPU access :

a) we could have CPU-accessible part of VRAM ("inside" of PCIe BAR register)
and non-CPU  accessible part.  As the result if user needs to have
CPU access than memory should be located in CPU-accessible part
of VRAM or in system memory.

Application/user mode driver could specify preference/hints of
locations based on their assumption / knowledge about access
patterns requirements, game resolution,  knowledge
about size of VRAM memory, etc.  So if CPU access performance
is critical then such memory should be allocated in system memory
as  the first (and may be only) choice.

b) Allocation may not  have CPU address  at all - only GPU one.
Also we may not be able to have CPU address/accesses for all VRAM
memory but memory may still  be migrated in any case unrelated
if we have CPU address or not.

c) " VRAM, it becomes non-cachable "
Strictly speaking VRAM is configured as WC (write-combined memory) to
provide fast CPU write access. Also it was found that sometimes if CPU
access is not critical from performance perspective it may be useful
to allocate/program system memory also as WC to  avoid needs for
extra "snooping" to synchronize with CPU caches during GPU access.
So potentially system memory could be WC too.