linux-kernel - Re: Enabling peer to peer device transactions for PCIe devices

Open Source and information security mailing list archives

Message-ID: <583BC14B.2040809@amd.com>
Date:   Mon, 28 Nov 2016 13:31:55 +0800
From:   zhoucm1 <david1.zhou@....com>
To:     Christian König <christian.koenig@....com>,
        "Haggai Eran" <haggaie@...lanox.com>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>,
        "Yu, Qiang" <Qiang.Yu@....com>
CC:     "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...1.01.org>,
        "Kuehling, Felix" <Felix.Kuehling@....com>,
        Serguei Sagalovitch <serguei.sagalovitch@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "Blinzer, Paul" <Paul.Blinzer@....com>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        Max Gurtovoy <maxg@...lanox.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Logan Gunthorpe <logang@...tatee.com>,
        "Sander, Ben" <ben.sander@....com>,
        "Linux-media@...r.kernel.org" <Linux-media@...r.kernel.org>
Subject: Re: Enabling peer to peer device transactions for PCIe devices

+Qiang, who is working on it.

On 2016-11-27 22:07, Christian König wrote:
> On 27.11.2016 at 15:02, Haggai Eran wrote:
>> On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:
>>> On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
>>>
>>>>> Like you say below we have to handle short lived in the usual way, and
>>>>> that covers basically every device except IB MRs, including the
>>>>> command queue on an NVMe drive.
>>>> Well a problem which wasn't mentioned so far is that while GPUs do have a
>>>> page table to mirror the CPU page table, they usually can't recover from
>>>> page faults.
>>>> So what we do is make sure that all memory accessed by the GPU jobs stays
>>>> in place while those jobs run (pretty much the same pinning you do for the
>>>> DMA).
>>> Yes, it is DMA, so this is a valid approach.
>>>
>>> But, you don't need page faults from the GPU to do proper coherent
>>> page table mirroring. Basically when the driver submits the work to
>>> the GPU it 'faults' the pages into the CPU and mirror translation
>>> table (instead of pinning).
>>>
>>> Like in ODP, MMU notifiers/HMM are used to monitor for translation
>>> changes. If a change comes in, the GPU driver checks if an executing
>>> command is touching those pages and blocks the MMU notifier until the
>>> command flushes, then unfaults the page (blocking future commands) and
>>> unblocks the mmu notifier.
>> I think blocking mmu notifiers against something that is basically
>> controlled by user-space can be problematic. This can block things like
>> memory reclaim. If you have user-space access to the device's queues,
>> user-space can block the mmu notifier forever.
> Really good point.
>
> I think this means the bare minimum if we don't have recoverable page
> faults is to have preemption support like Felix described in his
> answer as well.
>
> Going to keep that in mind,
> Christian.
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@...ts.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
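
To make the scheme discussed above concrete, here is a rough userspace
sketch in Python. It is only a toy model under stated assumptions, not
kernel code and not any driver's actual implementation: MirrorTable,
submit(), complete(), invalidate() and the preempt callback are all
hypothetical names invented for illustration. Pages are "faulted" into the
mirror at job submission, an invalidation waits for in-flight jobs touching
those pages, and an optional preempt hook bounds the wait so a job that
never finishes cannot block the notifier forever.

```python
import threading


class MirrorTable:
    """Toy stand-in for a device-side mirror of the CPU page table."""

    def __init__(self):
        self._cond = threading.Condition()
        self._mapped = set()    # pages currently present in the mirror
        self._in_flight = {}    # job id -> set of pages that job touches

    def submit(self, job_id, pages):
        """'Fault' the pages into the mirror at submission (no pinning)."""
        with self._cond:
            self._mapped |= set(pages)
            self._in_flight[job_id] = set(pages)

    def complete(self, job_id):
        """The device reports that a job has finished."""
        with self._cond:
            self._in_flight.pop(job_id, None)
            self._cond.notify_all()

    def is_mapped(self, page):
        with self._cond:
            return page in self._mapped

    def _busy(self, pages):
        # Jobs whose page set overlaps the range being invalidated.
        return [j for j, p in self._in_flight.items() if p & pages]

    def invalidate(self, pages, timeout=None, preempt=None):
        """Model of the notifier callback for a translation change.

        Waits until no in-flight job touches `pages`, then drops them from
        the mirror. With timeout=None this can wait forever, which is the
        problem raised in the thread. If the wait times out and a `preempt`
        callable is given, the overlapping jobs are kicked off the device
        instead. Returns the list of preempted job ids.
        """
        pages = set(pages)
        preempted = []
        with self._cond:
            idle = self._cond.wait_for(lambda: not self._busy(pages), timeout)
            if not idle:
                if preempt is None:
                    raise TimeoutError("job still running, notifier stays blocked")
                for job_id in self._busy(pages):
                    preempt(job_id)              # hypothetical preemption hook
                    del self._in_flight[job_id]
                    preempted.append(job_id)
            self._mapped -= pages                # 'unfault' the pages
        return preempted
```

In this model a later submit() has to fault the pages back in, which
corresponds to "blocking future commands" in the description above. Calling
invalidate() with neither a timeout nor a preempt hook reproduces the
unbounded wait that user-space controlled queues could cause.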
