linux-kernel - Re: Enabling peer to peer device transactions for PCIe devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAPcyv4ind0fxek7g25MX=49rDfT5X151tb4=TYudMBmUJFZZNQ@mail.gmail.com>
Date:   Tue, 22 Nov 2016 13:21:03 -0800
From:   Dan Williams <dan.j.williams@...el.com>
To:     Daniel Vetter <daniel@...ll.ch>
Cc:     Serguei Sagalovitch <serguei.sagalovitch@....com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "Kuehling, Felix" <Felix.Kuehling@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
        "Koenig, Christian" <Christian.Koenig@....com>,
        "Sander, Ben" <ben.sander@....com>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        "Blinzer, Paul" <Paul.Blinzer@....com>,
        "Linux-media@...r.kernel.org" <Linux-media@...r.kernel.org>
Subject: Re: Enabling peer to peer device transactions for PCIe devices

On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter <daniel@...ll.ch> wrote:
> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
> <serguei.sagalovitch@....com> wrote:
>>
>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>
>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams@...el.com>
>>> wrote:
>>>>
>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>>> <serguei.sagalovitch@....com> wrote:
>>>>>
>>>>> I personally like "device-DAX" idea but my concerns are:
>>>>>
>>>>> -  How well it will co-exists with the  DRM infrastructure /
>>>>> implementations
>>>>>     in part dealing with CPU pointers?
>>>>
>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>>> not migratable. To be honest I do not know how well that co-exists
>>>> with drm infrastructure.
>>>>
>>>>> -  How well we will be able to handle case when we need to
>>>>> "move"/"evict"
>>>>>     memory/data to the new location so CPU pointer should point to the
>>>>> new
>>>>> physical location/address
>>>>>      (and may be not in PCI device memory at all)?
>>>>
>>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>>> interface is for cases where all that is needed is a direct-mapping to
>>>> a statically-allocated physical-address range be it persistent memory
>>>> or some other special reserved memory range.
>>>
>>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>>> pull off) I think we want all the magic in core mm, i.e. migration and
>>> overcommit. At least that seems to be the very strong drive in all
>>> general-purpose gpu abstractions and implementations, where memory is
>>> allocated with malloc, and then mapped/moved into vram/gpu address
>>> space through some magic,
>>
>> It is possible that there is other way around: memory is requested to be
>> allocated and should be kept in vram for  performance reason but due
>> to possible overcommit case we need at least temporally to "move" such
>> allocation to system memory.
>
> With migration I meant migrating both ways of course. And with stuff
> like numactl we can also influence where exactly the malloc'ed memory
> is allocated originally, at least if we'd expose the vram range as a
> very special numa node that happens to be far away and not hold any
> cpu cores.

I don't think we should be using numa distance to reverse engineer a
certain allocation behavior.  The latency data should be truthful, but
you're right we'll need a mechanism to keep general purpose
allocations out of that range by default. Btw, strict isolation is
another design point of device-dax, but I think in this case we're
describing something between the two extremes of full isolation and
full compatibility with existing numactl apis.