linux-kernel - Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export FD for DMA-BUF

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d600a638-9e55-6249-b574-0986cd5cea1e@gmail.com>
Date:   Wed, 23 Jun 2021 10:57:35 +0200
From:   Christian König <ckoenig.leichtzumerken@...il.com>
To:     Jason Gunthorpe <jgg@...pe.ca>,
        Christian König <christian.koenig@....com>
Cc:     Oded Gabbay <oded.gabbay@...il.com>,
        Gal Pressman <galpress@...zon.com>, sleybo@...zon.com,
        linux-rdma <linux-rdma@...r.kernel.org>,
        Oded Gabbay <ogabbay@...nel.org>,
        Christoph Hellwig <hch@....de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        "moderated list:DMA BUFFER SHARING FRAMEWORK" 
        <linaro-mm-sig@...ts.linaro.org>,
        Doug Ledford <dledford@...hat.com>,
        Tomer Tayar <ttayar@...ana.ai>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        Alex Deucher <alexander.deucher@....com>,
        Leon Romanovsky <leonro@...dia.com>,
        "open list:DMA BUFFER SHARING FRAMEWORK" 
        <linux-media@...r.kernel.org>
Subject: Re: [Linaro-mm-sig] [PATCH v3 1/2] habanalabs: define uAPI to export
 FD for DMA-BUF

Am 22.06.21 um 18:05 schrieb Jason Gunthorpe:
> On Tue, Jun 22, 2021 at 05:48:10PM +0200, Christian König wrote:
>> Am 22.06.21 um 17:40 schrieb Jason Gunthorpe:
>>> On Tue, Jun 22, 2021 at 05:29:01PM +0200, Christian König wrote:
>>>> [SNIP]
>>>> No absolutely not. NVidia GPUs work exactly the same way.
>>>>
>>>> And you have tons of similar cases in embedded and SoC systems where
>>>> intermediate memory between devices isn't directly addressable with the CPU.
>>> None of that is PCI P2P.
>>>
>>> It is all some specialty direct transfer.
>>>
>>> You can't reasonably call dma_map_resource() on non CPU mapped memory
>>> for instance, what address would you pass?
>>>
>>> Do not confuse "I am doing transfers between two HW blocks" with PCI
>>> Peer to Peer DMA transfers - the latter is a very narrow subcase.
>>>
>>>> No, just using the dma_map_resource() interface.
>>> Ik, but yes that does "work". Logan's series is better.
>> No it isn't. It makes devices depend on allocating struct pages for their
>> BARs which is not necessary nor desired.
> Which dramatically reduces the cost of establishing DMA mappings, a
> loop of dma_map_resource() is very expensive.

Yeah, but that is perfectly ok. Our BAR allocations are either in chunks 
of at least 2MiB or only a single 4KiB page.

Oded might run into more performance problems, but those DMA-buf 
mappings are usually set up only once.

>> How do you prevent direct I/O on those pages for example?
> GUP fails.

At least that is calming.

>> Allocating a struct pages has their use case, for example for exposing VRAM
>> as memory for HMM. But that is something very specific and should not limit
>> PCIe P2P DMA in general.
> Sure, but that is an ideal we are far from obtaining, and nobody wants
> to work on it prefering to do hacky hacky like this.
>
> If you believe in this then remove the scatter list from dmabuf, add a
> new set of dma_map* APIs to work on physical addresses and all the
> other stuff needed.

Yeah, that's what I totally agree on. And I actually hoped that the new 
P2P work for PCIe would go into that direction, but that didn't 
materialized.

But allocating struct pages for PCIe BARs which are essentially 
registers and not memory is much more hacky than the dma_resource_map() 
approach.

To re-iterate why I think that having struct pages for those BARs is a 
bad idea: Our doorbells on AMD GPUs are write and read pointers for ring 
buffers.

When you write to the BAR you essentially tell the firmware that you 
have either filled the ring buffer or read a bunch of it. This in turn 
then triggers an interrupt in the hardware/firmware which was eventually 
asleep.

By using PCIe P2P we want to avoid the round trip to the CPU when one 
device has filled the ring buffer and another device must be woken up to 
process it.

Think of it as MSI-X in reverse and allocating struct pages for those 
BARs just to work around the shortcomings of the DMA API makes no sense 
at all to me.

We also do have the VRAM BAR, and for HMM we do allocate struct pages 
for the address range exposed there. But this is a different use case.

Regards,
Christian.

>
> Otherwise, we have what we have and drivers don't get to opt out. This
> is why the stuff in AMDGPU was NAK'd.
>
> Jason