Message-ID: <0d0d2a6a-a90c-409c-8d60-b17bad32af94@amd.com>
Date: Tue, 25 Nov 2025 15:21:07 +0100
From: Christian König <christian.koenig@....com>
To: Pavel Begunkov <asml.silence@...il.com>, linux-block@...r.kernel.org,
 io-uring@...r.kernel.org
Cc: Vishal Verma <vishal1.verma@...el.com>, tushar.gohad@...el.com,
 Keith Busch <kbusch@...nel.org>, Jens Axboe <axboe@...nel.dk>,
 Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
 Alexander Viro <viro@...iv.linux.org.uk>,
 Christian Brauner <brauner@...nel.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Sumit Semwal <sumit.semwal@...aro.org>, linux-kernel@...r.kernel.org,
 linux-nvme@...ts.infradead.org, linux-fsdevel@...r.kernel.org,
 linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
 linaro-mm-sig@...ts.linaro.org
Subject: Re: [RFC v2 00/11] Add dmabuf read/write via io_uring

On 11/25/25 14:52, Pavel Begunkov wrote:
> On 11/24/25 14:17, Christian König wrote:
>> On 11/24/25 12:30, Pavel Begunkov wrote:
>>> On 11/24/25 10:33, Christian König wrote:
>>>> On 11/23/25 23:51, Pavel Begunkov wrote:
>>>>> Picking up the work on supporting dmabuf in the read/write path.
>>>>
>>>> IIRC that work was completely stopped because it violated core dma_fence and DMA-buf rules and after some private discussion was considered not doable in general.
>>>>
>>>> Or am I mixing something up here?
>>>
>>> The time gap is purely due to me being busy. I wasn't CC'ed on those private
>>> discussions you mentioned, but the v1 feedback was to use dynamic attachments
>>> and avoid passing dma address arrays directly.
>>>
>>> https://lore.kernel.org/all/cover.1751035820.git.asml.silence@gmail.com/
>>>
>>> I'm lost on what part is not doable. Can you elaborate on the core
>>> dma-fence dma-buf rules?
>>
>> I most likely mixed that up; in other words, that was a different discussion.
>>
>> When you use dma_fences to indicate async completion of events, you need to be super duper careful that you only do this for in-flight events, create the fence in the right order, etc...
> 
> I'm curious, what can happen if there is new IO using a
> move_notify()ed mapping, but let's say it's guaranteed to complete
> strictly before dma_buf_unmap_attachment() and the fence is signaled?
> Is there some loss of data or corruption that can happen?

The problem is that you can't guarantee that, because you run into deadlocks.

As soon as a dma_fence is created and published by calling dma_resv_add_fence(), memory management can loop back and depend on that fence.

So you actually can't issue any new IO which might block the unmap operation.

> 
> sg_table = map_attach()         |
> move_notify()                   |
>   -> add_fence(fence)           |
>                                 | issue_IO(sg_table)
>                                 | // IO completed
> unmap_attachment(sg_table)      |
> signal_fence(fence)             |
> 
>> For example once the fence is created you can't make any memory allocations any more, that's why we have this dance of reserving fence slots, creating the fence and then adding it.
> 
> Looks like I have some terminology gap here. By "memory allocations" you
> don't mean kmalloc, right? I assume it's about new users of the
> mapping.

kmalloc() as well as get_free_page() are exactly what is meant here.

You can't make any memory allocations any more after creating/publishing a dma_fence.

The usual flow is the following:

1. Lock dma_resv object
2. Prepare I/O operation, make all memory allocations etc...
3. Allocate dma_fence object
4. Push I/O operation to the HW, making sure that you don't allocate memory any more.
5. Call dma_resv_add_fence() with the fence allocated in #3.
6. Unlock dma_resv object

If you stray from that, you most likely end up in a deadlock sooner or later.

Regards,
Christian.

>>>> Since I don't see any dma_fence implementation at all that might actually be the case.
>>>
>>> See Patch 5, struct blk_mq_dma_fence. It's used in the move_notify
>>> callback and is signaled when all inflight IO using the current
>>> mapping are complete. All new IO requests will try to recreate the
>>> mapping, and hence potentially wait with dma_resv_wait_timeout().
>>
>> Without looking at the code that approach sounds more or less correct to me.
>>
>>>> On the other hand we have direct I/O from DMA-buf working for quite a while, just not upstream and without io_uring support.
>>>
>>> Have any reference?
>>
>> There is a WIP feature in AMD's GPU driver package for ROCm.
>>
>> But that can't be used as general purpose DMA-buf approach, because it makes use of internal knowledge about how the GPU driver is using the backing store.
> 
> Got it
> 
>> BTW, when you use DMA addresses from DMA-buf, always keep in mind that this memory can be written to by others at the same time, e.g. you can't do things like compute a CRC first, then write to the backing store and finally compare the CRC.
> 
> Right. The direct IO path also works with user pages, so the
> constraints are similar in this regard.
> 

