Message-ID: <48ea826a-54fe-4df0-bff9-18ae29117f57@amd.com>
Date: Tue, 27 May 2025 17:10:14 +0200
From: Christian König <christian.koenig@....com>
To: wangtao <tao.wangtao@...or.com>, "T.J. Mercier" <tjmercier@...gle.com>
Cc: "sumit.semwal@...aro.org" <sumit.semwal@...aro.org>,
 "benjamin.gaignard@...labora.com" <benjamin.gaignard@...labora.com>,
 "Brian.Starkey@....com" <Brian.Starkey@....com>,
 "jstultz@...gle.com" <jstultz@...gle.com>,
 "linux-media@...r.kernel.org" <linux-media@...r.kernel.org>,
 "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
 "linaro-mm-sig@...ts.linaro.org" <linaro-mm-sig@...ts.linaro.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "wangbintian(BintianWang)" <bintian.wang@...or.com>,
 yipengxiang <yipengxiang@...or.com>, liulu 00013167 <liulu.liu@...or.com>,
 hanfeng 00012985 <feng.han@...or.com>,
 "amir73il@...il.com" <amir73il@...il.com>,
 "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
 "viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
 "brauner@...nel.org" <brauner@...nel.org>,
 "hughd@...gle.com" <hughd@...gle.com>
Subject: Re: [PATCH 2/2] dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for
 system_heap

On 5/27/25 16:35, wangtao wrote:
>> -----Original Message-----
>> From: Christian König <christian.koenig@....com>
>> Sent: Thursday, May 22, 2025 7:58 PM
>> To: wangtao <tao.wangtao@...or.com>; T.J. Mercier
>> <tjmercier@...gle.com>
>> Cc: sumit.semwal@...aro.org; benjamin.gaignard@...labora.com;
>> Brian.Starkey@....com; jstultz@...gle.com; linux-media@...r.kernel.org;
>> dri-devel@...ts.freedesktop.org; linaro-mm-sig@...ts.linaro.org; linux-
>> kernel@...r.kernel.org; wangbintian(BintianWang)
>> <bintian.wang@...or.com>; yipengxiang <yipengxiang@...or.com>; liulu
>> 00013167 <liulu.liu@...or.com>; hanfeng 00012985 <feng.han@...or.com>;
>> amir73il@...il.com
>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
>> DMA_BUF_IOCTL_RW_FILE for system_heap
>>
>> On 5/22/25 10:02, wangtao wrote:
>>>> -----Original Message-----
>>>> From: Christian König <christian.koenig@....com>
>>>> Sent: Wednesday, May 21, 2025 7:57 PM
>>>> To: wangtao <tao.wangtao@...or.com>; T.J. Mercier
>>>> <tjmercier@...gle.com>
>>>> Cc: sumit.semwal@...aro.org; benjamin.gaignard@...labora.com;
>>>> Brian.Starkey@....com; jstultz@...gle.com;
>>>> linux-media@...r.kernel.org; dri-devel@...ts.freedesktop.org;
>>>> linaro-mm-sig@...ts.linaro.org; linux- kernel@...r.kernel.org;
>>>> wangbintian(BintianWang) <bintian.wang@...or.com>; yipengxiang
>>>> <yipengxiang@...or.com>; liulu
>>>> 00013167 <liulu.liu@...or.com>; hanfeng 00012985
>>>> <feng.han@...or.com>; amir73il@...il.com
>>>> Subject: Re: [PATCH 2/2] dmabuf/heaps: implement
>>>> DMA_BUF_IOCTL_RW_FILE for system_heap
>>>>
>>>> On 5/21/25 12:25, wangtao wrote:
>>>>> [wangtao] I previously explained that the read/sendfile/splice/
>>>>> copy_file_range syscalls can't achieve dmabuf direct I/O zero-copy.
>>>>
>>>> And why can't you work on improving those syscalls instead of
>>>> creating a new IOCTL?
>>>>
>>> [wangtao] As I mentioned in previous emails, these syscalls cannot
>>> achieve dmabuf zero-copy due to technical constraints.
>>
>> Yeah, and why can't you work on removing those technical constraints?
>>
>> What is blocking you from improving the sendfile system call or proposing a
>> patch to remove the copy_file_range restrictions?
> [wangtao] Since sendfile/splice can't eliminate CPU copies, I skipped the
> cross-FS checks in copy_file_range when copying between memory and disk files.
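> 
> The check being skipped has roughly this shape (a simplified paraphrase of
> fs/read_write.c after commit 868f9f2f8e00, not the literal kernel code):
> 
>     /* vfs_copy_file_range(): without a driver-provided
>      * copy_file_range callback, copies across superblocks
>      * are rejected. */
>     if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb &&
>         !file_out->f_op->copy_file_range)
>             return -EXDEV;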

It will probably be a longer discussion, but I think that having the FS people take a look as well is clearly mandatory.

If Linus or any of the other maintainers then says that this isn't going to fly either, we can still look into alternatives.

Thanks,
Christian.


> Will send new patches after completing the shmem/udmabuf callback.
> Thank you for your attention to this issue.
> 
> UFS 4.0 device @4GB/s, Arm64 CPU @1GHz:
> | Metrics                  |Creat(us)|Close(us)| I/O(us) |I/O(MB/s)| Vs.%
> |--------------------------|---------|---------|---------|---------|-------
> | 0)    dmabuf buffer read |   46898 |    4804 | 1173661 |     914 |  100%
> | 1)   udmabuf buffer read |  593844 |  337111 | 2144681 |     500 |   54%
> | 2)     memfd buffer read |    1029 |  305322 | 2215859 |     484 |   52%
> | 3)     memfd direct read |     562 |  295239 | 1019913 |    1052 |  115%
> | 4) memfd buffer sendfile |     785 |  299026 | 1431304 |     750 |   82%
> | 5) memfd direct sendfile |     718 |  296307 | 2622270 |     409 |   44%
> | 6)   memfd buffer splice |     981 |  299694 | 1573710 |     682 |   74%
> | 7)   memfd direct splice |     890 |  302509 | 1269757 |     845 |   92%
> | 8)    memfd buffer c_f_r |      33 |    4432 |     N/A |     N/A |   N/A
> | 9)    memfd direct c_f_r |      27 |    4421 |     N/A |     N/A |   N/A
> |10) memfd buffer sendfile |  595797 |  423105 | 1242494 |     864 |   94%
> |11) memfd direct sendfile |  593758 |  357921 | 2344001 |     458 |   50%
> |12)   memfd buffer splice |  623221 |  356212 | 1117507 |     960 |  105%
> |13)   memfd direct splice |  587059 |  345484 |  857103 |    1252 |  136%
> |14)  udmabuf buffer c_f_r |   22725 |   10248 |     N/A |     N/A |   N/A
> |15)  udmabuf direct c_f_r |   20120 |    9952 |     N/A |     N/A |   N/A
> |16)   dmabuf buffer c_f_r |   46517 |    4708 |  857587 |    1252 |  136%
> |17)   dmabuf direct c_f_r |   47339 |    4661 |  284023 |    3780 |  413%
> 
>>
>> Regards,
>> Christian.
>>
>>> Could you specify the technical points, code, or principles that need
>>> optimization?
>>>
>>> Let me explain again why these syscalls can't work:
>>> 1. read() syscall
>>>    - dmabuf fops lacks a read callback implementation. Even if one were
>>>      implemented, the file_fd info could not be transferred.
>>>    - read(file_fd, dmabuf_ptr, len) against a remap_pfn_range-based mmap
>>>      cannot access the dmabuf pages, forcing buffer-mode reads (see the
>>>      sketch after this list).
>>>
>>> 2. sendfile() syscall
>>>    - Requires a CPU copy from the page cache to the memory file
>>>      (tmpfs/shmem):
>>>      [DISK] --DMA--> [page cache] --CPU copy--> [MEMORY file]
>>>    - CPU overhead (both buffer and direct modes involve copies):
>>>      55.08% do_sendfile
>>>     |- 55.08% do_splice_direct
>>>     |-|- 55.08% splice_direct_to_actor
>>>     |-|-|- 22.51% copy_splice_read
>>>     |-|-|-|- 16.57% f2fs_file_read_iter
>>>     |-|-|-|-|- 15.12% __iomap_dio_rw
>>>     |-|-|- 32.33% direct_splice_actor
>>>     |-|-|-|- 32.11% iter_file_splice_write
>>>     |-|-|-|-|- 28.42% vfs_iter_write
>>>     |-|-|-|-|-|- 28.42% do_iter_write
>>>     |-|-|-|-|-|-|- 28.39% shmem_file_write_iter
>>>     |-|-|-|-|-|-|-|- 24.62% generic_perform_write
>>>     |-|-|-|-|-|-|-|-|- 18.75% __pi_memmove
>>>
>>> 3. splice() requires one end to be a pipe, making it incompatible with
>>>    regular files or dmabuf.
>>>
>>> 4. copy_file_range()
>>>    - Blocked by cross-FS restrictions (Amir's commit 868f9f2f8e00).
>>>    - Even without that restriction, implementing the copy_file_range
>>>      callback in dmabuf fops would only allow dmabuf reads from regular
>>>      files. This is because copy_file_range relies on
>>>      file_out->f_op->copy_file_range, which cannot support dmabuf writes
>>>      to regular files.
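>>>
>>> A minimal userspace sketch of the point-1 failure mode (paths, sizes,
>>> and setup here are hypothetical; error handling omitted):
>>>
>>>     #define _GNU_SOURCE   /* for O_DIRECT */
>>>     #include <fcntl.h>
>>>     #include <sys/mman.h>
>>>     #include <unistd.h>
>>>
>>>     size_t len = 1 << 20;  /* example buffer size */
>>>     int dmabuf_fd;         /* allocated beforehand, e.g. via
>>>                             * DMA_HEAP_IOCTL_ALLOC on
>>>                             * /dev/dma_heap/system */
>>>     int file_fd = open("/data/blob.bin", O_RDONLY | O_DIRECT);
>>>     void *dmabuf_ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
>>>                             MAP_SHARED, dmabuf_fd, 0);
>>>     /* The exporter maps the buffer with remap_pfn_range(), so the
>>>      * direct I/O path cannot pin these pages; the read either fails
>>>      * or falls back to buffered mode with CPU copies. */
>>>     ssize_t n = read(file_fd, dmabuf_ptr, len);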
>>>
>>> Test results confirm these limitations:
>>> T.J. Mercier's test: 1G from ext4 on 6.12.20, w/ 3 > drop_caches
>>> Method                  | read/sendfile (ms)
>>> ------------------------|-------------------
>>> udmabuf buffer read     | 1210
>>> udmabuf direct read     | 671
>>> udmabuf buffer sendfile | 1096
>>> udmabuf direct sendfile | 2340
>>>
>>> My 3GHz CPU tests (cache cleared):
>>> Method                  | alloc (ms) | read (ms) | vs. (%)
>>> ------------------------|------------|-----------|--------
>>> udmabuf buffer read     |        135 |       546 |    180%
>>> udmabuf direct read     |        159 |       300 |     99%
>>> udmabuf buffer sendfile |        134 |       303 |    100%
>>> udmabuf direct sendfile |        141 |       912 |    301%
>>> dmabuf buffer read      |         22 |       362 |    119%
>>> my patch direct read    |         29 |       265 |     87%
>>>
>>> My 1GHz CPU tests (cache cleared):
>>> Method                  | alloc (ms) | read (ms) | vs. (%)
>>> ------------------------|------------|-----------|--------
>>> udmabuf buffer read     |        552 |      2067 |    198%
>>> udmabuf direct read     |        540 |       627 |     60%
>>> udmabuf buffer sendfile |        497 |      1045 |    100%
>>> udmabuf direct sendfile |        527 |      2330 |    223%
>>> dmabuf buffer read      |         40 |      1111 |    106%
>>> patch direct read       |         44 |       310 |     30%
>>>
>>> Test observations align with expectations:
>>> 1. dmabuf buffer read requires slow CPU copies.
>>> 2. udmabuf direct read achieves zero-copy but has page retrieval
>>>    latency from vaddr.
>>> 3. udmabuf buffer sendfile suffers CPU copy overhead.
>>> 4. udmabuf direct sendfile combines CPU copies with frequent DMA
>>>    operations due to small pipe buffers.
>>> 5. dmabuf buffer read also requires CPU copies.
>>> 6. My direct read patch enables zero-copy with better performance
>>>    on low-power CPUs.
>>> 7. udmabuf creation time remains problematic (as you've noted).
>>>
>>>>> My focus is enabling dmabuf direct I/O for [regular file] <--DMA-->
>>>>> [dmabuf] zero-copy.
>>>>
>>>> Yeah, and that focus is wrong. You need to work on a general solution
>>>> to the issue, not one specific to your problem.
>>>>
>>>>> Any API achieving this would work. Are there other uAPIs you think
>>>>> could help? Could you recommend experts who might offer suggestions?
>>>>
>>>> Well, once more: either work on sendfile, copy_file_range, or
>>>> eventually splice to make it do what you want.
>>>>
>>>> When that is done, we can discuss with the VFS people whether that
>>>> approach is feasible.
>>>>
>>>> But just bypassing the VFS review by implementing a DMA-buf specific
>>>> IOCTL is a NO-GO. That is clearly not something you can do in any way.
>>> [wangtao] The issue is that only dmabuf lacks Direct I/O zero-copy
>>> support. Tmpfs/shmem already work with Direct I/O zero-copy. As
>>> explained, existing syscalls or generic methods can't enable dmabuf
>>> direct I/O zero-copy, which is why I propose adding an IOCTL command.
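>>>
>>> For illustration, userspace usage of the proposed command would look
>>> roughly like this (the struct layout and field names here are
>>> hypothetical; the real uapi definition is in the patch itself):
>>>
>>>     struct dma_buf_rw_file args = {
>>>             .fd          = file_fd,   /* regular file, opened O_DIRECT */
>>>             .file_offset = 0,
>>>             .buf_offset  = 0,
>>>             .len         = buf_len,
>>>     };
>>>     /* DMA moves data directly between the file and the dmabuf
>>>      * pages, with no CPU copy through an intermediate buffer. */
>>>     ioctl(dmabuf_fd, DMA_BUF_IOCTL_RW_FILE, &args);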
>>>
>>> I respect your perspective. Could you clarify specific technical
>>> aspects, code requirements, or implementation principles for modifying
>>> sendfile() or copy_file_range()? This would help advance our discussion.
>>>
>>> Thank you for engaging in this dialogue.
>>>
>>>>
>>>> Regards,
>>>> Christian.
> 

