Message-ID: <c8478a67-b14f-485c-a239-8967f1e40600@vivo.com>
Date: Mon, 15 Jul 2024 17:07:37 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Christian König <christian.koenig@....com>,
 "T.J. Mercier" <tjmercier@...gle.com>
Cc: Sumit Semwal <sumit.semwal@...aro.org>,
 Benjamin Gaignard <benjamin.gaignard@...labora.com>,
 Brian Starkey <Brian.Starkey@....com>, John Stultz <jstultz@...gle.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>, Matthew Wilcox <willy@...radead.org>,
 Muhammad Usama Anjum <usama.anjum@...labora.com>,
 Andrei Vagin <avagin@...gle.com>, Ryan Roberts <ryan.roberts@....com>,
 Kefeng Wang <wangkefeng.wang@...wei.com>, linux-media@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, linaro-mm-sig@...ts.linaro.org,
 linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, Daniel Vetter <daniel@...ll.ch>,
 "Vetter, Daniel" <daniel.vetter@...el.com>, opensource.kernel@...o.com,
 quic_sukadev@...cinc.com, quic_cgoldswo@...cinc.com,
 Akilesh Kailash <akailash@...gle.com>
Subject: Re: [PATCH 0/2] Support direct I/O read and write for memory
 allocated by dmabuf


On 2024/7/11 22:25, Christian König wrote:
> Am 10.07.24 um 18:34 schrieb T.J. Mercier:
>> On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt@...o.com> wrote:
>>> on 2024/7/10 22:48, Christian König wrote:
>>>> Am 10.07.24 um 16:35 schrieb Lei Liu:
>>>>> on 2024/7/10 22:14, Christian König wrote:
>>>>>> Am 10.07.24 um 15:57 schrieb Lei Liu:
>>>>>>> Use vm_insert_page to establish a mapping for the memory allocated
>>>>>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>>>>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>>>>> Well big NAK to that! Direct I/O is intentionally disabled on 
>>>>>> DMA-bufs.
>>>>> Hello! Could you explain why direct_io is disabled on DMABUF? Is
>>>>> there any historical reason for this?
>>>> It's basically one of the most fundamental design decisions of DMA-buf.
>>>> The attachment/map/fence model DMA-buf uses is not really compatible
>>>> with direct I/O on the underlying pages.
>>> Thank you! Is there any related documentation on this? I would like to
>>> understand and learn more about the fundamental reasons for the lack of
>>> support.
>> Hi Lei and Christian,
>>
>> This is now the third request I've seen from three different companies
>> who are interested in this,
>
> Yeah, completely agree. This is a recurring pattern :)
>
> Maybe we should document the preferred solution for that.
>
>> but the others are not for reasons of read
>> performance that you mention in the commit message on your first
>> patch. Someone else at Google ran a comparison between a normal read()
>> and a direct I/O read() into a preallocated user buffer and found that
>> with large readahead (16 MB) the throughput can actually be slightly
>> higher than direct I/O. If you have concerns about read performance,
>> have you tried increasing the readahead size?
>>
>> The other motivation is to load a gajillion byte file from disk into a
>> dmabuf without evicting the entire contents of pagecache while doing
>> so. Something like this (which does not currently work because read()
>> tries to GUP on the dmabuf memory as you mention):
>>
>> #include <cerrno>
>> #include <cstdio>
>> #include <cstring>
>> #include <fcntl.h>
>> #include <linux/dma-buf.h>
>> #include <linux/dma-heap.h>
>> #include <sys/ioctl.h>
>> #include <sys/mman.h>
>> #include <sys/stat.h>
>> #include <unistd.h>
>>
>> // Allocate a dmabuf of len bytes from the given heap and return its fd.
>> static int dmabuf_heap_alloc(int heap_fd, size_t len)
>> {
>>     struct dma_heap_allocation_data data = {
>>         .len = len,
>>         .fd = 0,
>>         .fd_flags = O_RDWR | O_CLOEXEC,
>>         .heap_flags = 0,
>>     };
>>     int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
>>     if (ret < 0)
>>         return ret;
>>     return data.fd;
>> }
>>
>> int main(int, char **argv)
>> {
>>     const char *file_path = argv[1];
>>     printf("File: %s\n", file_path);
>>     int file_fd = open(file_path, O_RDONLY | O_DIRECT);
>>
>>     struct stat st;
>>     stat(file_path, &st);
>>     ssize_t file_size = st.st_size;
>>     // Round the allocation up to a multiple of 4096 bytes.
>>     ssize_t aligned_size = (file_size + 4095) & ~4095;
>>
>>     printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);
>>     int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
>>     int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);
>>
>>     void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
>>                     MAP_SHARED, dmabuf_fd, 0);
>>     printf("VM at 0x%lx\n", (unsigned long)vm);
>>
>>     // Begin CPU access to the dmabuf.
>>     dma_buf_sync sync_flags { DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ |
>>                               DMA_BUF_SYNC_WRITE };
>>     ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>>
>>     // Direct I/O read into the dmabuf mapping; this is the step that
>>     // currently fails.
>>     ssize_t rc = read(file_fd, vm, file_size);
>>     printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");
>>
>>     // End CPU access to the dmabuf.
>>     sync_flags.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ |
>>                        DMA_BUF_SYNC_WRITE;
>>     ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>> }
>>
>> Or replace the mmap() + read() with sendfile().
>
> Or copy_file_range(). That's pretty much exactly what I suggested on 
> the other mail thread around that topic as well.

Thank you for your suggestion. I will study the method you suggested 
with Yang. Using copy_file_range() might be a good approach.

Regards,
Lei Liu.

>
>> So I would also like to see the above code (or something else similar)
>> be able to work and I understand some of the reasons why it currently
>> does not, but I don't understand why we should actively prevent this
>> type of behavior entirely.
>
> +1
>
> Regards,
> Christian.
>
>>
>> Best,
>> T.J.
>>
>>
>>
>>
>>
>>
>>
>>
>>>>>> We already discussed enforcing that in the DMA-buf framework and
>>>>>> this patch probably means that we should really do that.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>> Thank you for your response. As large AI models are increasingly
>>>>> deployed on edge devices, we urgently need support for direct_io on
>>>>> DMABUF to read some very large files. Do you have any new solutions
>>>>> or plans for this?
>>>> We have seen similar projects over the years and all of those turned
>>>> out to be complete shipwrecks.
>>>>
>>>> There is currently a patch set under discussion to give the network
>>>> subsystem DMA-buf support. If you are interested in network direct I/O
>>>> that could help.
>>> Is there a related introduction link for this patch?
>>>
>>>> In addition to that, a lot of GPU drivers support userptr usages,
>>>> e.g. to import malloced memory into the GPU driver. You can then also
>>>> do direct I/O on that malloced memory and the kernel will enforce
>>>> correct handling with the GPU driver through MMU notifiers.
>>>>
>>>> But as far as I know a general DMA-buf based solution isn't possible.
>>> 1. The reason we need to use DMABUF memory here is that we need to share
>>> memory between the CPU and APU. Currently, only DMABUF memory is
>>> suitable for this purpose. Additionally, we need to read very large 
>>> files.
>>>
>>> 2. Are there any other solutions for this? Also, do you have any plans
>>> to support direct_io for DMABUF memory in the future?
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>> Lei Liu.
>>>>>
>>>>>>> Lei Liu (2):
>>>>>>>     mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>>>>>> dmabuf
>>>>>>>     mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>>>>>> allocated
>>>>>>>       memory with direct_io support
>>>>>>>
>>>>>>>    drivers/dma-buf/heaps/system_heap.c |  5 +++--
>>>>>>>    fs/proc/task_mmu.c                  |  8 +++++++-
>>>>>>>    include/linux/mm.h                  |  1 +
>>>>>>>    mm/memory.c                         | 15 ++++++++++-----
>>>>>>>    mm/rmap.c                           |  9 +++++----
>>>>>>>    5 files changed, 26 insertions(+), 12 deletions(-)
>>>>>>>
>
