Message-ID: <20250516092148.12778-1-tao.wangtao@honor.com>
Date: Fri, 16 May 2025 17:21:46 +0800
From: wangtao <tao.wangtao@...or.com>
To: <sumit.semwal@...aro.org>, <christian.koenig@....com>,
	<benjamin.gaignard@...labora.com>, <Brian.Starkey@....com>,
	<jstultz@...gle.com>, <tjmercier@...gle.com>
CC: <linux-media@...r.kernel.org>, <dri-devel@...ts.freedesktop.org>,
	<linaro-mm-sig@...ts.linaro.org>, <linux-kernel@...r.kernel.org>,
	<bintian.wang@...or.com>, <yipengxiang@...or.com>, <liulu.liu@...or.com>,
	<feng.han@...or.com>, wangtao <tao.wangtao@...or.com>
Subject: [PATCH v2 0/2] dma-buf: Add direct I/O support via DMA_BUF_IOCTL_RW_FILE

Introduce DMA_BUF_IOCTL_RW_FILE ioctl for direct file I/O on dma-buf objects.

Current flow:
1. Allocate dma-buf (buf_fd)       # Get buffer descriptor
2. Map memory (vaddr)              # Access via virtual address
3. File ops: open/lseek/read       # Read into mapped memory

Problem:
- dma-buf has no direct I/O support
- About 70% of read time is spent on the page cache and memcpy
- Buffered I/O incurs high latency and power consumption

Solution:
Add an rw_file callback to the exporter. When the sgtable is held exclusively:
- Build a bio_vec array and set the IOCB_DIRECT flag
- Use vfs_iocb_iter_read() for direct I/O

Improved usage:
    dmabuf_fd = dmabuf_alloc(len, heap_fd)
    file_fd = open(file_path, O_RDONLY)
    if (direct_io) arg.flags |= DMA_BUF_RW_FLAGS_DIRECT
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_RW_FILE, &arg)

Performance gains:
- Throughput: 1032MB/s -> 3776MB/s (UFS4.0 @4GB/s)
- Zero page cache overhead
- Direct path eliminates memory copies

Use cases:
- AI model loading
- Real-time data streaming
- Task snapshot storage

vs udmabuf:
- udmabuf creation is slower
- udmabuf direct I/O is slower than dmabuf direct I/O
- sendfile still makes one copy, whereas the dmabuf path is zero-copy

Test (32x32MB buffer, 1GB file, UFS @4GB/s, CPU @1GHz):
Metric                    | alloc (ms) | read (ms) | total (ms)
--------------------------|------------|-----------|-----------
udmabuf buffered read     | 539        | 2017      | 2555
udmabuf direct read       | 522        | 658       | 1179
udmabuf buffered sendfile | 505        | 1040      | 1546
udmabuf direct sendfile   | 510        | 2269      | 2780
dmabuf buffered read      | 51         | 1068      | 1118
patch 1-2 direct read     | 52         | 297       | 349
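
To relate the read times in the table to throughput, the arithmetic can be sketched as below. The helper names are hypothetical, and the sketch assumes the 1GB test file is 1024MB. Figures derived this way apply to this table only and need not match the headline throughput numbers quoted earlier, which may come from a different measurement.

```c
#include <assert.h>

/* Throughput in MB/s for `size_mb` megabytes read in `read_ms` milliseconds. */
double throughput_mb_per_s(double size_mb, double read_ms)
{
    assert(read_ms > 0.0);
    return size_mb * 1000.0 / read_ms;
}

/* How many times faster the new time is relative to the old time. */
double speedup(double old_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return old_ms / new_ms;
}
```

With the two dmabuf rows, throughput_mb_per_s(1024, 1068) gives about 959MB/s for the existing path and throughput_mb_per_s(1024, 297) gives about 3448MB/s with the patches applied. speedup(1068, 297) is roughly 3.6x for the read phase, and speedup(1118, 349) is roughly 3.2x for the total including allocation.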

v1: https://lore.kernel.org/all/20250513092803.2096-1-tao.wangtao@honor.com
v1 -> v2:
  The dma-buf exporter now verifies exclusive access to the dmabuf's sgtable.

wangtao (2):
  dmabuf: add DMA_BUF_IOCTL_RW_FILE
  dmabuf/heaps: implement DMA_BUF_IOCTL_RW_FILE for system_heap

 drivers/dma-buf/dma-buf.c           |   8 ++
 drivers/dma-buf/heaps/system_heap.c | 121 ++++++++++++++++++++++++++++
 include/linux/dma-buf.h             |  15 ++++
 include/uapi/linux/dma-buf.h        |  28 +++++++
 4 files changed, 172 insertions(+)

-- 
2.17.1

