Message-ID: <2ae03f22-740d-4a48-b5f3-114eef92fb29@amd.com>
Date: Fri, 16 Jan 2026 15:20:36 +0800
From: Honglei Huang <honghuan@....com>
To: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
Cc: Gurchetan Singh <gurchetansingh@...omium.org>,
 Chia-I Wu <olvaffe@...il.com>, dri-devel@...ts.freedesktop.org,
 virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
 Honglei Huang <honglei1.huang@....com>, David Airlie <airlied@...hat.com>,
 Ray.Huang@....com, Gerd Hoffmann <kraxel@...hat.com>,
 Dmitry Osipenko <dmitry.osipenko@...labora.com>,
 Thomas Zimmermann <tzimmermann@...e.de>, Maxime Ripard <mripard@...nel.org>,
 Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
 Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v4 0/5] virtio-gpu: Add userptr support for compute
 workloads



On 2026/1/15 17:20, Akihiko Odaki wrote:
> On 2026/01/15 16:58, Honglei Huang wrote:
>> From: Honglei Huang <honghuan@....com>
>>
>> Hello,
>>
>> This series adds virtio-gpu userptr support to enable ROCm native
>> context for compute workloads. The userptr feature allows the host to
>> directly access guest userspace memory without memcpy overhead, which is
>> essential for GPU compute performance.
>>
>> The userptr implementation provides buffer-based zero-copy memory access.
>> This approach pins guest userspace pages and exposes them to the host
>> via scatter-gather tables, enabling efficient compute operations.
> 
> This description looks identical to what
> VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does, so there should be some
> explanation of how it differs.
> 
> I already pointed this out when reviewing the QEMU patches[1], but
> I note it here too, since QEMU is just a middleman and this matter is
> better discussed by Linux and virglrenderer developers.
> 
> [1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69-d213f52c771a@....com/
> 

Thanks for raising this important point about the distinction between
VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
I might not have explained it clearly previously.

The key difference is memory ownership and lifecycle:

BLOB_MEM_HOST3D_GUEST:
   - Kernel allocates memory (drm_gem_shmem_create)
   - Userspace accesses via mmap(GEM_BO)
   - Use case: Graphics resources (Vulkan/OpenGL)

BLOB_FLAG_USE_USERPTR:
   - Userspace pre-allocates memory (malloc/mmap)
   - Kernel only pins the existing pages
   - Use case: Compute workloads (ROCm/CUDA) with large datasets. For
     example, when the GPU needs to load a 10 GB+ model file, the UMD
     mmaps the file and passes the resulting pointer to the driver, so
     no additional copy is needed. With shmem, userspace has to copy the
     file data into the shmem mapping, which adds copy overhead.

Userptr:

file ──→ open/mmap ──→ userspace ptr ──→ driver

shmem:

user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr ──→ driver
                                              ↑
                                              │ copy
                                              │
file ──→ open/mmap ──→ file mmap ptr ─────────┘


For compute workloads, this matters significantly:
   Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB → compute →
                    memcpy 8GB back
   With userptr:    malloc(8GB) → create userptr BO → compute (zero-copy)
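
To make the two flows above concrete, here is a minimal userspace-only sketch. It does not touch virtio-gpu at all: an anonymous shared mapping merely stands in for a kernel-allocated shmem BO, and the ex_* helper names are made up for illustration. It only shows where the extra copy comes from.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked temp file of `size` bytes
 * filled with `fill`. Returns an fd, or -1 on error. */
static int ex_make_file(size_t size, unsigned char fill)
{
    char path[] = "/tmp/userptr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file stays alive until fd is closed */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(p, fill, size);
    munmap(p, size);
    return fd;
}

/* Userptr-style path: the file mapping itself is the pointer that would
 * be handed to the driver, so no bytes are copied. */
static void *ex_userptr_path(int fd, size_t size, size_t *copied)
{
    void *p = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    *copied = 0;
    return p == MAP_FAILED ? NULL : p;
}

/* shmem-style path: a separate buffer (stand-in for the BO) has to be
 * filled from the file mapping, which costs one full copy. */
static void *ex_shmem_path(int fd, size_t size, size_t *copied)
{
    *copied = 0;
    void *src = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (src == MAP_FAILED)
        return NULL;
    void *bo = mmap(NULL, size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (bo == MAP_FAILED) {
        munmap(src, size);
        return NULL;
    }
    memcpy(bo, src, size); /* the copy that userptr avoids */
    *copied = size;
    munmap(src, size);
    return bo;
}
```

Both helpers return a pointer to the same data; the difference is only in the `copied` count, which is 0 for the userptr-style path and `size` for the shmem-style path.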

The explicit flag serves three purposes:

1. Although both paths send scatter-gather entries to the host, the flag
makes the intent unambiguous.

2. It ensures consistency between the flag and the userptr address field.

3. Future HMM support: There is a plan to upgrade the userptr
implementation to use Heterogeneous Memory Management for better GPU
coherency and dynamic page migration. The flag provides a clean path for
that future upgrade.
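
As a rough illustration of point 2, the consistency check could look something like the sketch below. This is not the code from the series: the struct, the field names, and the flag bit value are placeholders I made up here, and the real UAPI layout is whatever the patches define.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bit value, NOT the value from the actual UAPI header. */
#define EX_BLOB_FLAG_USE_USERPTR (1u << 4)

/* Hypothetical stand-in for the blob creation arguments. */
struct ex_blob_create {
    uint32_t blob_flags;
    uint64_t userptr; /* userspace address, 0 when unused */
    uint64_t size;
};

/* Return 0 if the flag and the address agree, -EINVAL otherwise. */
static int ex_check_userptr_args(const struct ex_blob_create *args)
{
    int has_flag = (args->blob_flags & EX_BLOB_FLAG_USE_USERPTR) != 0;
    int has_addr = args->userptr != 0;

    /* The flag and the address must be set together or not at all. */
    if (has_flag != has_addr)
        return -EINVAL;
    if (!has_flag)
        return 0; /* ordinary guest blob, nothing more to check */

    /* A userptr range must be non-empty and must not wrap around. */
    if (args->size == 0)
        return -EINVAL;
    if (args->userptr + args->size < args->userptr)
        return -EINVAL;
    return 0;
}
```

With an explicit flag, a stray address without the flag (or the flag without an address) is rejected up front instead of being silently interpreted one way or the other.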

I understand the concern about API complexity. I'll defer to the 
virtio-gpu maintainers for the final decision on whether this design is 
acceptable or if they prefer an alternative approach.

Regards,
Honglei Huang

>>
>> Key features:
>> - Zero-copy memory access between guest userspace and host GPU
>> - Read-only and read-write userptr support
>> - Runtime feature detection via VIRTGPU_PARAM_RESOURCE_USERPTR
>> - ROCm capset support for ROCm stack integration
>> - Proper page lifecycle management with FOLL_LONGTERM pinning
>>
>> Patches overview:
>> 1. Add VIRTIO_GPU_CAPSET_ROCM capability for compute workloads
>> 2. Add virtio-gpu API definitions for userptr blob resources
>> 3. Extend DRM UAPI with comprehensive userptr support
>> 4. Implement core userptr functionality with page management
>> 5. Integrate userptr into blob resource creation and advertise it to
>>    userspace
>>
>> Performance: In popular compute benchmarks, this implementation achieves
>> approximately 70% of bare-metal OpenCL performance on AMD V2000 hardware
>> and 92% on AMD W7900 hardware.
>>
>> Testing: Verified with the ROCm stack and OpenCL applications in VIRTIO
>> virtualized environments.
>> - Full OpenCL CTS passed on ROCm 5.7.0 on the V2000 platform.
>> - Nearly 70% of OpenCL CTS tests passed on ROCm 7.0 on the W7900
>>   platform.
>> - Most HIP catch tests passed on ROCm 7.0 on the W7900 platform.
>> - Some AI applications enabled on ROCm 7.0 on the W7900 platform.
>>
>> V4 changes:
>>      - Renamed VIRTIO_GPU_CAPSET_HSAKMT to VIRTIO_GPU_CAPSET_ROCM
>>      - Removed userptr feature probing because it can reuse the guest
>>        blob resource code path; reduced patch count from 6 to 5
>>      - Updated corresponding commit messages
>>      - Consolidated userptr feature detection in final patch
>>      - Updated corresponding cover letter content
>>
>> V3 changes:
>>      - Split into focused patches for easier review
>>      - Removed complex interval tree userptr management
>>      - Simplified resource creation without deduplication
>>      - Added VIRTGPU_PARAM_RESOURCE_USERPTR for feature detection
>>      - Improved UAPI documentation and error handling
>>      - Enhanced code quality with proper cleanup paths
>>      - Removed MMU notifier dependencies for simplicity
>>      - Fixed resource lifecycle management issues
>>
>> V2: - Split the HSAKMT context and blob userptr resource into two patches.
>>      - Removed MMU notifier related patches, because using non-movable
>>        userspace memory with an MMU notifier is not a good idea.
>>      - Removed the HSAKMT context check at context creation so that all
>>        contexts support the userptr feature.
>>      - Removed MMU notifier related content from the cover letter.
>>      - Added more comments for patch 6 in the cover letter.
>>
>> Honglei Huang (5):
>>    drm/virtio-gpu: Add VIRTIO_GPU_CAPSET_ROCM capability
>>    virtio-gpu api: add blob userptr resource
>>    drm/virtgpu api: add blob userptr resource
>>    drm/virtio: implement userptr support for zero-copy memory access
>>    drm/virtio: advertise base userptr feature to userspace
>>
>>   drivers/gpu/drm/virtio/Makefile          |   3 +-
>>   drivers/gpu/drm/virtio/virtgpu_drv.h     |  33 ++++
>>   drivers/gpu/drm/virtio/virtgpu_ioctl.c   |   9 +-
>>   drivers/gpu/drm/virtio/virtgpu_object.c  |   6 +
>>   drivers/gpu/drm/virtio/virtgpu_userptr.c | 231 +++++++++++++++++++++++
>>   include/uapi/drm/virtgpu_drm.h           |   9 +
>>   include/uapi/linux/virtio_gpu.h          |   7 +
>>   7 files changed, 295 insertions(+), 3 deletions(-)
>>   create mode 100644 drivers/gpu/drm/virtio/virtgpu_userptr.c
>>
> 

