Message-ID: <2ae03f22-740d-4a48-b5f3-114eef92fb29@amd.com>
Date: Fri, 16 Jan 2026 15:20:36 +0800
From: Honglei Huang <honghuan@....com>
To: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
Cc: Gurchetan Singh <gurchetansingh@...omium.org>,
Chia-I Wu <olvaffe@...il.com>, dri-devel@...ts.freedesktop.org,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
Honglei Huang <honglei1.huang@....com>, David Airlie <airlied@...hat.com>,
Ray.Huang@....com, Gerd Hoffmann <kraxel@...hat.com>,
Dmitry Osipenko <dmitry.osipenko@...labora.com>,
Thomas Zimmermann <tzimmermann@...e.de>, Maxime Ripard <mripard@...nel.org>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v4 0/5] virtio-gpu: Add userptr support for compute workloads
On 2026/1/15 17:20, Akihiko Odaki wrote:
> On 2026/01/15 16:58, Honglei Huang wrote:
>> From: Honglei Huang <honghuan@....com>
>>
>> Hello,
>>
>> This series adds virtio-gpu userptr support to enable ROCm native
>> context for compute workloads. The userptr feature allows the host to
>> directly access guest userspace memory without memcpy overhead, which is
>> essential for GPU compute performance.
>>
>> The userptr implementation provides buffer-based zero-copy memory access.
>> This approach pins guest userspace pages and exposes them to the host
>> via scatter-gather tables, enabling efficient compute operations.
>
> This description looks identical with what
> VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does so there should be some
> explanation how it makes difference.
>
> I have already pointed out this when reviewing the QEMU patches[1], but
> I note that here too, since QEMU is just a middleman and this matter is
> better discussed by Linux and virglrenderer developers.
>
> [1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69-d213f52c771a@....com/
>

Thanks for raising this important point about the distinction between
VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
I may not have explained it clearly before.

The key difference is memory ownership and lifecycle:

BLOB_MEM_HOST3D_GUEST:
  - The kernel allocates the memory (drm_gem_shmem_create)
  - Userspace accesses it via mmap(GEM_BO)
  - Use case: graphics resources (Vulkan/OpenGL)

BLOB_FLAG_USE_USERPTR:
  - Userspace pre-allocates the memory (malloc/mmap)
  - The kernel only pins the existing pages
  - Use case: compute workloads (ROCm/CUDA) with large datasets

For example, when the GPU needs to load a large model file (10 GB+), the
UMD mmaps the file and passes the resulting pointer to the driver, so no
additional copy is needed. With shmem, userspace first has to copy the
file data into the shmem mapping, which adds copy overhead.

Userptr:

  file ──→ open/mmap ──→ userspace ptr ──→ driver

shmem:

  user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr ──→ driver
                                              ↑
                                              │ copy
                                              │
  file ──→ open/mmap ──→ file userptr ────────┘
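
To make the "file -> open/mmap -> userspace ptr" step concrete, here is a
minimal userspace sketch using only POSIX calls. map_file_readonly is a
hypothetical helper name, not something from this series, and the final
step of handing the pointer to the driver is deliberately not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only and return the mapping, or NULL on failure.
 * Hypothetical helper for illustration; the returned pointer is the kind
 * of "userspace ptr" that a userptr blob would be created from.
 */
static void *map_file_readonly(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return p;
}
```

The caller releases the mapping with munmap(p, len) once the GPU work that
uses it has finished.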

For compute workloads, this matters significantly:

  Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB → compute →
                   memcpy 8GB back
  With userptr:    malloc(8GB) → create userptr BO → compute (zero-copy)
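
The two flows can be modelled in plain C to show where the copy goes
away. This is only a toy model under stated assumptions: struct
submit_record and both submit_* functions are hypothetical stand-ins
that record which address the device would read from, and none of them
is a real virtio-gpu or DRM call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of what a submission would look like. */
struct submit_record {
	const void *device_visible_ptr; /* address the device reads from */
	size_t bytes_copied;            /* CPU copies done before submit */
};

/* shmem-style path: data is first copied into a driver-owned buffer. */
static struct submit_record submit_via_staging(const void *src, size_t len,
					       void *staging)
{
	struct submit_record r;

	memcpy(staging, src, len);
	r.device_visible_ptr = staging;
	r.bytes_copied = len;
	return r;
}

/* userptr-style path: the existing user pages are handed over as they are. */
static struct submit_record submit_via_userptr(const void *src, size_t len)
{
	struct submit_record r;

	(void)len; /* the length matters for pinning, not for copying */
	r.device_visible_ptr = src;
	r.bytes_copied = 0;
	return r;
}
```

In the staging case bytes_copied grows with the buffer size, which is the
8GB memcpy above; in the userptr case it stays at zero.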

The explicit flag serves three purposes:

1. Although both paths send scatter-gather entries to the host, the flag
   makes the intent unambiguous.
2. It ensures consistency between the flag and the userptr address field.
3. Future HMM support: there is a plan to upgrade the userptr
   implementation to use Heterogeneous Memory Management for better GPU
   coherency and dynamic page migration. The flag provides a clean path
   to that upgrade.
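
Point 2 could be checked along the following lines. This is a sketch,
not the code from the series: the struct layout, the SKETCH_ flag value
and the function name are all hypothetical, and the zero-size and
overflow checks are extra sanity checks assumed here for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag value; the real one is defined by the UAPI patch. */
#define SKETCH_BLOB_FLAG_USE_USERPTR (1u << 3)

/* Hypothetical subset of the blob creation arguments. */
struct sketch_blob_args {
	uint32_t blob_flags;
	uint64_t size;
	uint64_t userptr;
};

/* Return 0 if the flag and the address field agree, -EINVAL otherwise. */
static int sketch_check_userptr_args(const struct sketch_blob_args *a)
{
	int has_flag = (a->blob_flags & SKETCH_BLOB_FLAG_USE_USERPTR) != 0;
	int has_addr = a->userptr != 0;

	/* The flag and a non-zero address must come together. */
	if (has_flag != has_addr)
		return -EINVAL;
	if (!has_flag)
		return 0;

	/* Assumed extra checks: non-empty range that does not wrap. */
	if (a->size == 0)
		return -EINVAL;
	if (a->userptr + a->size < a->userptr)
		return -EINVAL;

	return 0;
}
```

Rejecting a mismatch early means a request either clearly asks for
userptr or clearly does not, which is the unambiguous intent from
point 1.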

I understand the concern about API complexity. I'll defer to the
virtio-gpu maintainers for the final decision on whether this design is
acceptable or if they prefer an alternative approach.

Regards,
Honglei Huang
>>
>> Key features:
>> - Zero-copy memory access between guest userspace and host GPU
>> - Read-only and read-write userptr support
>> - Runtime feature detection via VIRTGPU_PARAM_RESOURCE_USERPTR
>> - ROCm capset support for ROCm stack integration
>> - Proper page lifecycle management with FOLL_LONGTERM pinning
>>
>> Patches overview:
>> 1. Add VIRTIO_GPU_CAPSET_ROCM capability for compute workloads
>> 2. Add virtio-gpu API definitions for userptr blob resources
>> 3. Extend DRM UAPI with comprehensive userptr support
>> 4. Implement core userptr functionality with page management
>> 5. Integrate userptr into blob resource creation and advertise to
>> userspace
>>
>> Performance: In popular compute benchmarks, this implementation achieves
>> approximately 70% efficiency compared to bare metal OpenCL performance on
>> AMD V2000 hardware, achieves 92% efficiency on AMD W7900 hardware.
>>
>> Testing: Verified with ROCm stack and OpenCL applications in VIRTIO
>> virtualized environments.
>> - Full OPENCL CTS tests passed on ROCm 5.7.0 in V2000 platform.
>> - Near 70% percentage of OPENCL CTS tests passed on ROCm 7.0 W7900
>> platform.
>> - most HIP catch tests passed on ROCm 7.0 W7900 platform.
>> - Some AI applications enabled on ROCm 7.0 W7900 platform.
>>
>> V4 changes:
>> - Renamed VIRTIO_GPU_CAPSET_HSAKMT to VIRTIO_GPU_CAPSET_ROCM
>> - Remove userptr feature probing cause it can reuse the guest
>> blob resource code path, reduce patch count from 6 to 5
>> - Updated corresponding commit messages
>> - Consolidated userptr feature detection in final patch
>> - Update corresponding cover letter content
>>
>> V3 changes:
>> - Split into focused patches for easier review
>> - Removed complex interval tree userptr management
>> - Simplified resource creation without deduplication
>> - Added VIRTGPU_PARAM_RESOURCE_USERPTR for feature detection
>> - Improved UAPI documentation and error handling
>> - Enhanced code quality with proper cleanup paths
>> - Removed MMU notifier dependencies for simplicity
>> - Fixed resource lifecycle management issues
>>
>> V2: - Split add HSAKMT context and blob userptr resource to two patches.
>> - Remove MMU notifier related patches, cause use not moveable
>> user space
>> memory with MMU notifier is not a good idea.
>> - Remove HSAKMT context check when create context, let all the
>> context
>> support the userptr feature.
>> - Remove MMU notifier related content in cover letter.
>> - Add more comments for patch 6 in cover letter.
>>
>> Honglei Huang (5):
>> drm/virtio-gpu: Add VIRTIO_GPU_CAPSET_ROCM capability
>> virtio-gpu api: add blob userptr resource
>> drm/virtgpu api: add blob userptr resource
>> drm/virtio: implement userptr support for zero-copy memory access
>> drm/virtio: advertise base userptr feature to userspace
>>
>> drivers/gpu/drm/virtio/Makefile | 3 +-
>> drivers/gpu/drm/virtio/virtgpu_drv.h | 33 ++++
>> drivers/gpu/drm/virtio/virtgpu_ioctl.c | 9 +-
>> drivers/gpu/drm/virtio/virtgpu_object.c | 6 +
>> drivers/gpu/drm/virtio/virtgpu_userptr.c | 231 +++++++++++++++++++++++
>> include/uapi/drm/virtgpu_drm.h | 9 +
>> include/uapi/linux/virtio_gpu.h | 7 +
>> 7 files changed, 295 insertions(+), 3 deletions(-)
>> create mode 100644 drivers/gpu/drm/virtio/virtgpu_userptr.c
>>
>