Message-ID: <b69439ec-0ebd-4527-873b-85b283e03888@rsg.ci.i.u-tokyo.ac.jp>
Date: Fri, 16 Jan 2026 22:22:22 +0900
From: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
To: Honglei Huang <honghuan@....com>
Cc: Gurchetan Singh <gurchetansingh@...omium.org>,
Chia-I Wu <olvaffe@...il.com>, dri-devel@...ts.freedesktop.org,
virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
Honglei Huang <honglei1.huang@....com>,
David Airlie <airlied@...hat.com>, Ray.Huang@....com,
Gerd Hoffmann <kraxel@...hat.com>,
Dmitry Osipenko <dmitry.osipenko@...labora.com>,
Thomas Zimmermann <tzimmermann@...e.de>,
Maxime Ripard <mripard@...nel.org>,
Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v4 0/5] virtio-gpu: Add userptr support for compute
workloads
On 2026/01/16 21:34, Honglei Huang wrote:
>
>
> On 2026/1/16 19:03, Akihiko Odaki wrote:
>> On 2026/01/16 19:32, Honglei Huang wrote:
>>>
>>>
>>> On 2026/1/16 18:01, Akihiko Odaki wrote:
>>>> On 2026/01/16 18:39, Honglei Huang wrote:
>>>>>
>>>>>
>>>>> On 2026/1/16 16:54, Akihiko Odaki wrote:
>>>>>> On 2026/01/16 16:20, Honglei Huang wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2026/1/15 17:20, Akihiko Odaki wrote:
>>>>>>>> On 2026/01/15 16:58, Honglei Huang wrote:
>>>>>>>>> From: Honglei Huang <honghuan@....com>
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> This series adds virtio-gpu userptr support to enable ROCm native
>>>>>>>>> context for compute workloads. The userptr feature allows the host
>>>>>>>>> to directly access guest userspace memory without memcpy overhead,
>>>>>>>>> which is essential for GPU compute performance.
>>>>>>>>>
>>>>>>>>> The userptr implementation provides buffer-based zero-copy memory
>>>>>>>>> access. This approach pins guest userspace pages and exposes them
>>>>>>>>> to the host via scatter-gather tables, enabling efficient compute
>>>>>>>>> operations.
>>>>>>>>
>>>>>>>> This description looks identical to what
>>>>>>>> VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does, so there should be some
>>>>>>>> explanation of how it differs.
>>>>>>>>
>>>>>>>> I have already pointed this out when reviewing the QEMU
>>>>>>>> patches[1], but I note it here too, since QEMU is just a
>>>>>>>> middleman and this matter is better discussed by Linux and
>>>>>>>> virglrenderer developers.
>>>>>>>>
>>>>>>>> [1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69-d213f52c771a@....com/
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for raising this important point about the distinction
>>>>>>> between
>>>>>>> VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
>>>>>>> I might not have explained it clearly previously.
>>>>>>>
>>>>>>> The key difference is memory ownership and lifecycle:
>>>>>>>
>>>>>>> BLOB_MEM_HOST3D_GUEST:
>>>>>>> - Kernel allocates memory (drm_gem_shmem_create)
>>>>>>> - Userspace accesses via mmap(GEM_BO)
>>>>>>> - Use case: Graphics resources (Vulkan/OpenGL)
>>>>>>>
>>>>>>> BLOB_FLAG_USE_USERPTR:
>>>>>>> - Userspace pre-allocates memory (malloc/mmap)
>>>>>>
>>>>>> "Kernel allocates memory" versus "userspace pre-allocates memory"
>>>>>> is somewhat ambiguous phrasing. Either way, userspace requests that
>>>>>> the kernel map memory with a system call, brk() or mmap().
>>>>>
>>>>> They are different:
>>>>> BLOB_MEM_HOST3D_GUEST (kernel-managed pages):
>>>>> - Allocated via drm_gem_shmem_create() as GFP_KERNEL pages
>>>>> - Kernel guarantees pages won't be swapped or migrated while the
>>>>> GEM object exists
>>>>> - Physical addresses remain stable → safe for DMA
>>>>>
>>>>> BLOB_FLAG_USE_USERPTR (userspace pages):
>>>>> - From regular malloc/mmap - subject to MM policies
>>>>> - Can be swapped, migrated, or compacted by kernel
>>>>> - Requires FOLL_LONGTERM pinning to make DMA-safe
>>>>>
>>>>> The device must treat them differently. Kernel-managed pages have
>>>>> stable physical addresses. Userspace pages need explicit pinning,
>>>>> and the device must be prepared for potential invalidation.
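>>>>>
>>>>> To make the pinning difference concrete, here is a minimal guest
>>>>> kernel sketch of the userptr path (not the exact code from this
>>>>> series; 'uaddr', 'size' and 'sgt' are illustrative and assumed
>>>>> page-aligned, error paths abbreviated):
>>>>>
>>>>> unsigned int npages = size >> PAGE_SHIFT;
>>>>> struct page **pages = kvmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
>>>>> struct sg_table sgt;
>>>>> int pinned, ret;
>>>>>
>>>>> /* long-term pin: the pages cannot be swapped or migrated while pinned */
>>>>> pinned = pin_user_pages_fast(uaddr, npages,
>>>>>                              FOLL_WRITE | FOLL_LONGTERM, pages);
>>>>> if (pinned != npages)
>>>>>         goto err_unpin;         /* unpin any partial result and bail out */
>>>>>
>>>>> /* expose the pinned pages to the device as a scatter-gather table */
>>>>> ret = sg_alloc_table_from_pages(&sgt, pages, npages, 0, size, GFP_KERNEL);
>>>>>
>>>>> A BLOB_MEM_HOST3D_GUEST resource does not need this step because, as
>>>>> described above, its shmem pages are allocated and held by the kernel.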
>>>>>
>>>>> This is why all compute drivers (amdgpu, i915, nouveau) implement
>>>>> userptr - to make arbitrary userspace allocations DMA-accessible
>>>>> while respecting their different page mobility characteristics.
>>>>> And DRM already has a better framework for it, SVM; this version is
>>>>> a greatly simplified version of that.
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_gpusvm.c#:~:text=*%20GPU%20Shared%20Virtual%20Memory%20(GPU%20SVM)%20layer%20for%20the%20Direct%20Rendering%20Manager%20(DRM)
>>>>
>>>> I was referring to the phrasing "kernel allocates" vs "userspace
>>>> allocates". Using GFP_KERNEL, swapping, migrating, or pinning is all
>>>> done by the kernel.
>>>
>>> I am talking about the virtio-gpu driver side; the virtio-gpu driver
>>> needs to handle those two types of memory differently.
>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> - Kernel only gets existing pages
>>>>>>> - Use case: Compute workloads (ROCm/CUDA) with large datasets.
>>>>>>> For example, the GPU needs to load a big model file (10G+); the
>>>>>>> UMD mmaps the file fd and passes the mmap pointer to the driver,
>>>>>>> so the driver does not need another copy.
>>>>>>> But if shmem is used, userspace needs to copy the file data into
>>>>>>> a shmem mmap pointer, so there is a copy overhead.
>>>>>>>
>>>>>>> Userptr:
>>>>>>>
>>>>>>> file -> open/mmap -> userspace ptr -> driver
>>>>>>>
>>>>>>> shmem:
>>>>>>>
>>>>>>> user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr -> driver
>>>>>>> ↑
>>>>>>> │ copy
>>>>>>> │
>>>>>>> file ──→ open/mmap ──→ file userptr ──────────┘
>>>>>>>
>>>>>>>
>>>>>>> For compute workloads, this matters significantly:
>>>>>>> Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB →
>>>>>>> compute → memcpy 8GB back
>>>>>>> With userptr: malloc(8GB) → create userptr BO → compute
>>>>>>> (zero-copy)
>>>>>>
>>>>>> Why don't you alloc a GEM BO first and read the file into it?
>>>>>
>>>>> Because that defeats the purpose of zero-copy.
>>>>>
>>>>> With GEM-BO-first (what you suggest):
>>>>>
>>>>> void *gembo = virtgpu_gem_create(10GB);    // Allocate GEM buffer
>>>>> void *model = mmap(..., model_file_fd, 0); // Map model file
>>>>> memcpy(gembo, model, 10GB);                // Copy 10GB - NOT zero-copy
>>>>> munmap(model, 10GB);
>>>>> gpu_compute(gembo);
>>>>>
>>>>> Result: 10GB copy overhead + double memory usage during copy.
>>>>
>>>> How about:
>>>>
>>>> void *gembo = virtgpu_gem_create(10GB);
>>>> read(model_file_fd, gembo, 10GB);
>>>
>>> I believe there is still a memory copy in the read operation
>>> (model_file_fd -> gembo); they have different physical pages,
>>> but the userptr/SVM feature will access the model_file_fd's physical
>>> pages directly.
>>
>> You can use O_DIRECT if you want.
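>>
>> For instance, a minimal userspace sketch (assuming the GEM BO mapping
>> 'bo_map', the transfer size and the file offsets all satisfy the
>> O_DIRECT alignment requirements; the names are illustrative, not from
>> this series):
>>
>> #define _GNU_SOURCE          /* for O_DIRECT */
>> #include <fcntl.h>
>> #include <unistd.h>
>>
>> int fd = open("model.bin", O_RDONLY | O_DIRECT);
>> size_t done = 0;
>>
>> while (done < size) {
>>         /* read straight into the BO pages, bypassing the page cache */
>>         ssize_t n = pread(fd, (char *)bo_map + done, size - done, done);
>>         if (n <= 0)
>>                 break;       /* error handling elided */
>>         done += n;
>> }
>> close(fd);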
>>
>>>
>>>
>>>>
>>>> Result: zero-copy + simpler code.
>>>>
>>>>>
>>>>> With userptr (zero-copy):
>>>>>
>>>>> void *model = mmap(..., model_file_fd, 0); // Map model file
>>>>> hsa_memory_register(model, 10GB);          // Pin pages, create userptr BO
>>>>> gpu_compute(model);                        // GPU reads directly from file pages
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> The explicit flag serves three purposes:
>>>>>>>
>>>>>>> 1. Although both send scatter-gather entries to the host, the
>>>>>>> flag makes the intent unambiguous.
>>>>>>
>>>>>> Why will the host care?
>>>>>
>>>>> The flag tells the host this is a userptr; the host side needs to
>>>>> handle it specially.
>>>>
>>>> Please provide the concrete requirement. What is the special
>>>> handling the host side needs to perform?
>>>
>>> Every piece of hardware has its own special API to handle userptr; for
>>> amdgpu ROCm it is hsaKmtRegisterMemoryWithFlags.
>>
>> On the host side, BLOB_MEM_HOST3D_GUEST will always result in a
>> userspace pointer. Below is how the address is translated:
>>
>> 1) (with the ioctl you are adding)
>> Guest kernel translates guest userspace pointer to guest PA.
>> 2) (with IOMMU)
>> Guest kernel translates guest PA to device VA
>> 3) The host VMM translates device VA to host userspace pointer
>> 4) virglrenderer passes userspace pointer to the GPU API (ROCm)
>>
>> BLOB_FLAG_USE_USERPTR tells that 1) happened, but the succeeding
>> process is not affected by that.
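>>
>> As an illustration of 4), a rough host-side sketch, assuming the VMM
>> has already resolved the blob's mem entries into host-visible iovecs
>> that it mapped contiguously ('iovs' and 'niov' are illustrative names,
>> not virglrenderer's actual interface):
>>
>> size_t total = 0;
>>
>> for (int i = 0; i < niov; i++)
>>         total += iovs[i].iov_len;
>>
>> /* the GPU runtime only ever sees a host userspace pointer; a
>>  * registration call like this is needed whether or not the guest
>>  * set BLOB_FLAG_USE_USERPTR */
>> hsa_status_t st = hsa_memory_register(iovs[0].iov_base, total);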
>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> 2. Ensures consistency between the flag and the userptr address field.
>>>>>>
>>>>>> Addresses are represented with the nr_entries field and the
>>>>>> following struct virtio_gpu_mem_entry entries whenever
>>>>>> VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB or
>>>>>> VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING is used. Having a special
>>>>>> flag introduces inconsistency.
>>>>>
>>>>> For this part I am talking about the virtio-gpu guest UMD side; in
>>>>> the blob create ioctl we need this flag to check the userptr address
>>>>> and whether it is a read-only attribute:
>>>>>
>>>>> if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USE_USERPTR) {
>>>>>         if (!rc_blob->userptr)
>>>>>                 return -EINVAL;
>>>>> } else {
>>>>>         if (rc_blob->userptr)
>>>>>                 return -EINVAL;
>>>>>
>>>>>         if (rc_blob->blob_flags & VIRTGPU_BLOB_FLAG_USERPTR_RDONLY)
>>>>>                 return -EINVAL;
>>>>> }
>>>>
>>>> I see. That shows VIRTGPU_BLOB_FLAG_USE_USERPTR is necessary for the
>>>> ioctl.
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> 3. Future HMM support: There is a plan to upgrade the userptr
>>>>>>> implementation to use Heterogeneous Memory Management for better
>>>>>>> GPU coherency and dynamic page migration. The flag provides a
>>>>>>> clean path to a future upgrade.
>>>>>>
>>>>>> What will the upgrade path with the flag and the one without the
>>>>>> flag look like, and in what respect is the upgrade path with the
>>>>>> flag "cleaner"?
>>>>>
>>>>> As I mentioned above, userptr handling is different from shmem/GEM
>>>>> BO handling.
>>>>
>>>> All of the above describes guest-internal behavior. What about the
>>>> interaction between the guest and host? How will having
>>>> VIRTIO_GPU_BLOB_FLAG_USE_USERPTR in virtio, as a guest-host
>>>> interface, ease a future upgrade?
>>>
>>> It depends on how we implement it; the current version is the
>>> simplest implementation, similar to the one in Intel's i915.
>>> If the virtio side needs HMM to implement an SVM-type userptr feature,
>>> I think VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is a must: the stack needs
>>> to know whether it is a userptr resource in order to perform advanced
>>> operations such as updating page tables, splitting BOs, etc.
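>>>
>>> For reference, a rough sketch of the kind of invalidation hook an
>>> HMM/mmu-notifier based userptr would need in the guest kernel (the
>>> names are illustrative, not code from this series):
>>>
>>> static bool userptr_invalidate(struct mmu_interval_notifier *mni,
>>>                                const struct mmu_notifier_range *range,
>>>                                unsigned long cur_seq)
>>> {
>>>         /* the CPU mapping changed: stop device access to the range,
>>>          * revalidate the pages later, and record the new sequence */
>>>         mmu_interval_set_seq(mni, cur_seq);
>>>         return true;
>>> }
>>>
>>> static const struct mmu_interval_notifier_ops userptr_notifier_ops = {
>>>         .invalidate = userptr_invalidate,
>>> };
>>>
>>> /* watch [uaddr, uaddr + size) of the creating task's address space */
>>> ret = mmu_interval_notifier_insert(&bo->notifier, current->mm,
>>>                                    uaddr, size, &userptr_notifier_ops);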
>>
>> Why does the device need to know whether it is a userptr resource to
>> perform operations when the device always gets device VAs?
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> I understand the concern about API complexity. I'll defer to the
>>>>>>> virtio-gpu maintainers for the final decision on whether this
>>>>>>> design is acceptable or if they prefer an alternative approach.
>>>>>>
>>>>>> It is fine to have API complexity. The problem here is the lack of
>>>>>> clear motivation and documentation.
>>>>>>
>>>>>> Another way to put this is: how will you explain the flag in the
>>>>>> virtio specification? It should say "the driver MAY/SHOULD/MUST do
>>>>>> something" and/or "the device MAY/SHOULD/MUST do something", and
>>>>>> then Linux and virglrenderer can implement the flag accordingly.
>>>>>
>>>>> You're absolutely right that the specification should
>>>>> be written in proper virtio spec language. The draft should be:
>>>>>
>>>>> VIRTIO_GPU_BLOB_FLAG_USE_USERPTR:
>>>>>
>>>>> Linux virtio driver requirements:
>>>>> - MUST set userptr to a valid guest userspace VA in
>>>>> drm_virtgpu_resource_create_blob
>>>>> - SHOULD keep the VA mapping valid until resource destruction
>>>>> - MUST pin pages or use HMM at blob creation time
>>>>
>>>> These descriptions are not for the virtio specification. The virtio
>>>> specification describes the interaction between the driver and
>>>> device. These statements describe the interaction between the guest
>>>> userspace and the guest kernel.
>>>>
>>>>>
>>>>> Virglrenderer requirements:
>>>>> - must use the corresponding API for userptr resources
>>>>
>>>> What is the "corresponding API"?
>>>
>>> It may be:
>>> **VIRTIO_GPU_BLOB_FLAG_USE_USERPTR specification:**
>>>
>>> Driver requirements:
>>> - MUST populate mem_entry[] with valid guest physical addresses of
>>> pinned userspace pages
>>
>> "Userspace" is a guest-internal concept and irrelevant to the
>> interaction between the driver and the device.
>>
>>> - MUST set blob_mem to VIRTIO_GPU_BLOB_FLAG_USE_USERPTR when using
>>> this flag
>>
>> When should the driver use the flag?
>>
>>> - SHOULD keep pages pinned until VIRTIO_GPU_CMD_RESOURCE_UNREF
>>
>> It is not a new requirement. The page must stay at the same position
>> whether VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is used or not.
>>
>>>
>>> Device requirements:
>>> - MUST establish IOMMU mappings using the provided iovec array with a
>>> specific API (hsaKmtRegisterMemoryWithFlags for ROCm)
>>
>> This should also be true even when VIRTIO_GPU_BLOB_FLAG_USE_USERPTR is
>> not set.
>>
>>>
>>>
>>>
>>> Thanks a lot for your comments, and I believe we need some input from
>>> the virtio-gpu maintainers.
>>>
>>> The VIRTIO_GPU_BLOB_FLAG_USE_USERPTR flag describes how the resource
>>> is used, and it doesn't conflict with VIRTGPU_BLOB_MEM_HOST3D_GUEST,
>>> just like a resource used with VIRTGPU_BLOB_FLAG_USE_SHAREABLE can
>>> still be either a guest resource or a host resource.
>>>
>>> If we don't have the VIRTIO_GPU_BLOB_FLAG_USE_USERPTR flag, we may have
>>> some resource conflicts on the host side. The guest kernel can use the
>>> 'userptr' param to identify the resource, but on the host side the
>>> 'userptr' param is lost and we only know it is a resource with the
>>> guest flag.
>>
>> I still don't see why knowing it is a guest resource is insufficient
>> for the host.
>
> All right, I totally agree with you.
>
> And maybe it is better to let the virtio-gpu/DRM maintainers decide how
> to design the flag/params.
>
>
> I believe the core gap between you and me is the concept of userptr/SVM.
> What is userptr/SVM used for? It lets the GPU and CPU share the userspace
> virtual address space. Perhaps my description was not accurate enough.

That is not what your QEMU patch series does; QEMU sees an address space
bound to the virtio-gpu device, which is not the guest userspace virtual
address space.

Below are my points in the discussion:

- Zero copy is not a new thing; virtio already has features for that:
VIRTIO_GPU_BLOB_MEM_GUEST and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
- You *always* need hsaKmtRegisterMemoryWithFlags() or similar when
implementing VIRTIO_GPU_BLOB_MEM_GUEST and/or
VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST, so having another flag does not make
any difference.
- The guest userspace virtual address is never exposed to the host in
your QEMU patch series, contrary to your description.

Regards,
Akihiko Odaki