Message-ID: <cc444faa-af80-4bab-ac3b-a013fef4a695@rsg.ci.i.u-tokyo.ac.jp>
Date: Fri, 16 Jan 2026 17:54:09 +0900
From: Akihiko Odaki <odaki@....ci.i.u-tokyo.ac.jp>
To: Honglei Huang <honghuan@....com>
Cc: Gurchetan Singh <gurchetansingh@...omium.org>,
        Chia-I Wu <olvaffe@...il.com>, dri-devel@...ts.freedesktop.org,
        virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
        Honglei Huang <honglei1.huang@....com>,
        David Airlie <airlied@...hat.com>, Ray.Huang@....com,
        Gerd Hoffmann <kraxel@...hat.com>,
        Dmitry Osipenko <dmitry.osipenko@...labora.com>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        Maxime Ripard <mripard@...nel.org>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Simona Vetter <simona@...ll.ch>
Subject: Re: [PATCH v4 0/5] virtio-gpu: Add userptr support for compute
 workloads

On 2026/01/16 16:20, Honglei Huang wrote:
> 
> 
> On 2026/1/15 17:20, Akihiko Odaki wrote:
>> On 2026/01/15 16:58, Honglei Huang wrote:
>>> From: Honglei Huang <honghuan@....com>
>>>
>>> Hello,
>>>
>>> This series adds virtio-gpu userptr support to enable ROCm native
>>> context for compute workloads. The userptr feature allows the host to
>>> directly access guest userspace memory without memcpy overhead, which is
>>> essential for GPU compute performance.
>>>
>>> The userptr implementation provides buffer-based zero-copy memory access.
>>> This approach pins guest userspace pages and exposes them to the host
>>> via scatter-gather tables, enabling efficient compute operations.
>>
>> This description looks identical to what
>> VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST does, so there should be some
>> explanation of how it makes a difference.
>>
>> I already pointed this out when reviewing the QEMU patches[1], but I
>> note it here too, since QEMU is just a middleman and this matter is
>> better discussed by Linux and virglrenderer developers.
>>
>> [1] https://lore.kernel.org/qemu-devel/35a8add7-da49-4833-9e69-d213f52c771a@....com/
>>
> 
> Thanks for raising this important point about the distinction between
> VIRTGPU_BLOB_FLAG_USE_USERPTR and VIRTIO_GPU_BLOB_MEM_HOST3D_GUEST.
> I might not have explained it clearly previously.
> 
> The key difference is memory ownership and lifecycle:
> 
> BLOB_MEM_HOST3D_GUEST:
>    - Kernel allocates memory (drm_gem_shmem_create)
>    - Userspace accesses via mmap(GEM_BO)
>    - Use case: Graphics resources (Vulkan/OpenGL)
> 
> BLOB_FLAG_USE_USERPTR:
>    - Userspace pre-allocates memory (malloc/mmap)

"Kernel allocates memory" and "userspace pre-allocates memory" are
ambiguous phrasings. Either way, userspace asks the kernel to map memory
with a system call, brk() or mmap().

>    - Kernel only gets (pins) the existing pages
>    - Use case: Compute workloads (ROCm/CUDA) with large datasets. For
> example, when the GPU needs to load a 10G+ model file, the UMD mmaps the
> file and passes the mmap pointer to the driver, so no additional copy is
> needed. With shmem, userspace must copy the file data into the shmem mmap
> pointer, which adds copy overhead.
> 
> Userptr:
> 
> file -> open/mmap -> userspace ptr -> driver
> 
> shmem:
> 
> user alloc shmem ──→ mmap shmem ──→ shmem userspace ptr -> driver
>                                                ↑
>                                                │ copy
>                                                │
> file ──→ open/mmap ──→ file userptr ──────────┘
> 
> 
> For compute workloads, this matters significantly:
>    Without userptr: malloc(8GB) → alloc GEM BO → memcpy 8GB → compute
>                     → memcpy 8GB back
>    With userptr:    malloc(8GB) → create userptr BO → compute (zero-copy)
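For illustration only, a minimal C sketch of the two data paths described above, assuming a Linux/POSIX userspace. map_file_as_buffer() returns the file mapping that would be handed over as a userptr, while copy_file_into_bo() shows the extra memcpy() needed when the destination is a separately mapped buffer. All function names are hypothetical, a caller-supplied buffer stands in for an mmap()ed GEM BO, and no virtio-gpu ioctls are involved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: creates an unlinked temporary file holding `len`
 * bytes of `data`, standing in for a large model file. Returns an fd or -1. */
static int make_demo_file(const void *data, size_t len)
{
    char path[] = "/tmp/userptr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file persists until fd is closed */
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Userptr-style path: the file mapping itself is the buffer that would be
 * handed to the driver. No bytes are copied in userspace. NULL on failure. */
static void *map_file_as_buffer(int fd, size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Shmem-style path: the destination already exists (bo_ptr stands in for an
 * mmap()ed GEM BO), so the file contents must be copied into it.
 * Returns the number of bytes copied in userspace, or 0 on failure. */
static size_t copy_file_into_bo(int fd, size_t len, void *bo_ptr)
{
    void *src = map_file_as_buffer(fd, len);
    if (src == NULL)
        return 0;
    memcpy(bo_ptr, src, len); /* the extra copy the userptr path avoids */
    munmap(src, len);
    return len;
}
```

Both paths end with the same bytes visible to the caller; the difference is only whether a userspace memcpy() of the whole file happens in between.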

Why not allocate the GEM BO first and read the file directly into it?
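A minimal sketch of that alternative, assuming a POSIX file descriptor and using a shared anonymous mapping as a hypothetical stand-in for an mmap()ed GEM BO. pread() fills the BO directly, so there is no intermediate userspace buffer, although the kernel still performs one copy from the page cache into the BO pages. All function names here are illustrative, not an existing API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper: unlinked temporary file holding `data`. */
static int make_demo_file(const void *data, size_t len)
{
    char path[] = "/tmp/bo-read-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical stand-in for mmap()ing a freshly created GEM BO: a shared
 * anonymous mapping, which Linux backs with shmem pages. NULL on failure. */
static void *alloc_bo_standin(size_t len)
{
    assert(len > 0); /* mmap() rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Reads up to `len` bytes of the file straight into the BO mapping.
 * Returns the number of bytes read (short at EOF), or -1 on error. */
static ssize_t read_file_into_bo(int fd, void *bo_ptr, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)bo_ptr + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```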

> 
> The explicit flag serves three purposes:
> 
> 1. Although both send scatter-gather entries to the host, the flag makes
> the intent unambiguous.

Why will the host care?

> 
> 2. Ensures consistency between flag and userptr address field.

Addresses are represented with the nr_entries and following struct 
virtio_gpu_mem_entry entries, whenever 
VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB or 
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING is used. Having a special flag 
introduces inconsistency.
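To make this concrete, a sketch of how a guest buffer's backing can be expressed as nr_entries followed by struct virtio_gpu_mem_entry items, with no extra flag involved. The struct mirrors the spec layout (addr, length, padding) using host-endian integers, which assumes a little-endian guest. build_mem_entries() and its merging of physically contiguous pages are hypothetical and not taken from the Linux driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the virtio-gpu spec layout; the spec uses le64/le32 fields. */
struct virtio_gpu_mem_entry {
    uint64_t addr;    /* guest physical address of the chunk */
    uint32_t length;  /* chunk length in bytes */
    uint32_t padding;
};

/* Converts the guest-physical page addresses of a buffer into mem entries,
 * merging physically contiguous pages. The return value is the nr_entries
 * that precedes the entries in RESOURCE_CREATE_BLOB or
 * RESOURCE_ATTACH_BACKING. `out` must have room for `npages` entries. */
static uint32_t build_mem_entries(const uint64_t *page_addrs, size_t npages,
                                  uint32_t page_size,
                                  struct virtio_gpu_mem_entry *out)
{
    uint32_t nr_entries = 0;
    assert(page_size > 0);
    for (size_t i = 0; i < npages; i++) {
        struct virtio_gpu_mem_entry *last =
            nr_entries ? &out[nr_entries - 1] : NULL;
        if (last && last->addr + last->length == page_addrs[i] &&
            last->length <= UINT32_MAX - page_size) {
            last->length += page_size; /* extend the contiguous run */
        } else {
            out[nr_entries].addr = page_addrs[i]; /* start a new entry */
            out[nr_entries].length = page_size;
            out[nr_entries].padding = 0;
            nr_entries++;
        }
    }
    return nr_entries;
}
```

The same entries describe the pages whether they came from a shmem GEM object or from pinned userspace memory, which is the point being made about the flag.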

> 
> 3. Future HMM support: there is a plan to upgrade the userptr
> implementation to use Heterogeneous Memory Management for better GPU
> coherency and dynamic page migration. The flag provides a clean upgrade
> path.

What would the upgrade path look like with and without the flag, and in
what respect is the path with the flag "cleaner"?

> 
> I understand the concern about API complexity. I'll defer to the
> virtio-gpu maintainers for the final decision on whether this design is
> acceptable or if they prefer an alternative approach.

It is fine to have API complexity. The problem here is the lack of clear 
motivation and documentation.

Another way to put this is: how will you explain the flag in the virtio 
specification? It should say "the driver MAY/SHOULD/MUST do something" 
and/or "the device MAY/SHOULD/MUST do something", and then Linux and 
virglrenderer can implement the flag accordingly.

Regards,
Akihiko Odaki
