Message-Id: <D4DDA526-5E5D-40DB-86EF-B4B6D7692663@pjd.dev>
Date: Thu, 29 Feb 2024 11:53:39 -0800
From: Peter Delevoryas <peter@....dev>
Cc: qemu-devel <qemu-devel@...gnu.org>,
 suravee.suthikulpanit@....com,
 iommu@...ts.linux.dev,
 kvm@...r.kernel.org,
 linux-kernel@...r.kernel.org,
 alex.williamson@...hat.com,
 Peter Delevoryas <peter@....dev>
Subject: Re: [q&a] Status of IOMMU virtualization for nested virtualization
 (userspace PCI drivers in VMs)



> On Feb 28, 2024, at 11:38 AM, Alex Williamson <alex.williamson@...hat.com> wrote:
> 
> On Wed, 28 Feb 2024 10:29:32 -0800
> Peter Delevoryas <peter@....dev> wrote:
> 
>> Hey guys,
>> 
>> I’m having a little trouble reading between the lines on various
>> docs, mailing list threads, KVM presentations, github forks, etc, so
>> I figured I’d just ask:
>> 
>> What is the status of IOMMU virtualization, like in the case where I
>> want a VM guest to have a virtual IOMMU?
> 
> It works fine for simply nested assignment scenarios, i.e. guest
> userspace drivers or nested VMs.
> 
>> I found this great presentation from KVM Forum 2021: [1]
>> 
>> 1. I’m using -device intel-iommu right now. This has performance
>> implications, and large DMA transfers hit the vfio_iommu_type1
>> dma_entry_limit on the host because of how the mappings are made.
> 
> Hugepages for the guest and mappings within the guest should help both
> the mapping performance and DMA entry limit.  In general the type1 vfio
> IOMMU backend is not optimized for dynamic mapping, so performance-wise
> your best bet is still to design the userspace driver for static DMA
> buffers.

Yep, huge pages definitely help; I’ll probably switch to allocating them at boot for better guarantees.
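
For reference, here’s a sketch of what I’m planning on the host side: raising the type1 limit as a stopgap and reserving 1G pages at boot so guest RAM stays backed by huge pages (the sizes and paths here are just placeholders for my setup):

    # Raise the per-container DMA mapping limit (vfio_iommu_type1
    # module parameter, default 65535):
    echo "options vfio_iommu_type1 dma_entry_limit=1048576" \
        | sudo tee /etc/modprobe.d/vfio-dma-limit.conf

    # Reserve 1G hugepages at boot via the host kernel command line:
    #   default_hugepagesz=1G hugepagesz=1G hugepages=16

    # Back guest RAM with the reserved pages:
    qemu-system-x86_64 ... \
        -object memory-backend-file,id=mem0,size=16G,mem-path=/dev/hugepages,share=on \
        -machine memory-backend=mem0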

> 
>> 2. -device virtio-iommu is an improvement, but it doesn’t seem
>> compatible with -device vfio-pci? I was only able to test this with
>> cloud-hypervisor, and it has a better vfio mapping pattern (avoids
>> hitting dma_entry_limit).
> 
> AFAIK it's just growing pains; it should work, but it's working
> through bugs.

Oh really?? Ok: I might even be configuring things incorrectly, or
maybe I need to upgrade from QEMU 7.1 to 8. I was relying on whatever
libvirt does by default, which seems to just be:

    -device virtio-iommu -device vfio-pci,host=<bdf>

But maybe I need some other options?
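
For what it’s worth, a quick sanity check from inside the guest looks something like this (the BDF 00:04.0 is hypothetical; substitute the assigned device’s guest-side address):

    # Confirm the assigned device actually sits behind the virtio IOMMU
    # and landed in an IOMMU group:
    dmesg | grep -i iommu
    ls /sys/bus/pci/devices/0000:00:04.0/iommu_group/devices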

> 
>> 3. -object iommufd [2] I haven’t tried this yet, but I’m planning
>> to: if it’s using iommufd, and I have all the right kernel features
>> in the guest and host, I assume it’s implementing the passthrough
>> mode that AMD has described in their talk? Because I imagine that
>> would be the best solution for me; I’m just having trouble
>> understanding if it’s actually related or orthogonal.
> 
> For now iommufd provides a similar DMA mapping interface to type1, but
> it does remove the DMA entry limit and improves locked page accounting.
> 
> To really see a performance improvement relative to dynamic mappings,
> you'll need nesting support in the IOMMU, which is under active
> development.  From this aspect you will want iommufd since similar
> features will not be provided by type1.  Thanks,

I see, thanks! That’s great to hear.
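
For the archives: based on my reading of the QEMU docs, the iommufd invocation I’m planning to test looks something like this (untested on my end, and <bdf> elided as above):

    # Route the assigned device through the iommufd backend instead of
    # the legacy type1 container:
    qemu-system-x86_64 ... \
        -object iommufd,id=iommufd0 \
        -device vfio-pci,host=<bdf>,iommufd=iommufd0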

> 
> Alex
> 

