Message-ID: <15bf1b00-3aa0-973a-3a86-3fa5c4d41d2c@daynix.com>
Date:   Thu, 13 Apr 2023 13:49:43 +0900
From:   Akihiko Odaki <akihiko.odaki@...nix.com>
To:     Jean-Philippe Brucker <jean-philippe@...aro.org>,
        Eric Auger <eric.auger@...hat.com>
Cc:     virtio-dev@...ts.oasis-open.org,
        virtualization@...ts.linux-foundation.org,
        linux-kernel@...r.kernel.org, qemu-devel@...gnu.org
Subject: virtio-iommu hotplug issue

Hi,

Recently I encountered a problem with the combination of Linux's
virtio-iommu driver and QEMU when an SR-IOV virtual function gets
disabled. I'd like to ask which kind of solution is appropriate here,
and to implement that solution if possible.

A PCIe device implementing the SR-IOV specification exports a virtual
function (VF) alongside its physical function (PF), and the guest can
enable or disable the VF at runtime by writing to a configuration
register. To the guest, this effectively looks like a PCI device being
hotplugged or unplugged. When the VF is disabled, the kernel assumes the
endpoint is detached from the virtio-iommu domain, but QEMU actually
does not detach it.

This inconsistent view of the removed device sometimes prevents the VM 
from correctly performing the following procedure, for example:
1. Enable a VF.
2. Disable the VF.
3. Open a vfio container.
4. Open the group which the PF belongs to.
5. Add the group to the vfio container.
6. Map some memory region.
7. Close the group.
8. Close the vfio container.
9. Repeat steps 3-8.

When the VF gets disabled in step 2, the kernel assumes the endpoint is
detached from the IOMMU domain, but QEMU actually doesn't detach it.
Later, the domain will be reused in steps 3-8.

In step 7, the PF is detached. The kernel then thinks that no endpoint
is attached and that the mapping the domain holds has been cleared, but
on the QEMU side the VF endpoint is still attached, so the mapping is
kept intact.

In step 9, the same domain is reused again, and the kernel requests a
new mapping, but the request conflicts with the existing mapping and
fails with -EINVAL.
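
The sequence above can be replayed with a small toy model. This is only
a sketch of the bookkeeping described in this message, not kernel or
QEMU code: the class names, the "vf"/"pf" endpoint labels, the domain ID
and the IOVA are all made up for illustration, and it assumes the device
side keeps a domain's mappings until its last endpoint is detached.

```python
import errno


class DeviceSide:
    """Toy stand-in for the virtio-iommu device model (QEMU in this report)."""

    def __init__(self):
        self.endpoint_domain = {}  # endpoint name -> domain id
        self.mappings = {}         # domain id -> set of mapped IOVAs

    def attach(self, domain, endpoint):
        self.endpoint_domain[endpoint] = domain
        self.mappings.setdefault(domain, set())

    def detach(self, endpoint):
        domain = self.endpoint_domain.pop(endpoint, None)
        # Assumption: mappings are dropped only with the last endpoint.
        if domain is not None and domain not in self.endpoint_domain.values():
            self.mappings.pop(domain, None)

    def map(self, domain, iova):
        mapped = self.mappings.setdefault(domain, set())
        if iova in mapped:
            return -errno.EINVAL  # conflicts with an existing mapping
        mapped.add(iova)
        return 0


class GuestSide:
    """Toy stand-in for the guest driver's view of attachments."""

    def __init__(self, device):
        self.device = device
        self.endpoint_domain = {}

    def attach(self, domain, endpoint):
        self.endpoint_domain[endpoint] = domain
        self.device.attach(domain, endpoint)

    def detach(self, endpoint):
        self.endpoint_domain.pop(endpoint, None)
        self.device.detach(endpoint)

    def unplug(self, endpoint):
        # Reported behaviour: the guest forgets the endpoint
        # without telling the device side.
        self.endpoint_domain.pop(endpoint, None)

    def map(self, domain, iova):
        return self.device.map(domain, iova)


def run_scenario():
    device = DeviceSide()
    guest = GuestSide(device)
    domain, iova = 1, 0x1000     # hypothetical values

    guest.attach(domain, "vf")   # step 1: enable the VF
    guest.unplug("vf")           # step 2: disable the VF

    results = []
    for _ in range(2):           # steps 3-8, then step 9 repeats them
        guest.attach(domain, "pf")
        results.append(guest.map(domain, iova))
        guest.detach("pf")
    return results, device.endpoint_domain
```

In this model the first map request succeeds and the second returns
-EINVAL, because the device side still lists the VF as attached to the
domain and therefore never drops the first mapping.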

This problem can be fixed by either of:
- requesting the detachment of the endpoint from the guest when the PCI
device is unplugged (the VF is disabled), or
- detecting that the PCI device is gone and automatically detaching it
on the QEMU side.
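
To compare the two options, here is a self-contained toy simulation of
the device-side state only. It is a sketch under the same assumptions as
the description in this message (mappings survive until the last
endpoint leaves the domain); the function name, the `fix` values, the
domain ID and the IOVA are hypothetical and do not correspond to any
kernel or QEMU interface.

```python
import errno


def simulate(fix=None):
    """Replay the sequence and return the results of the two map requests.

    fix="guest-detach":  the guest requests a detach when the VF is unplugged.
    fix="device-detach": the device side drops the endpoint on unplug itself.
    fix=None:            nobody detaches the VF (the reported behaviour).
    """
    attached = {}   # device-side view: endpoint -> domain id
    mappings = {}   # device-side view: domain id -> set of IOVAs

    def device_detach(endpoint):
        domain = attached.pop(endpoint, None)
        # Mappings are dropped only when the domain loses its last endpoint.
        if domain is not None and domain not in attached.values():
            mappings.pop(domain, None)

    def device_map(domain, iova):
        mapped = mappings.setdefault(domain, set())
        if iova in mapped:
            return -errno.EINVAL
        mapped.add(iova)
        return 0

    domain, iova = 1, 0x1000             # hypothetical values
    attached["vf"] = domain              # VF enabled and attached

    # VF disabled: the two fixes differ only in who triggers the detach.
    if fix == "guest-detach":
        device_detach("vf")              # guest explicitly requests it
    elif fix == "device-detach":
        device_detach("vf")              # device notices the unplug itself

    results = []
    for _ in range(2):                   # steps 3-8, repeated once
        attached["pf"] = domain
        results.append(device_map(domain, iova))
        device_detach("pf")
    return results
```

In this model both options leave the device side in the same state, so
the second map request succeeds either way; the open question is which
side should be responsible for triggering the detach.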

It is not completely clear to me which solution is more appropriate, as
the virtio-iommu specification is written in a way that is independent
of the endpoint mechanism and does not say what should be done when a
PCI device is unplugged.

Regards,
Akihiko Odaki
