lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230914191406.54656-1-shannon.nelson@amd.com>
Date:   Thu, 14 Sep 2023 12:14:06 -0700
From:   Shannon Nelson <shannon.nelson@....com>
To:     <alex.williamson@...hat.com>, <jgg@...pe.ca>,
        <kevin.tian@...el.com>, <reinette.chatre@...el.com>,
        <tglx@...utronix.de>, <kvm@...r.kernel.org>
CC:     <shannon.nelson@....com>, <brett.creeley@....com>,
        <linux-kernel@...r.kernel.org>
Subject: [PATCH vfio] vfio/pci: remove msi domain on msi disable

The new MSI dynamic allocation machinery is great for making the irq
management more flexible.  It includes caching information about the
MSI domain which gets reused on each new open of a VFIO fd.  However,
this causes an issue when the underlying hardware has flexible MSI-x
configurations, as a changed configuration doesn't get seen between
new opens, and is only refreshed between PCI unbind/bind cycles.

In our device we can change the per-VF MSI-x resource allocation
without the need for rebooting or function reset.  For example,

  1. Initial power up and kernel boot:
	# lspci -s 2e:00.1 -vv | grep MSI-X
	        Capabilities: [a0] MSI-X: Enable+ Count=8 Masked-

  2. Device VF configuration change happens with no reset

  3. New MSI-x count value seen:
	# lspci -s 2e:00.1 -vv | grep MSI-X
		Capabilities: [a0] MSI-X: Enable- Count=64 Masked-

This allows for per-VF dynamic reconfiguration of interrupt resources
for the VMs using the VFIO devices supported by our hardware.

The problem comes where the dynamic IRQ management created the MSI
domain when the VFIO device creates the first IRQ in the first ioctl()
VFIO_DEVICE_SET_IRQS request.  The current MSI-x count (hwsize) is read
when setting up the irq vectors under pci_alloc_irq_vectors_affinity(),
and the MSI domain information is set up, which includes the hwsize.

When the VFIO fd is closed, the IRQs are removed, but the MSI domain
information is kept for later use since we're only closing the current
VFIO fd, not unbinding the PCI device connection.  When a new VFIO fd
open happens and a new VFIO_DEVICE_SET_IRQS request comes down, the cycle
starts again, reusing the existing MSI domain with the previous hwsize.
This is fine until this new QEMU instance has read the new larger MSI-x
count from PCI config space (QEMU:vfio_msi_enable()) and tries to create
more IRQs than were available before.  We fail in msi_insert_desc()
because the MSI domain still is set up for the earlier hwsize and has no
room for the n+1 IRQ.

This can be easily fixed by simply adding msi_remove_device_irq_domain()
into vfio_msi_disable() which is called when the VFIO IRQs are removed
either by an ioctl() call from QEMU or from the VFIO fd close.  This forces
the MSI domain to be recreated with the new MSI-x count on the next
VFIO_DEVICE_SET_IRQS request.

Link: https://lore.kernel.org/all/cover.1683740667.git.reinette.chatre@intel.com/
Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de
Signed-off-by: Shannon Nelson <shannon.nelson@....com>
---
 drivers/vfio/pci/vfio_pci_intrs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index cbb4bcbfbf83..f66d5e7e078b 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -538,6 +538,7 @@ static void vfio_msi_disable(struct vfio_pci_core_device *vdev, bool msix)
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
 	pci_free_irq_vectors(pdev);
 	vfio_pci_memory_unlock_and_restore(vdev, cmd);
+	msi_remove_device_irq_domain(&pdev->dev, MSI_DEFAULT_DOMAIN);
 
 	/*
 	 * Both disable paths above use pci_intx_for_msi() to clear DisINTx
-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ