Message-ID: <20211014064824.66c90ee5.alex.williamson@redhat.com>
Date: Thu, 14 Oct 2021 06:48:24 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Zhenguo Yao <yaozhenguo1@...il.com>
Cc: bhelgaas@...gle.com, cohuck@...hat.com, jgg@...pe.ca,
mgurtovoy@...dia.com, yishaih@...dia.com, kvm@...r.kernel.org,
linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
yaozhenguo@...com
Subject: Re: [PATCH v1 0/2] Add ablility of VFIO driver to ignore reset when
device don't need it
On Thu, 14 Oct 2021 17:57:46 +0800
Zhenguo Yao <yaozhenguo1@...il.com> wrote:
> In some scenarios, a vfio device must not be reset during its
> initialization process. For example: NVSwitch and A100 GPUs working in the
> Shared NVSwitch Virtualization Model. In this mode, there are two types of
> VMs: the service VM and the guest VM. The GPU devices are initialized in
> the following steps:
>
> 1. The service VM boots up. GPUs and NVSwitches are passed through to the
> service VM. The NVIDIA driver and management software configure them in the
> service VM.
>
> 2. The selected GPUs are unplugged from the service VM.
>
> 3. The guest VM boots up with the selected GPUs passed through.
>
> The selected GPUs must not be reset in step 3; otherwise they fail to
> initialize in the guest VM.
>
> This patchset adds a PCI sysfs interface, ignore_reset, which drivers can
> use to decide whether to perform a PCI reset. For example, in the Shared
> NVSwitch Virtualization Model, the hypervisor can disable PCI reset by
> setting ignore_reset to 1 before the guest VM boots up.
>
> Zhenguo Yao (2):
> PCI: Add ignore_reset sysfs interface to control whether do device
> reset in PCI drivers
> vfio-pci: Don't do device reset when ignore_reset is setting
>
> drivers/pci/pci-sysfs.c | 25 +++++++++++++++++
> drivers/vfio/pci/vfio_pci_core.c | 48 ++++++++++++++++++++------------
> include/linux/pci.h | 1 +
> 3 files changed, 56 insertions(+), 18 deletions(-)
>
This all seems like code to mask that these NVSwitch configurations are
probably insecure because we can't factor and manage NVSwitch isolation
into IOMMU grouping. I'm guessing this "service VM" pokes proprietary
registers to manage that isolation and perhaps later resetting devices
negates that programming. A more proper solution is probably to do our
best to guess the span of an NVSwitch configuration and make the IOMMU
group include all the devices, until NVIDIA provides proper code for
the kernel to understand this interconnect and how it affects DMA
isolation. Nak on disabling resets for the purpose of preventing a
user from undoing proprietary device programming. Thanks,
Alex