Message-ID: <20211108123547.GS2744544@nvidia.com>
Date: Mon, 8 Nov 2021 08:35:47 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
Cc: Alex Williamson <alex.williamson@...hat.com>,
Cornelia Huck <cohuck@...hat.com>,
Yishai Hadas <yishaih@...dia.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"saeedm@...dia.com" <saeedm@...dia.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"leonro@...dia.com" <leonro@...dia.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
"maorg@...dia.com" <maorg@...dia.com>,
"Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
for mlx5 devices
On Mon, Nov 08, 2021 at 08:53:20AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@...dia.com>
> > Sent: Tuesday, October 26, 2021 11:19 PM
> >
> > On Tue, Oct 26, 2021 at 08:42:12AM -0600, Alex Williamson wrote:
> >
> > > > This is also why I don't like it being so transparent as it is
> > > > something userspace needs to care about - especially if the HW cannot
> > > > support such a thing, if we intend to allow that.
> > >
> > > Userspace does need to care, but userspace's concern over this should
> > > not be able to compromise the platform and therefore making VF
> > > assignment more susceptible to fatal error conditions to comply with a
> > > migration uAPI is troublesome for me.
> >
> > It is an interesting scenario.
> >
> > I think it points that we are not implementing this fully properly.
> >
> > The !RUNNING state should be like your reset efforts.
> >
> > All access to the MMIO memories from userspace should be revoked
> > during !RUNNING
>
> This assumes that vCPUs must be stopped before !RUNNING is entered
> in the virtualization case, and that is true today.
>
> But it may not hold when talking about guest SVA and I/O page fault [1].
> The problem is that the pending requests may trigger I/O page faults
> on guest page tables. Without running vCPUs to handle those faults,
> the quiesce command cannot finish draining the pending requests if
> the device doesn't support preempt-on-fault (that is the case for at
> least some Intel and Huawei devices, and possibly for most initial
> SVA implementations).
It cannot be ordered any other way.
vCPUs must be stopped first, and the PCI devices stopped after;
otherwise a vCPU can touch a stopped device while handling a fault,
which is unreasonable.
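Concretely, the MMIO revoke above could reuse the same kind of zap
that vfio-pci already does when userspace disables PCI memory; a
minimal sketch of blocking user MMIO on entering !RUNNING, where the
inode plumbing and the helper name are assumptions, not the current
driver code:

#include <linux/mm.h>
#include <linux/vfio_pci_core.h>

/* Sketch: revoke userspace MMIO mappings when leaving RUNNING */
static void block_user_mmio(struct vfio_pci_core_device *vdev,
			    struct inode *inode)
{
	down_write(&vdev->memory_lock);
	/* Zap every user mapping of the BARs; later faults are held
	 * off (or fail) until the device is RUNNING again */
	unmap_mapping_range(inode->i_mapping, 0, 0, true);
	up_write(&vdev->memory_lock);
}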
However, migrating a pending IOMMU fault does seem unreasonable as well.
The NDMA state can potentially solve this:
RUNNING | VCPU RUNNING - Normal
NDMA | RUNNING | VCPU RUNNING - Halt and flush DMA, and thus all faults
NDMA | RUNNING - Halt all MMIO access
0 - Halted everything
Though this may be more disruptive to the vCPUs, as they could spin
waiting for DMA completions or interrupts that will not come.
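In VMM terms the sequence could look like the sketch below. NDMA is
not in the uAPI today, so the bit value is an assumption, and
set_device_state()/pause_all_vcpus() are stand-ins for the VMM's own
plumbing around the v1 migration region:

#include <linux/vfio.h>

#define VFIO_DEVICE_STATE_NDMA (1 << 3)	/* hypothetical bit */

/* Stand-ins for the VMM's existing plumbing */
extern void set_device_state(int device_fd, unsigned int state);
extern void pause_all_vcpus(void);

static void quiesce_for_save(int device_fd)
{
	/* NDMA | RUNNING, vCPUs still running: the device drains DMA
	 * and outstanding faults while the guest can still service
	 * I/O page faults */
	set_device_state(device_fd, VFIO_DEVICE_STATE_NDMA |
				    VFIO_DEVICE_STATE_RUNNING);

	/* Stop the vCPUs; nothing touches MMIO from here on */
	pause_all_vcpus();

	/* 0: everything halted, safe to collect the device state */
	set_device_state(device_fd, VFIO_DEVICE_STATE_STOP);
}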
Jason