[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211101172506.GC2744544@nvidia.com>
Date: Mon, 1 Nov 2021 14:25:06 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>,
Shameerali Kolothum Thodi
<shameerali.kolothum.thodi@...wei.com>
Cc: Cornelia Huck <cohuck@...hat.com>,
Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
"Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
for mlx5 devices
On Fri, Oct 29, 2021 at 04:06:21PM -0600, Alex Williamson wrote:
> > Right now we are focused on the non-P2P cases, which I think is a
> > reasonable starting limitation.
>
> It's a reasonable starting point iff we know that we need to support
> devices that cannot themselves support a quiescent state. Otherwise it
> would make sense to go back to work on the uAPI because I suspect the
> implications to userspace are not going to be as simple as "oops, can't
> migrate, there are two devices." As you say, there's a universe of
> devices that run together that don't care about p2p and QEMU will be
> pressured to support migration of those configurations.
I agree with this, but I also think what I saw in the proposed hns
driver suggests it's HW cannot do quiescent, if so this is the first
counter-example to the notion it is a universal ability?
hns people: Can you put your device in a state where it is operating,
able to accept and respond to MMIO, and yet guarentees it generates no
DMA transactions?
> want migration. If we ever want both migration and p2p, QEMU would
> need to reject any device that can't comply.
Yes, it looks like a complicated task on the qemu side to get this
resolved
> > It is not a big deal to defer things to rc1, though merging a
> > leaf-driver that has been on-list over a month is certainly not
> > rushing either.
>
> If "on-list over a month" is meant to imply that it's well vetted, it
> does not. That's a pretty quick time frame given the uAPI viability
> discussions that it's generated.
I only said rushed :)
> I'm tending to agree that there's value in moving forward, but there's
> a lot we're defining here that's not in the uAPI, so I'd like to see
> those things become formalized.
Ok, lets come up with a documentation patch then to define !RUNNING as
I outlined and start to come up with the allowed list of actions..
I think I would like to have a proper rst file for documenting the
uapi as well.
> I think this version is defining that it's the user's responsibility to
> prevent external DMA to devices while in the !_RUNNING state. This
> resolves the condition that we have no means to coordinate quiescing
> multiple devices. We shouldn't necessarily prescribe a single device
> solution in the uAPI if the same can be equally achieved through
> configuration of DMA mapping.
I'm not sure what this means?
> I was almost on board with blocking MMIO, especially as p2p is just DMA
> mapping of MMIO, but what about MSI-X? During _RESUME we must access
> the MSI-X vector table via the SET_IRQS ioctl to configure interrupts.
> Is this exempt because the access occurs in the host?
s/in the host/in the kernel/ SET_IRQS is a kernel ioctl that uses the
core MSIX code to do the mmio, so it would not be impacted by MMIO
zap.
Looks like you've already marked these points with the
vfio_pci_memory_lock_and_enable(), so a zap for migration would have
to be a little different than a zap for reset.
Still, this is something that needs clear definition, I would expect
the SET_IRQS to happen after resuming clears but before running sets
to give maximum HW flexibility and symmetry with saving.
And we should really define clearly what a device is supposed to do
with the interrupt vectors during migration. Obviously there are races
here.
> In any case, it requires that the device cannot be absolutely static
> while !_RUNNING. Does (_RESUMING) have different rules than
> (_SAVING)?
I'd prever to avoid all device touches during both resuming and
saving, and do them during !RUNNING
> So I'm still unclear how the uAPI needs to be updated relative to
> region access. We need that list of what the user is allowed to
> access, which seems like minimally config space and MSI-X table space,
> but are these implicitly known for vfio-pci devices or do we need
> region flags or capabilities to describe? We can't generally know the
> disposition of device specific regions relative to this access. Thanks,
I'd prefer to be general and have the spec forbid
everything. Specifying things like VFIO_DEVICE_SET_IRQS1 covers all the
bus types.
Other bus types should get spec updates before any other bus type
driver is merged.
Thanks,
Jason
Powered by blists - more mailing lists