[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220209023645.GN4160@nvidia.com>
Date: Tue, 8 Feb 2022 22:36:45 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
ashok.raj@...el.com, kevin.tian@...el.com,
shameerali.kolothum.thodi@...wei.com
Subject: Re: [PATCH V7 mlx5-next 08/15] vfio: Define device migration
protocol v2
On Tue, Feb 08, 2022 at 05:07:54PM -0700, Alex Williamson wrote:
> On Mon, 7 Feb 2022 19:22:09 +0200
> Yishai Hadas <yishaih@...dia.com> wrote:
> > +static int
> > +vfio_ioctl_device_feature_mig_device_state(struct vfio_device *device,
> > + u32 flags, void __user *arg,
> > + size_t argsz)
> > +{
> > + size_t minsz =
> > + offsetofend(struct vfio_device_feature_mig_state, data_fd);
> > + struct vfio_device_feature_mig_state mig;
>
> Perhaps set default data_fd here? ie.
>
> struct vfio_device_feature_mig_state mig = { .data_fd = -1 };
Why? there is no path where this variable is read before set.
> > + struct file *filp = NULL;
> > + int ret;
> > +
> > + if (!device->ops->migration_set_state ||
> > + !device->ops->migration_get_state)
> > + return -ENOTTY;
> > +
> > + ret = vfio_check_feature(flags, argsz,
> > + VFIO_DEVICE_FEATURE_SET |
> > + VFIO_DEVICE_FEATURE_GET,
> > + sizeof(mig));
> > + if (ret != 1)
> > + return ret;
> > +
> > + if (copy_from_user(&mig, arg, minsz))
> > + return -EFAULT;
^^^^^^^^^^^^^^
Is before all gotos.
> > +enum vfio_device_mig_state {
> > + VFIO_DEVICE_STATE_ERROR = 0,
> > + VFIO_DEVICE_STATE_STOP = 1,
> > + VFIO_DEVICE_STATE_RUNNING = 2,
>
> I'm a little surprised we're not using RUNNING = 0 given all the
> objection in the v1 protocol that the default state was non-zero.
Making ERROR 0 ensures that errors, eg in the FSM table due to a
backport or something still work properly.
I think we corrected that confusion by explicitly calling out RUNNING
as the default and removing the register-like region API.
> > /* -------- API for Type1 VFIO IOMMU -------- */
> >
> > /**
>
> Otherwise, I'm still not sure how userspace handles the fact that it
> can't know how much data will be read from the device and how important
> that is. There's no replacement of that feature from the v1 protocol
> here.
I'm not sure this was part of the v1 protocol either. Yes it had a
pending_bytes, but I don't think it was actually expected to be 100%
accurate. Computing this value accurately is potentially quite
expensive, I would prefer we not enforce this on an implementation
without a reason, and qemu currently doesn't make use of it.
The ioctl from the precopy patch is probably the best approach, I
think it would be fine to allow that for stop copy as well, but also
don't see a usage right now.
It is not something that needs decision now, it is very easy to detect
if an ioctl is supported on the data_fd at runtime to add new things
here when needed.
> I also think we're still waiting for confirmation from owners of
> devices with extremely large device states (vGPUs) whether they consider
> the stream FD sufficient versus their ability to directly mmap regions
> of the device in the previous protocol. Thanks,
As is this.
I think the mlx5 and huwaei patches show that without a doubt the
stream fd is the correct choice for these drivers.
Jason
Powered by blists - more mailing lists