[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220215153343.GA1046125@nvidia.com>
Date: Tue, 15 Feb 2022 11:33:43 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
Cc: Yishai Hadas <yishaih@...dia.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"saeedm@...dia.com" <saeedm@...dia.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"leonro@...dia.com" <leonro@...dia.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
"maorg@...dia.com" <maorg@...dia.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>
Subject: Re: [PATCH V7 mlx5-next 08/15] vfio: Define device migration
protocol v2
On Tue, Feb 15, 2022 at 08:04:27AM +0000, Tian, Kevin wrote:
> > From: Yishai Hadas <yishaih@...dia.com>
> > Sent: Tuesday, February 8, 2022 1:22 AM
> >
> > +static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
> > + u32 flags, void __user *arg,
> > + size_t argsz)
> > +{
> > + struct vfio_device_feature_migration mig = {
> > + .flags = VFIO_MIGRATION_STOP_COPY,
> > + };
> > + int ret;
> > +
> > + if (!device->ops->migration_set_state)
> > + return -ENOTTY;
>
> Miss a check on migration_get_state, as done in last function.
Yep
> > + * @migration_set_state: Optional callback to change the migration state for
> > + * devices that support migration. The returned FD is used for data
> > + * transfer according to the FSM definition. The driver is responsible
> > + * to ensure that FD is isolated whenever the migration FSM leaves a
> > + * data transfer state or before close_device() returns.
>
> didn't understand the meaning of 'isolated' here.
It is not a good word. Lets say 'that FD reaches end of stream or
error whenever'
> > +#define VFIO_DEVICE_STATE_V1_STOP (0)
> > +#define VFIO_DEVICE_STATE_V1_RUNNING (1 << 0)
> > +#define VFIO_DEVICE_STATE_V1_SAVING (1 << 1)
> > +#define VFIO_DEVICE_STATE_V1_RESUMING (1 << 2)
> > +#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_V1_RUNNING
> > | \
> > + VFIO_DEVICE_STATE_V1_SAVING | \
> > + VFIO_DEVICE_STATE_V1_RESUMING)
>
> Does it make sense to also add 'V1' to MASK and also following macros
> given their names are general?
No, the point of this exercise is to avoid trouble for qemu - the
fewest changes we can get away with the better.
Once qemu is updated we'll delete this old stuff from the kernel.
> > +/*
> > + * Indicates the device can support the migration API. See enum
>
> call it V2? Not necessary to add V2 in code but worthy of a clarification
> in comment.
We've only called it 'v2' for discussions.
If you think it is unclear lets say 'support the migration API through
VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE'
>
> > + * vfio_device_mig_state for details. If present flags must be non-zero and
> > + * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is supported.
> > + *
> > + * VFIO_MIGRATION_STOP_COPY means that RUNNING, STOP, STOP_COPY
> > and
> > + * RESUMING are supported.
> > + */
>
> Not aligned with other places where 5 states are mentioned. Better add
> ERROR here.
ERROR is not a state that is 'supported'. It could be clarified that
ERROR and RUNNING are always supported.
>
> > + *
> > + * RUNNING -> STOP
> > + * STOP_COPY -> STOP
> > + * While in STOP the device must stop the operation of the device. The
> > + * device must not generate interrupts, DMA, or advance its internal
> > + * state. When stopped the device and kernel migration driver must accept
> > + * and respond to interaction to support external subsystems in the STOP
> > + * state, for example PCI MSI-X and PCI config pace. Failure by the user to
> > + * restrict device access while in STOP must not result in error conditions
> > + * outside the user context (ex. host system faults).
>
> Right above the STOP state is defined as:
>
> * STOP - The device does not change the internal or external state
>
> 'external state' I assume means P2P activities. For consistency it is clearer
> to also say something about external state in above paragraph.
No, STOP is defined to halt all DMA. I tidied it a bit like this:
* While in STOP the device must stop the operation of the device. The device
* must not generate interrupts, DMA, or any other change to external state.
* It must not change its internal state.
> > + *
> > + * The STOP_COPY arc will terminate a data transfer session.
>
> remove 'will'
will is correct grammar. It could be 'arc terminates'
> > + *
> > + * STOP -> STOP_COPY
> > + * This arc begin the process of saving the device state and will return a
> > + * new data_fd.
> > + *
> > + * While in the STOP_COPY state the device has the same behavior as STOP
> > + * with the addition that the data transfers session continues to stream the
> > + * migration state. End of stream on the FD indicates the entire device
> > + * state has been transferred.
> > + *
> > + * The user should take steps to restrict access to vfio device regions while
> > + * the device is in STOP_COPY or risk corruption of the device migration
> > data
> > + * stream.
>
> Restricting access has been explained in the to-STOP arcs and it is stated
> that while in STOP_COPY the device has the same behavior as STOP. So
> I think the last paragraph is possibly not required.
It is not the same, the language in STOP is saying that the device
must tolerate external touches without breaking the kernel
This language is saying if external touches happen then the device is
free to corrupt the migration stream.
In both cases we expect good userspace to not have device
touches, the guidance here is for driver authors about what kind of
steps they need to take to protect against hostile userspace.
> > + * STOP -> RESUMING
> > + * Entering the RESUMING state starts a process of restoring the device
> > + * state and will return a new data_fd. The data stream fed into the
> > data_fd
> > + * should be taken from the data transfer output of the saving group states
>
> No definition of 'group state' (maybe introduced in a later patch?)
Yes, it was added in the P2P patch
We can avoid talking about saving group here entirely, it really just
means a single FD.
* The data stream fed into the data_fd should
* be taken from the data transfer output of a single FD during saving on a
* from a compatible device.
Thanks,
Jason
Powered by blists - more mailing lists