Message-ID: <BN9PR11MB5276AE8019D9D2482C8972958C3B9@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Tue, 22 Feb 2022 02:00:24 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Yishai Hadas <yishaih@...dia.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>,
"bhelgaas@...gle.com" <bhelgaas@...gle.com>,
"jgg@...dia.com" <jgg@...dia.com>,
"saeedm@...dia.com" <saeedm@...dia.com>
CC: "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"leonro@...dia.com" <leonro@...dia.com>,
"kwankhede@...dia.com" <kwankhede@...dia.com>,
"mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
"maorg@...dia.com" <maorg@...dia.com>,
"cohuck@...hat.com" <cohuck@...hat.com>,
"Raj, Ashok" <ashok.raj@...el.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>
Subject: RE: [PATCH V8 mlx5-next 10/15] vfio: Extend the device migration
protocol with RUNNING_P2P
> From: Yishai Hadas <yishaih@...dia.com>
> Sent: Sunday, February 20, 2022 5:57 PM
>
> From: Jason Gunthorpe <jgg@...dia.com>
>
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing P2P transactions.
>
> As an optional extension to the mandatory states it is defined as
> in between STOP and RUNNING:
> STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
>
> For drivers that are unable to support RUNNING_P2P the core code
> silently merges RUNNING_P2P and RUNNING together. Unless driver support
> is present, the new state cannot be used in SET_STATE.
> Drivers that support this will be required to implement 4 FSM arcs
> beyond the basic FSM. 2 of the basic FSM arcs become combination
> transitions.
>
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
>
> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>
> Signed-off-by: Yishai Hadas <yishaih@...dia.com>
Reviewed-by: Kevin Tian <kevin.tian@...el.com>
> ---
> drivers/vfio/vfio.c | 84 +++++++++++++++++++++++++++++++--------
> include/linux/vfio.h | 1 +
> include/uapi/linux/vfio.h | 36 ++++++++++++++++-
> 3 files changed, 102 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b37ab27b511f..bdb5205bb358 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1577,39 +1577,55 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> enum vfio_device_mig_state new_fsm,
> enum vfio_device_mig_state *next_fsm)
> {
> - enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
> + enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
> /*
> - * The coding in this table requires the driver to implement 6
> + * The coding in this table requires the driver to implement
> * FSM arcs:
> * RESUMING -> STOP
> - * RUNNING -> STOP
> * STOP -> RESUMING
> - * STOP -> RUNNING
> * STOP -> STOP_COPY
> * STOP_COPY -> STOP
> *
> - * The coding will step through multiple states for these combination
> - * transitions:
> - * RESUMING -> STOP -> RUNNING
> + * If P2P is supported then the driver must also implement these FSM
> + * arcs:
> + * RUNNING -> RUNNING_P2P
> + * RUNNING_P2P -> RUNNING
> + * RUNNING_P2P -> STOP
> + * STOP -> RUNNING_P2P
> + * Without P2P the driver must implement:
> + * RUNNING -> STOP
> + * STOP -> RUNNING
> + *
> + * If all optional features are supported then the coding will step
> + * through multiple states for these combination transitions:
> + * RESUMING -> STOP -> RUNNING_P2P
> + * RESUMING -> STOP -> RUNNING_P2P -> RUNNING
> * RESUMING -> STOP -> STOP_COPY
> - * RUNNING -> STOP -> RESUMING
> - * RUNNING -> STOP -> STOP_COPY
> + * RUNNING -> RUNNING_P2P -> STOP
> + * RUNNING -> RUNNING_P2P -> STOP -> RESUMING
> + * RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
> + * RUNNING_P2P -> STOP -> RESUMING
> + * RUNNING_P2P -> STOP -> STOP_COPY
> + * STOP -> RUNNING_P2P -> RUNNING
> * STOP_COPY -> STOP -> RESUMING
> - * STOP_COPY -> STOP -> RUNNING
> + * STOP_COPY -> STOP -> RUNNING_P2P
> + * STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
> */
> static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
> [VFIO_DEVICE_STATE_STOP] = {
> [VFIO_DEVICE_STATE_STOP] =
> VFIO_DEVICE_STATE_STOP,
> - [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_RUNNING,
> + [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_STOP_COPY,
> [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_RESUMING,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> },
> [VFIO_DEVICE_STATE_RUNNING] = {
> - [VFIO_DEVICE_STATE_STOP] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_STOP] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_RUNNING,
> - [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_STOP,
> - [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> + [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> },
> [VFIO_DEVICE_STATE_STOP_COPY] = {
> @@ -1617,6 +1633,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_STOP,
> [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_STOP_COPY,
> [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_STOP,
> [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> },
> [VFIO_DEVICE_STATE_RESUMING] = {
> @@ -1624,6 +1641,15 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_STOP,
> [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_STOP,
> [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_RESUMING,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> + },
> + [VFIO_DEVICE_STATE_RUNNING_P2P] = {
> + [VFIO_DEVICE_STATE_STOP] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_RUNNING,
> + [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_STOP,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_RUNNING_P2P,
> [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> },
> [VFIO_DEVICE_STATE_ERROR] = {
> @@ -1631,17 +1657,41 @@ int vfio_mig_get_next_state(struct vfio_device *device,
> [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_DEVICE_STATE_ERROR,
> [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_DEVICE_STATE_ERROR,
> [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_DEVICE_STATE_ERROR,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> VFIO_DEVICE_STATE_ERROR,
> [VFIO_DEVICE_STATE_ERROR] =
> VFIO_DEVICE_STATE_ERROR,
> },
> };
>
> - if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
> + static const unsigned int
> state_flags_table[VFIO_DEVICE_NUM_STATES] = {
> + [VFIO_DEVICE_STATE_STOP] =
> VFIO_MIGRATION_STOP_COPY,
> + [VFIO_DEVICE_STATE_RUNNING] =
> VFIO_MIGRATION_STOP_COPY,
> + [VFIO_DEVICE_STATE_STOP_COPY] =
> VFIO_MIGRATION_STOP_COPY,
> + [VFIO_DEVICE_STATE_RESUMING] =
> VFIO_MIGRATION_STOP_COPY,
> + [VFIO_DEVICE_STATE_RUNNING_P2P] =
> + VFIO_MIGRATION_STOP_COPY |
> VFIO_MIGRATION_P2P,
> + [VFIO_DEVICE_STATE_ERROR] = ~0U,
> + };
> +
> + if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> + (state_flags_table[cur_fsm] & device->migration_flags) !=
> + state_flags_table[cur_fsm]))
> return -EINVAL;
>
> - if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> + if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> + (state_flags_table[new_fsm] & device->migration_flags) !=
> + state_flags_table[new_fsm])
> return -EINVAL;
>
> + /*
> + * Arcs touching optional and unsupported states are skipped over. The
> + * driver will instead see an arc from the original state to the next
> + * logical state, as per the above comment.
> + */
> *next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> + while ((state_flags_table[*next_fsm] & device->migration_flags) !=
> + state_flags_table[*next_fsm])
> + *next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
> +
> return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
> }
> EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
> @@ -1731,7 +1781,7 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
> size_t argsz)
> {
> struct vfio_device_feature_migration mig = {
> - .flags = VFIO_MIGRATION_STOP_COPY,
> + .flags = device->migration_flags,
> };
> int ret;
>
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 3bbadcdbc9c8..3176cb5d4464 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -33,6 +33,7 @@ struct vfio_device {
> struct vfio_group *group;
> struct vfio_device_set *dev_set;
> struct list_head dev_set_list;
> + unsigned int migration_flags;
>
> /* Members below here are private, not for driver use */
> refcount_t refcount;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02b836ea8f46..46b06946f0a8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1010,10 +1010,16 @@ struct vfio_device_feature {
> *
> * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
> * RESUMING are supported.
> + *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> + * is supported in addition to the STOP_COPY states.
> + *
> + * Other combinations of flags have behavior to be defined in the future.
> */
> struct vfio_device_feature_migration {
> __aligned_u64 flags;
> #define VFIO_MIGRATION_STOP_COPY (1 << 0)
> +#define VFIO_MIGRATION_P2P (1 << 1)
> };
> #define VFIO_DEVICE_FEATURE_MIGRATION 1
>
> @@ -1064,10 +1070,13 @@ struct vfio_device_feature_mig_state {
> * RESUMING - The device is stopped and is loading a new internal state
> * ERROR - The device has failed and must be reset
> *
> + * And 1 optional state to support VFIO_MIGRATION_P2P:
> + * RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
> + *
> * The FSM takes actions on the arcs between FSM states. The driver
> implements
> * the following behavior for the FSM arcs:
> *
> - * RUNNING -> STOP
> + * RUNNING_P2P -> STOP
> * STOP_COPY -> STOP
> * While in STOP the device must stop the operation of the device. The
> device
> * must not generate interrupts, DMA, or any other change to external state.
> @@ -1094,11 +1103,16 @@ struct vfio_device_feature_mig_state {
> *
> * To abort a RESUMING session the device must be reset.
> *
> - * STOP -> RUNNING
> + * RUNNING_P2P -> RUNNING
> * While in RUNNING the device is fully operational, the device may
> generate
> * interrupts, DMA, respond to MMIO, all vfio device regions are functional,
> * and the device may advance its internal state.
> *
> + * RUNNING -> RUNNING_P2P
> + * STOP -> RUNNING_P2P
> + * While in RUNNING_P2P the device is partially running in the P2P quiescent
> + * state defined below.
> + *
> * STOP -> STOP_COPY
> * This arc begin the process of saving the device state and will return a
> * new data_fd.
> @@ -1128,6 +1142,18 @@ struct vfio_device_feature_mig_state {
> * To recover from ERROR VFIO_DEVICE_RESET must be used to return the
> * device_state back to RUNNING.
> *
> + * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
> + * state for the device for the purposes of managing multiple devices within a
> + * user context where peer-to-peer DMA between devices may be active. The
> + * RUNNING_P2P states must prevent the device from initiating
> + * any new P2P DMA transactions. If the device can identify P2P transactions
> + * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
> + * driver must complete any such outstanding operations prior to completing the
> + * FSM arc into a P2P state. For the purpose of specification the states
> + * behave as though the device was fully running if not supported. Like while in
> + * STOP or STOP_COPY the user must not touch the device, otherwise the state
> + * can be exited.
> + *
> * The remaining possible transitions are interpreted as combinations of the
> * above FSM arcs. As there are multiple paths through the FSM arcs the
> path
> * should be selected based on the following rules:
> @@ -1140,6 +1166,11 @@ struct vfio_device_feature_mig_state {
> * fails. When handling these types of errors users should anticipate future
> * revisions of this protocol using new states and those states becoming
> * visible in this case.
> + *
> + * The optional states cannot be used with SET_STATE if the device does not
> + * support them. The user can discover if these states are supported by using
> + * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
> + * avoid knowing about these optional states if the kernel driver supports them.
> */
> enum vfio_device_mig_state {
> VFIO_DEVICE_STATE_ERROR = 0,
> @@ -1147,6 +1178,7 @@ enum vfio_device_mig_state {
> VFIO_DEVICE_STATE_RUNNING = 2,
> VFIO_DEVICE_STATE_STOP_COPY = 3,
> VFIO_DEVICE_STATE_RESUMING = 4,
> + VFIO_DEVICE_STATE_RUNNING_P2P = 5,
> };
>
> /* -------- API for Type1 VFIO IOMMU -------- */
> --
> 2.18.1
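
For anyone following the table-plus-skip logic in vfio_mig_get_next_state()
above, here is a rough standalone model of it in plain userspace C. It is only
a sketch, not the kernel code: the names (next_state, ST_*, MIG_*) are made up
for illustration, -1 stands in for -EINVAL, and the [STOP] column entries of
the STOP_COPY, RESUMING and ERROR rows are not visible in the hunks, so they
are inferred here from the arc list in the comment.

```c
/* Hypothetical names; numeric values mirror enum vfio_device_mig_state. */
enum { ST_ERROR = 0, ST_STOP, ST_RUNNING, ST_STOP_COPY, ST_RESUMING,
       ST_RUNNING_P2P, ST_NUM };

#define MIG_STOP_COPY (1u << 0) /* models VFIO_MIGRATION_STOP_COPY */
#define MIG_P2P       (1u << 1) /* models VFIO_MIGRATION_P2P */

/* from_fsm[cur][target] = next hop. Unlisted entries default to ST_ERROR (0). */
static const unsigned char from_fsm[ST_NUM][ST_NUM] = {
	[ST_STOP] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_RUNNING_P2P,
		[ST_STOP_COPY] = ST_STOP_COPY, [ST_RESUMING] = ST_RESUMING,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P },
	[ST_RUNNING] = {
		[ST_STOP] = ST_RUNNING_P2P, [ST_RUNNING] = ST_RUNNING,
		[ST_STOP_COPY] = ST_RUNNING_P2P, [ST_RESUMING] = ST_RUNNING_P2P,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P },
	[ST_STOP_COPY] = { /* [ST_STOP] entry inferred, not shown in the hunk */
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_STOP,
		[ST_STOP_COPY] = ST_STOP_COPY, [ST_RESUMING] = ST_STOP,
		[ST_RUNNING_P2P] = ST_STOP },
	[ST_RESUMING] = { /* [ST_STOP] entry inferred, not shown in the hunk */
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_STOP,
		[ST_STOP_COPY] = ST_STOP, [ST_RESUMING] = ST_RESUMING,
		[ST_RUNNING_P2P] = ST_STOP },
	[ST_RUNNING_P2P] = {
		[ST_STOP] = ST_STOP, [ST_RUNNING] = ST_RUNNING,
		[ST_STOP_COPY] = ST_STOP, [ST_RESUMING] = ST_STOP,
		[ST_RUNNING_P2P] = ST_RUNNING_P2P },
	/* [ST_ERROR] row: every entry is ST_ERROR, i.e. the zero default. */
};

/* Feature flags a device must advertise for each state to be usable. */
static const unsigned int state_flags[ST_NUM] = {
	[ST_STOP] = MIG_STOP_COPY,
	[ST_RUNNING] = MIG_STOP_COPY,
	[ST_STOP_COPY] = MIG_STOP_COPY,
	[ST_RESUMING] = MIG_STOP_COPY,
	[ST_RUNNING_P2P] = MIG_STOP_COPY | MIG_P2P,
	[ST_ERROR] = ~0u, /* never satisfied by real flags */
};

static int state_ok(unsigned int s, unsigned int dev_flags)
{
	return (state_flags[s] & dev_flags) == state_flags[s];
}

/* Returns 0 and sets *next to the next hop, or -1 for an invalid request. */
static int next_state(unsigned int dev_flags, unsigned int cur,
		      unsigned int target, unsigned int *next)
{
	if (cur >= ST_NUM || !state_ok(cur, dev_flags))
		return -1;
	if (target >= ST_NUM || !state_ok(target, dev_flags))
		return -1;

	/* Skip hops through states the device does not support. */
	*next = from_fsm[cur][target];
	while (!state_ok(*next, dev_flags))
		*next = from_fsm[*next][target];

	return *next != ST_ERROR ? 0 : -1;
}
```

With dev_flags = MIG_STOP_COPY only, a RUNNING -> STOP request first looks up
RUNNING_P2P, finds it unsupported, and resolves to STOP directly, which matches
the "silently merges RUNNING_P2P and RUNNING" wording in the commit message.
With MIG_P2P also set, the same request returns RUNNING_P2P as the first hop.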