Open Source and information security mailing list archives
 
Message-ID: <BN9PR11MB5276AE8019D9D2482C8972958C3B9@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Tue, 22 Feb 2022 02:00:24 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Yishai Hadas <yishaih@...dia.com>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "jgg@...dia.com" <jgg@...dia.com>,
        "saeedm@...dia.com" <saeedm@...dia.com>
CC:     "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "leonro@...dia.com" <leonro@...dia.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
        "maorg@...dia.com" <maorg@...dia.com>,
        "cohuck@...hat.com" <cohuck@...hat.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "shameerali.kolothum.thodi@...wei.com" 
        <shameerali.kolothum.thodi@...wei.com>
Subject: RE: [PATCH V8 mlx5-next 10/15] vfio: Extend the device migration
 protocol with RUNNING_P2P

> From: Yishai Hadas <yishaih@...dia.com>
> Sent: Sunday, February 20, 2022 5:57 PM
> 
> From: Jason Gunthorpe <jgg@...dia.com>
> 
> The RUNNING_P2P state is designed to support multiple devices in the same
> VM that are doing P2P transactions between themselves. When in RUNNING_P2P
> the device must be able to accept incoming P2P transactions but should not
> generate outgoing P2P transactions.
> 
> As an optional extension to the mandatory states it is defined as
> in between STOP and RUNNING:
>    STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP
> 
> For drivers that are unable to support RUNNING_P2P the core code
> silently merges RUNNING_P2P and RUNNING together. Unless driver support
> is present, the new state cannot be used in SET_STATE.
> Drivers that support this will be required to implement 4 FSM arcs
> beyond the basic FSM. 2 of the basic FSM arcs become combination
> transitions.
> 
> Compared to the v1 clarification, NDMA is redefined into FSM states and is
> described in terms of the desired P2P quiescent behavior, noting that
> halting all DMA is an acceptable implementation.
> 
> Signed-off-by: Jason Gunthorpe <jgg@...dia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>
> Signed-off-by: Yishai Hadas <yishaih@...dia.com>

Reviewed-by: Kevin Tian <kevin.tian@...el.com>

> ---
>  drivers/vfio/vfio.c       | 84 +++++++++++++++++++++++++++++++--------
>  include/linux/vfio.h      |  1 +
>  include/uapi/linux/vfio.h | 36 ++++++++++++++++-
>  3 files changed, 102 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index b37ab27b511f..bdb5205bb358 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1577,39 +1577,55 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			    enum vfio_device_mig_state new_fsm,
>  			    enum vfio_device_mig_state *next_fsm)
>  {
> -	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RESUMING + 1 };
> +	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
>  	/*
> -	 * The coding in this table requires the driver to implement 6
> +	 * The coding in this table requires the driver to implement
>  	 * FSM arcs:
>  	 *         RESUMING -> STOP
> -	 *         RUNNING -> STOP
>  	 *         STOP -> RESUMING
> -	 *         STOP -> RUNNING
>  	 *         STOP -> STOP_COPY
>  	 *         STOP_COPY -> STOP
>  	 *
> -	 * The coding will step through multiple states for these combination
> -	 * transitions:
> -	 *         RESUMING -> STOP -> RUNNING
> +	 * If P2P is supported then the driver must also implement these FSM
> +	 * arcs:
> +	 *         RUNNING -> RUNNING_P2P
> +	 *         RUNNING_P2P -> RUNNING
> +	 *         RUNNING_P2P -> STOP
> +	 *         STOP -> RUNNING_P2P
> +	 * Without P2P the driver must implement:
> +	 *         RUNNING -> STOP
> +	 *         STOP -> RUNNING
> +	 *
> +	 * If all optional features are supported then the coding will step
> +	 * through multiple states for these combination transitions:
> +	 *         RESUMING -> STOP -> RUNNING_P2P
> +	 *         RESUMING -> STOP -> RUNNING_P2P -> RUNNING
>  	 *         RESUMING -> STOP -> STOP_COPY
> -	 *         RUNNING -> STOP -> RESUMING
> -	 *         RUNNING -> STOP -> STOP_COPY
> +	 *         RUNNING -> RUNNING_P2P -> STOP
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         RUNNING_P2P -> STOP -> RESUMING
> +	 *         RUNNING_P2P -> STOP -> STOP_COPY
> +	 *         STOP -> RUNNING_P2P -> RUNNING
>  	 *         STOP_COPY -> STOP -> RESUMING
> -	 *         STOP_COPY -> STOP -> RUNNING
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P
> +	 *         STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
>  	 */
>  	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
>  		[VFIO_DEVICE_STATE_STOP] = {
>  			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> -			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_RUNNING] = {
> -			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> -			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> -			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_STOP_COPY] = {
> @@ -1617,6 +1633,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_RESUMING] = {
> @@ -1624,6 +1641,15 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
> +		},
> +		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
> +			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
> +			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  		[VFIO_DEVICE_STATE_ERROR] = {
> @@ -1631,17 +1657,41 @@ int vfio_mig_get_next_state(struct vfio_device *device,
>  			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
> +			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
>  			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
>  		},
>  	};
> 
> -	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table)))
> +	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
> +		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
> +		[VFIO_DEVICE_STATE_RUNNING_P2P] =
> +			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P,
> +		[VFIO_DEVICE_STATE_ERROR] = ~0U,
> +	};
> +
> +	if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> +		    (state_flags_table[cur_fsm] & device->migration_flags) !=
> +			state_flags_table[cur_fsm]))
>  		return -EINVAL;
> 
> -	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> +	if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> +	   (state_flags_table[new_fsm] & device->migration_flags) !=
> +			state_flags_table[new_fsm])
>  		return -EINVAL;
> 
> +	/*
> +	 * Arcs touching optional and unsupported states are skipped over. The
> +	 * driver will instead see an arc from the original state to the next
> +	 * logical state, as per the above comment.
> +	 */
>  	*next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> +	while ((state_flags_table[*next_fsm] & device->migration_flags) !=
> +			state_flags_table[*next_fsm])
> +		*next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
> +
>  	return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
>  }
>  EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
> @@ -1731,7 +1781,7 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
>  					       size_t argsz)
>  {
>  	struct vfio_device_feature_migration mig = {
> -		.flags = VFIO_MIGRATION_STOP_COPY,
> +		.flags = device->migration_flags,
>  	};
>  	int ret;
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 3bbadcdbc9c8..3176cb5d4464 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -33,6 +33,7 @@ struct vfio_device {
>  	struct vfio_group *group;
>  	struct vfio_device_set *dev_set;
>  	struct list_head dev_set_list;
> +	unsigned int migration_flags;
> 
>  	/* Members below here are private, not for driver use */
>  	refcount_t refcount;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02b836ea8f46..46b06946f0a8 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1010,10 +1010,16 @@ struct vfio_device_feature {
>   *
>   * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
>   * RESUMING are supported.
> + *
> + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
> + * is supported in addition to the STOP_COPY states.
> + *
> + * Other combinations of flags have behavior to be defined in the future.
>   */
>  struct vfio_device_feature_migration {
>  	__aligned_u64 flags;
>  #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
> +#define VFIO_MIGRATION_P2P		(1 << 1)
>  };
>  #define VFIO_DEVICE_FEATURE_MIGRATION 1
> 
> @@ -1064,10 +1070,13 @@ struct vfio_device_feature_mig_state {
>   *  RESUMING - The device is stopped and is loading a new internal state
>   *  ERROR - The device has failed and must be reset
>   *
> + * And 1 optional state to support VFIO_MIGRATION_P2P:
> + *  RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
> + *
>   * The FSM takes actions on the arcs between FSM states. The driver implements
>   * the following behavior for the FSM arcs:
>   *
> - * RUNNING -> STOP
> + * RUNNING_P2P -> STOP
>   * STOP_COPY -> STOP
>   *   While in STOP the device must stop the operation of the device. The device
>   *   must not generate interrupts, DMA, or any other change to external state.
> @@ -1094,11 +1103,16 @@ struct vfio_device_feature_mig_state {
>   *
>   *   To abort a RESUMING session the device must be reset.
>   *
> - * STOP -> RUNNING
> + * RUNNING_P2P -> RUNNING
>   *   While in RUNNING the device is fully operational, the device may generate
>   *   interrupts, DMA, respond to MMIO, all vfio device regions are functional,
>   *   and the device may advance its internal state.
>   *
> + * RUNNING -> RUNNING_P2P
> + * STOP -> RUNNING_P2P
> + *   While in RUNNING_P2P the device is partially running in the P2P quiescent
> + *   state defined below.
> + *
>   * STOP -> STOP_COPY
>   *   This arc begin the process of saving the device state and will return a
>   *   new data_fd.
> @@ -1128,6 +1142,18 @@ struct vfio_device_feature_mig_state {
>   *   To recover from ERROR VFIO_DEVICE_RESET must be used to return the
>   *   device_state back to RUNNING.
>   *
> + * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
> + * state for the device for the purposes of managing multiple devices within a
> + * user context where peer-to-peer DMA between devices may be active. The
> + * RUNNING_P2P states must prevent the device from initiating
> + * any new P2P DMA transactions. If the device can identify P2P transactions
> + * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
> + * driver must complete any such outstanding operations prior to completing the
> + * FSM arc into a P2P state. For the purpose of specification the states
> + * behave as though the device was fully running if not supported. Like while in
> + * STOP or STOP_COPY the user must not touch the device, otherwise the state
> + * can be exited.
> + *
>   * The remaining possible transitions are interpreted as combinations of the
>   * above FSM arcs. As there are multiple paths through the FSM arcs the path
>   * should be selected based on the following rules:
> @@ -1140,6 +1166,11 @@ struct vfio_device_feature_mig_state {
>   * fails. When handling these types of errors users should anticipate future
>   * revisions of this protocol using new states and those states becoming
>   * visible in this case.
> + *
> + * The optional states cannot be used with SET_STATE if the device does not
> + * support them. The user can discover if these states are supported by using
> + * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
> + * avoid knowing about these optional states if the kernel driver supports them.
>   */
>  enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_ERROR = 0,
> @@ -1147,6 +1178,7 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING = 2,
>  	VFIO_DEVICE_STATE_STOP_COPY = 3,
>  	VFIO_DEVICE_STATE_RESUMING = 4,
> +	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
> 
>  /* -------- API for Type1 VFIO IOMMU -------- */
> --
> 2.18.1
