Message-ID: <20220224142024.147653-1-yishaih@nvidia.com>
Date: Thu, 24 Feb 2022 16:20:09 +0200
From: Yishai Hadas <yishaih@...dia.com>
To: <alex.williamson@...hat.com>, <bhelgaas@...gle.com>,
<jgg@...dia.com>, <saeedm@...dia.com>
CC: <linux-pci@...r.kernel.org>, <kvm@...r.kernel.org>,
<netdev@...r.kernel.org>, <kuba@...nel.org>, <leonro@...dia.com>,
<kwankhede@...dia.com>, <mgurtovoy@...dia.com>,
<yishaih@...dia.com>, <maorg@...dia.com>, <cohuck@...hat.com>,
<ashok.raj@...el.com>, <kevin.tian@...el.com>,
<shameerali.kolothum.thodi@...wei.com>
Subject: [PATCH V9 mlx5-next 00/15] Add mlx5 live migration driver and v2 migration protocol
This series adds an mlx5 live migration driver for VFs that are migration
capable and includes the v2 migration protocol definition and mlx5
implementation.
The mlx5 driver uses the vfio_pci_core split to create a specific VFIO
PCI driver that matches the mlx5 virtual functions. The driver provides
the same experience as normal vfio-pci with the addition of migration
support.
In HW, the migration is controlled by the PF, using its mlx5_core
driver, and the VFIO PCI VF driver coordinates with the PF to execute
the migration actions.
The bulk of the v2 migration protocol is semantically the same as v1;
however, it has been recast into an FSM for the device_state, and the
actual syscall interface uses normal ioctl(), read() and write() instead
of building a syscall interface using the region.
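To illustrate the FSM idea only, here is a minimal user-space sketch of a
next-hop table that steps a device from its current state towards a
requested one. The enum, the table contents and mig_walk() are all
hypothetical and simplified for this note; they are not the uAPI
definitions or the arc table from the patches.

```c
/* Hypothetical, simplified state set; NOT the uAPI enum from the series. */
enum mig_state {
	MIG_ERROR,	/* 0, so unlisted table entries default to ERROR */
	MIG_STOP,
	MIG_RUNNING,
	MIG_STOP_COPY,
	MIG_RESUMING,
	MIG_NR_STATES,
};

/*
 * next_hop[cur][target] is the next state on the path from cur to target.
 * In this toy model every path between two non-STOP states goes via STOP.
 */
static const enum mig_state next_hop[MIG_NR_STATES][MIG_NR_STATES] = {
	[MIG_STOP] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_RESUMING,
	},
	[MIG_RUNNING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_RUNNING,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_STOP_COPY] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP_COPY,
		[MIG_RESUMING] = MIG_STOP,
	},
	[MIG_RESUMING] = {
		[MIG_STOP] = MIG_STOP,
		[MIG_RUNNING] = MIG_STOP,
		[MIG_STOP_COPY] = MIG_STOP,
		[MIG_RESUMING] = MIG_RESUMING,
	},
};

/*
 * Walk from cur to target one hop at a time, recording each intermediate
 * state in path[]. Returns the number of hops, or -1 if the target cannot
 * be reached or path[] is too small.
 */
static int mig_walk(enum mig_state cur, enum mig_state target,
		    enum mig_state *path, int max)
{
	int n = 0;

	if ((unsigned int)cur >= MIG_NR_STATES ||
	    (unsigned int)target >= MIG_NR_STATES)
		return -1;

	while (cur != target) {
		cur = next_hop[cur][target];
		if (cur == MIG_ERROR || n == max)
			return -1;
		path[n++] = cur;
	}
	return n;
}
```

With this table a request to go from MIG_RUNNING to MIG_STOP_COPY is
broken into two hops (MIG_STOP, then MIG_STOP_COPY), so a driver in such
a scheme would only need to implement the individual arcs.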
Several bits of infrastructure work are included here:
- pci_iov_vf_id() to help drivers like mlx5 figure out the VF index from
a BDF
- pci_iov_get_pf_drvdata() to clarify the tricky locking protocol when a
VF reaches into its PF's driver
- mlx5_core uses the normal SRIOV lifecycle and disables SRIOV before
driver remove, to be compatible with pci_iov_get_pf_drvdata()
- Lifting VFIO_DEVICE_FEATURE into core VFIO code
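As background for the pci_iov_vf_id() item, the following user-space
sketch shows the routing ID arithmetic that the SR-IOV capability
implies: VF n sits at PF routing ID + First VF Offset + n * VF Stride.
The helper names and parameters below are assumptions made for
illustration; this is not the kernel helper itself.

```c
#include <stdint.h>

/* Pack a bus number and devfn into a 16-bit routing ID (bus << 8 | devfn). */
static inline uint16_t rid(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)((bus << 8) | devfn);
}

/*
 * Return the zero-based VF index for vf_rid, or -1 if vf_rid does not
 * land on one of the PF's VF slots. first_vf_offset and vf_stride mirror
 * the SR-IOV capability fields; a stride of 0 is treated as invalid in
 * this sketch.
 */
static int vf_index_from_rid(uint16_t pf_rid, uint16_t vf_rid,
			     uint16_t first_vf_offset, uint16_t vf_stride,
			     uint16_t num_vfs)
{
	int delta = (int)vf_rid - ((int)pf_rid + first_vf_offset);
	int idx;

	if (vf_stride == 0 || delta < 0 || delta % vf_stride != 0)
		return -1;

	idx = delta / vf_stride;
	return idx < num_vfs ? idx : -1;
}
```

For example, with a PF at 03:00.0, First VF Offset 2 and VF Stride 1,
the function at 03:00.2 maps to VF index 0.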
This series comes after a lot of discussion. Some major points:
- v1 ABI compatible migration defined using the same FSM approach:
https://lore.kernel.org/all/0-v1-a4f7cab64938+3f-vfio_mig_states_jgg@nvidia.com/
- Attempts to clarify how the v1 API works:
Alex's:
https://lore.kernel.org/kvm/163909282574.728533.7460416142511440919.stgit@omen/
Jason's:
https://lore.kernel.org/all/0-v3-184b374ad0a8+24c-vfio_mig_doc_jgg@nvidia.com/
- Etherpad exploring the scope and questions of general VFIO migration:
https://lore.kernel.org/kvm/87mtm2loml.fsf@redhat.com/
NOTE: As this series touches mlx5_core parts, we need to send it to VFIO
in a pull request format to avoid conflicts.
Matching qemu changes can be previewed here:
https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
Changes from V8: https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
vfio:
- Fix some documentation notes given by Alex and Cornelia for v2.
- Add Reviewed-by: Kevin Tian <kevin.tian@...el.com>
vfio/mlx5, net/mlx5:
- Use more inclusive terminology for slave/master as was asked by Alex.
Changes from V7: https://lore.kernel.org/kvm/20220207172216.206415-1-yishaih@nvidia.com/T/
vfio:
- Fix and improve some documentation notes.
- Improve vfio_ioctl_device_feature_migration() to check for the
existence of both set and get device ops.
- Improve some commit logs.
- Drop the PRE_COPY patch as was asked by Alex since we have no proposed
in-kernel users.
- Add Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@...wei.com>.
vfio/mlx5:
- Improve the packing of struct mlx5vf_pci_core_device.
net/mlx5:
- Update mlx5 command list for error/debug cases.
Changes from V6: https://lore.kernel.org/netdev/20220130160826.32449-1-yishaih@nvidia.com/
vfio:
- Move to use the FEATURE ioctl for setting/getting the device state.
- Use state_flags_table as part of vfio_mig_get_next_state() and use
WARN_ON as Alex suggested.
- Leave the V1 definitions in the uAPI header and drop only its
documentation until V2 is part of Linus's tree.
- Fix errno usage in a few places.
- Improve and adapt the uAPI documentation to match the latest code.
- Put the VFIO_DEVICE_FEATURE_PCI_VF_TOKEN functionality into a separate
function.
- Fix some rebase notes.
vfio/mlx5:
- Adapt to use the vfio core changes.
- Fix some bad flow upon load state.
Changes from V5: https://lore.kernel.org/kvm/20211027095658.144468-1-yishaih@nvidia.com/
vfio:
- Migration protocol v2:
+ enum for device state, not bitmap
+ ioctl to manipulate device_state, not a region
+ Only STOP_COPY is mandatory, P2P and PRE_COPY are optional, discovered
via VFIO_DEVICE_FEATURE
+ Migration data transfer is done via dedicated FD
- VFIO core code to implement the migration related ioctls and help
drivers implement it correctly
- VFIO_DEVICE_FEATURE refactor
- Delete migration protocol, drop patches fixing it
- Drop "vfio/pci_core: Make the region->release() function optional"
vfio/mlx5:
- Switch to use migration v2 protocol, with core helpers
- Eliminate the region implementation
Changes from V4: https://lore.kernel.org/kvm/20211026090605.91646-1-yishaih@nvidia.com/
vfio:
- Add some Reviewed-by.
- Rename to vfio_pci_core_aer_err_detected() as Alex asked.
vfio/mlx5:
- Improve to enter the error state only if unquiesce also fails.
- Fix some typos.
- Use the multi-line comment style as in drivers/vfio.
Changes from V3: https://lore.kernel.org/kvm/20211024083019.232813-1-yishaih@nvidia.com/
vfio/mlx5:
- Align with mlx5 latest specification to create the MKEY with full read
write permissions.
- Fix unlock ordering in mlx5vf_state_mutex_unlock() to prevent a
race.
Changes from V2: https://lore.kernel.org/kvm/20211019105838.227569-1-yishaih@nvidia.com/
vfio:
- Put and use the new macro VFIO_DEVICE_STATE_SET_ERROR as Alex asked.
vfio/mlx5:
- Improve/fix state checking as was asked by Alex & Jason.
- Let things be done in a deterministic way upon 'reset_done' following
the suggested algorithm by Jason.
- Align with mlx5 latest specification when calling the SAVE command.
- Fix some typos.
vdpa/mlx5:
- Drop the patch from the series based on the discussion in the mailing
list.
Changes from V1: https://lore.kernel.org/kvm/20211013094707.163054-1-yishaih@nvidia.com/
PCI/IOV:
- Add actual interface in the subject as was asked by Bjorn and add
his Acked-by.
- Move to check explicitly for !dev->is_virtfn as was asked by Alex.
vfio:
- Come with a separate patch for fixing the non-compiled
VFIO_DEVICE_STATE_SET_ERROR macro.
- Expose vfio_pci_aer_err_detected() to be set by drivers on their own
PCI error handlers.
- Add a macro for VFIO_DEVICE_STATE_ERROR in the uapi header file as was
suggested by Alex.
vfio/mlx5:
- Improve to use xor as part of checking the 'state' change command as
was suggested by Alex.
- Set state to VFIO_DEVICE_STATE_ERROR when an error occurred instead of
VFIO_DEVICE_STATE_INVALID.
- Improve state checking as was suggested by Jason.
- Use its own PCI reset_done error handler as was suggested by Jason and
fix the locking scheme around the state mutex to work properly.
Changes from V0: https://lore.kernel.org/kvm/cover.1632305919.git.leonro@nvidia.com/
PCI/IOV:
- Add an API (i.e. pci_iov_get_pf_drvdata()) that allows SRIOV VF drivers
to reach the drvdata of a PF.
mlx5_core:
- Add an extra patch to disable SRIOV before PF removal.
- Adapt to use the above PCI/IOV API as part of mlx5_vf_get_core_dev().
- Reuse the exported PCI/IOV virtfn index function call (i.e. pci_iov_vf_id()).
vfio:
- Add support in the pci_core to let a driver be notified upon
'reset_done' so that it can set its internal state accordingly.
- Add some helper stuff for 'invalid' state handling.
mlx5_vfio_pci:
- Move to use the 'command mode' instead of the 'state machine'
scheme as was discussed in the mailing list.
- Handle the RESET scenario when called by vfio_pci_core to set
its internal state accordingly.
- Set initial state as RUNNING.
- Put the driver files in a sub-folder named mlx5 under drivers/vfio/pci
and update the MAINTAINERS file as was asked.
vdpa_mlx5:
- Add a new patch to use mlx5_vf_get_core_dev() to get the PF device.
Jason Gunthorpe (6):
PCI/IOV: Add pci_iov_vf_id() to get VF index
PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata
of a PF
vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl
vfio: Define device migration protocol v2
vfio: Extend the device migration protocol with RUNNING_P2P
vfio: Remove migration protocol v1 documentation
Leon Romanovsky (1):
net/mlx5: Reuse exported virtfn index function call
Yishai Hadas (8):
net/mlx5: Disable SRIOV before PF removal
net/mlx5: Expose APIs to get/put the mlx5 core device
net/mlx5: Introduce migration bits and structures
net/mlx5: Add migration commands definitions
vfio/mlx5: Expose migration commands over mlx5 device
vfio/mlx5: Implement vfio_pci driver for mlx5 devices
vfio/pci: Expose vfio_pci_core_aer_err_detected()
vfio/mlx5: Use its own PCI reset_done error handler
MAINTAINERS | 6 +
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 10 +
.../net/ethernet/mellanox/mlx5/core/main.c | 45 ++
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 1 +
.../net/ethernet/mellanox/mlx5/core/sriov.c | 17 +-
drivers/pci/iov.c | 43 ++
drivers/vfio/pci/Kconfig | 3 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/mlx5/Kconfig | 10 +
drivers/vfio/pci/mlx5/Makefile | 4 +
drivers/vfio/pci/mlx5/cmd.c | 259 +++++++
drivers/vfio/pci/mlx5/cmd.h | 36 +
drivers/vfio/pci/mlx5/main.c | 676 ++++++++++++++++++
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 101 ++-
drivers/vfio/vfio.c | 295 +++++++-
include/linux/mlx5/driver.h | 3 +
include/linux/mlx5/mlx5_ifc.h | 147 +++-
include/linux/pci.h | 15 +-
include/linux/vfio.h | 53 ++
include/linux/vfio_pci_core.h | 4 +
include/uapi/linux/vfio.h | 406 +++++------
22 files changed, 1846 insertions(+), 291 deletions(-)
create mode 100644 drivers/vfio/pci/mlx5/Kconfig
create mode 100644 drivers/vfio/pci/mlx5/Makefile
create mode 100644 drivers/vfio/pci/mlx5/cmd.c
create mode 100644 drivers/vfio/pci/mlx5/cmd.h
create mode 100644 drivers/vfio/pci/mlx5/main.c
--
2.18.1