Message-ID: <20240816090159.1967650-1-dtatulea@nvidia.com>
Date: Fri, 16 Aug 2024 12:01:49 +0300
From: Dragos Tatulea <dtatulea@...dia.com>
To: "Michael S . Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
Eugenio Perez Martin <eperezma@...hat.com>,
<virtualization@...ts.linux-foundation.org>
CC: Dragos Tatulea <dtatulea@...dia.com>, Si-Wei Liu <si-wei.liu@...cle.com>,
Saeed Mahameed <saeedm@...dia.com>, Leon Romanovsky <leon@...nel.org>,
<kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>, Gal Pressman
<gal@...dia.com>, Parav Pandit <parav@...dia.com>, Xuan Zhuo
<xuanzhuo@...ux.alibaba.com>
Subject: [PATCH vhost v2 00/10] vdpa/mlx5: Parallelize device suspend/resume
This series parallelizes the mlx5_vdpa device suspend and resume
operations through the firmware async API. The purpose is to reduce live
migration downtime.
The series starts by changing the VQ suspend and resume commands
to the async API. After that, it switches to issuing multiple
commands of the same type in parallel.
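The "issue all, then wait once" pattern can be sketched in user space as below. This is not driver code. POSIX threads stand in for the firmware processing commands concurrently, whereas the driver uses the mlx5 firmware async command API with completion callbacks. The names `vq_cmd_fn`, `exec_vq_cmds_parallel` and the demo commands are hypothetical. The sketch shows the key difference: serial issue costs the sum of the per-command latencies, while issuing every command first and then waiting once costs roughly the latency of the slowest one, assuming the firmware can process them concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one per-VQ firmware command (for example a
 * VQ state change). Returns 0 on success or a negative errno value. */
typedef int (*vq_cmd_fn)(int vq_idx);

struct cmd_batch {
	pthread_mutex_t lock;
	int first_err; /* first error reported by any completed command */
};

struct cmd_work {
	struct cmd_batch *batch;
	vq_cmd_fn fn;
	int vq_idx;
	pthread_t thread;
};

/* Completion side: run one command and record its status. */
static void *cmd_worker(void *arg)
{
	struct cmd_work *w = arg;
	int err = w->fn(w->vq_idx);

	pthread_mutex_lock(&w->batch->lock);
	if (err && !w->batch->first_err)
		w->batch->first_err = err;
	pthread_mutex_unlock(&w->batch->lock);
	return NULL;
}

/* Issue one command per VQ without waiting in between, then wait once
 * for all of them. work[] must have room for num_vqs entries. */
static int exec_vq_cmds_parallel(vq_cmd_fn fn, struct cmd_work *work, int num_vqs)
{
	struct cmd_batch batch = { .first_err = 0 };
	int issued, err = 0;

	assert(num_vqs >= 0);
	pthread_mutex_init(&batch.lock, NULL);

	/* Issue phase: every command is in flight before we block on any. */
	for (issued = 0; issued < num_vqs; issued++) {
		work[issued] = (struct cmd_work){ .batch = &batch, .fn = fn, .vq_idx = issued };
		if (pthread_create(&work[issued].thread, NULL, cmd_worker, &work[issued]) != 0) {
			err = -EAGAIN; /* could not issue; still drain what is in flight */
			break;
		}
	}

	/* Wait phase: collect every issued command before returning. */
	for (int i = 0; i < issued; i++)
		pthread_join(work[i].thread, NULL);

	pthread_mutex_destroy(&batch.lock);
	return err ? err : batch.first_err;
}

/* Hypothetical demo commands. */
static atomic_int demo_calls;

static int demo_cmd_ok(int vq_idx)
{
	(void)vq_idx;
	atomic_fetch_add(&demo_calls, 1);
	return 0;
}

static int demo_cmd_fails_on_vq3(int vq_idx)
{
	return vq_idx == 3 ? -EIO : 0;
}
```

For example, with `struct cmd_work work[32];`, `exec_vq_cmds_parallel(demo_cmd_ok, work, 32)` returns 0 once all 32 commands have run, and `exec_vq_cmds_parallel(demo_cmd_fails_on_vq3, work, 32)` returns `-EIO` only after the other 31 have also completed. Draining in-flight commands even on error keeps any completion from outliving the state it reports into. Build with `-pthread`.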
Then, an additional improvement is added: keep the notifiers enabled
during suspend but make them NOPs. Upon resume, make sure that the link
state is forwarded. This shaves off around 30 ms of constant time per device.
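A minimal single-threaded sketch of the notifier change follows, assuming that previously the notifier was unregistered on suspend and registered again on resume. The struct fields and functions are hypothetical, and a real driver also has to synchronize the notifier context with suspend/resume. The notifier stays registered but returns early while suspended. Because any port event arriving during suspend is dropped, resume re-reads the current link state and forwards it unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; real driver fields differ. */
struct vnet_dev {
	bool suspended;
	bool reported_link_up; /* link state last forwarded to the guest */
	int config_notifies;   /* number of config-change notifications sent */
};

/* Hypothetical stand-in for querying the physical port state. */
static bool hw_link_up;

static void forward_link_state(struct vnet_dev *d)
{
	d->reported_link_up = hw_link_up;
	d->config_notifies++;
}

/* Event notifier: stays registered across suspend, but is a NOP while suspended. */
static void on_port_change_event(struct vnet_dev *d)
{
	if (d->suspended)
		return;
	forward_link_state(d);
}

/* Suspend no longer unregisters the notifier; it only sets the flag. */
static void dev_suspend(struct vnet_dev *d)
{
	d->suspended = true;
}

/* Resume forwards the current link state, since events seen while
 * suspended were ignored. */
static void dev_resume(struct vnet_dev *d)
{
	assert(d->suspended);
	d->suspended = false;
	forward_link_state(d);
}
```

If the link goes down while the device is suspended, `on_port_change_event()` does nothing, and the guest learns the new state from `dev_resume()`.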
Finally, use parallel VQ suspend and resume during the CVQ MQ command.
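The CVQ MQ command changes the number of active queue pairs, so the set of data VQs to suspend or resume can be computed up front and processed as one parallel batch. The hypothetical helper below computes that set. It assumes the virtio-net layout where queue pair i (0-based) uses receiveq 2*i and transmitq 2*i + 1. It also assumes, as the patch title suggests, that shrinking suspends the VQs above the new count while growing resumes them. The real `change_num_qps()` may do more work than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open range [first, last) of data VQ indices affected by an MQ change. */
struct vq_range {
	int first;
	int last;
	bool resume; /* true: growing, resume these VQs; false: shrinking, suspend them */
};

/* Hypothetical helper: queue pair i owns VQs 2*i (rx) and 2*i + 1 (tx). */
static struct vq_range mq_change_range(int cur_qps, int new_qps)
{
	assert(cur_qps >= 1 && new_qps >= 1);
	if (new_qps < cur_qps)
		return (struct vq_range){ 2 * new_qps, 2 * cur_qps, false };
	return (struct vq_range){ 2 * cur_qps, 2 * new_qps, true };
}
```

For the 16 VQP device measured below, going from 16 to 4 queue pairs yields VQs 8 through 31, so 24 suspend commands can be issued together instead of one after another.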
For 1 vDPA device x 32 VQs (16 VQPs), on a large VM (256 GB RAM, 32 CPUs
x 2 threads per core), the improvements are:
+-------------------+--------+--------+-----------+
| operation         | Before | After  | Reduction |
|-------------------+--------+--------+-----------|
| mlx5_vdpa_suspend | 37 ms  | 2.5 ms | 14x       |
| mlx5_vdpa_resume  | 16 ms  | 5 ms   | 3x        |
+-------------------+--------+--------+-----------+
---
v2:
- Changed to parallel VQ suspend/resume during CVQ MQ command.
  Support added in the last 2 patches.
- Made the fw async command more generic and moved it to resources.c.
  This is because the following series (parallel mkey ops) needs this
  code as well.
  Dropped Eugenio's Acked-by from the modified patches.
- Fixed kfree -> kvfree.
- Removed extra newline caught during review.
- As discussed in v1, the series can be pulled in completely through
  the vhost tree [0]. The mlx5_core patch was reviewed by Tariq, who is
  also a maintainer for mlx5_core.
[0] - https://lore.kernel.org/virtualization/6582792d-8db2-4bc0-bf3a-248fe5c8fc56@nvidia.com/T/#maefabb2fde5adfb322d16ca16ae64d540f75b7d2
Dragos Tatulea (10):
net/mlx5: Support throttled commands from async API
vdpa/mlx5: Introduce error logging function
vdpa/mlx5: Introduce async fw command wrapper
vdpa/mlx5: Use async API for vq query command
vdpa/mlx5: Use async API for vq modify commands
vdpa/mlx5: Parallelize device suspend
vdpa/mlx5: Parallelize device resume
vdpa/mlx5: Keep notifiers during suspend but ignore
vdpa/mlx5: Small improvement for change_num_qps()
vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  21 +-
 drivers/vdpa/mlx5/core/mlx5_vdpa.h            |  22 +
 drivers/vdpa/mlx5/core/resources.c            |  73 ++++
 drivers/vdpa/mlx5/net/mlx5_vnet.c             | 396 +++++++++++-------
 4 files changed, 361 insertions(+), 151 deletions(-)
--
2.45.1