Message-ID: <2bdf263b-b0f8-41b6-8b40-6ace397e9bf8@oracle.com>
Date: Fri, 21 Feb 2025 10:03:35 -0800
From: dongli.zhang@...cle.com
To: virtualization@...ts.linux.dev, netdev@...r.kernel.org,
kvm@...r.kernel.org
Cc: mst@...hat.com, jasowang@...hat.com, eperezma@...hat.com,
michael.christie@...cle.com, pbonzini@...hat.com, stefanha@...hat.com,
joao.m.martins@...cle.com, joe.jin@...cle.com, si-wei.liu@...cle.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/9] vhost-scsi: log write descriptors for live migration
(and two bugfix)
Thanks to Mike's suggestion, I am going to re-send v2 with the following
changes:
1. Rebase on top of the patchset below.
[PATCH v2 0/8] vhost-scsi: Memory reduction patches
https://yhbt.net/lore/target-devel/20241203191705.19431-1-michael.christie@oracle.com/
The patchset applies and builds cleanly on top of commit 87a132e73910
("Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm").
2. Do not allocate the per-cmd log buffers until VHOST_F_LOG_ALL is set,
either by taking advantage of vhost_scsi_set_features() or by following the
idea of the patch below.
[PATCH v2 5/8] vhost-scsi: Dynamically allocate scatterlists
https://yhbt.net/lore/target-devel/20241203191705.19431-6-michael.christie@oracle.com/
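A minimal userspace sketch of the idea in item 2, not the actual v2 code: the
per-cmd log buffers are allocated only when a feature update turns on
VHOST_F_LOG_ALL, and released when it is turned off. Everything prefixed
with toy_ (the structs, toy_set_features(), the log_max field) is hypothetical
and only stands in for the vhost-scsi structures; the only value taken from
the kernel is the VHOST_F_LOG_ALL bit number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Feature bit number as defined in the kernel uapi headers. */
#define VHOST_F_LOG_ALL 26

/* Hypothetical stand-ins for the per-cmd log buffer and the device. */
struct toy_log_entry { uint64_t addr; uint64_t len; };
struct toy_cmd { struct toy_log_entry *log; };
struct toy_dev {
    uint64_t features;
    struct toy_cmd *cmds;
    unsigned int ncmds;
    unsigned int log_max;   /* log entries per command (assumed) */
};

/*
 * Allocate the per-cmd log buffers lazily: only when the new feature set
 * contains VHOST_F_LOG_ALL. Free them when logging is turned off again.
 * Returns 0 on success, -1 if an allocation fails. Buffers allocated before
 * a failure are kept and reused by the next successful call.
 */
static int toy_set_features(struct toy_dev *dev, uint64_t features)
{
    int want_log = (int)((features >> VHOST_F_LOG_ALL) & 1);
    unsigned int i;

    assert(dev != NULL);
    for (i = 0; i < dev->ncmds; i++) {
        struct toy_cmd *cmd = &dev->cmds[i];

        if (want_log && cmd->log == NULL) {
            cmd->log = calloc(dev->log_max, sizeof(*cmd->log));
            if (cmd->log == NULL)
                return -1;
        } else if (!want_log && cmd->log != NULL) {
            free(cmd->log);
            cmd->log = NULL;
        }
    }
    dev->features = features;
    return 0;
}
```

With this shape, a guest that is never migrated pays no memory cost for the
log buffers, which is the point of deferring the allocation.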
Thank you very much!
Dongli Zhang
On 2/7/25 10:41 AM, Dongli Zhang wrote:
> Live migration with vhost-scsi was enabled by QEMU commit
> b3e89c941a85 ("vhost-scsi: Allow user to enable migration"), which
> thoroughly explains how QEMU collaborates with vhost-scsi during live
> migration.
>
> Although vhost-scsi logs dirty data for the used ring, it doesn't log any
> write descriptor (VRING_DESC_F_WRITE).
>
> In comparison, vhost-net logs write descriptors via vhost_log_write(). The
> SPDK (vhost-user-scsi backend) also logs write descriptors via
> vhost_log_req_desc().
>
> As a result, there is likely to be a data mismatch between memory and the
> vhost-scsi disk during live migration.
>
> 1. Suppose there is a heavy workload and high memory usage. Suppose some
> systemd userspace pages are swapped out to the swap disk.
>
> 2. Upon request from systemd, the kernel reads some pages from the swap
> disk into memory via vhost-scsi.
>
> 3. Although those userspace pages' data are updated, they are not marked as
> dirty by vhost-scsi (this is the bug). They are not going to migrate to the
> target host during memory transfer iterations.
>
> 4. Suppose systemd doesn't write to those pages any longer. Those pages
> never get another chance to be marked dirty or migrated.
>
> 5. Once the guest VM is resumed on the target host, because those dirty
> pages' data is missing, systemd may run into an abnormal state, e.g.,
> it may segfault.
>
> Log all write descriptors to fix the issue.
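The fix quoted above can be pictured with a toy dirty bitmap: every page
covered by a device-writable descriptor is marked dirty so that the migration
pass re-sends it. This is only an illustrative userspace model under stated
assumptions (4 KiB pages, a fixed 64-page guest, one bit per page). The
toy_* names are hypothetical and do not mirror the signature of the kernel's
vhost_log_write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define TOY_LOG_PAGES  64   /* assumption: tiny 64-page guest */

/* One bit per guest page; a set bit means "must be migrated again". */
struct toy_dirty_log { uint8_t bitmap[TOY_LOG_PAGES / 8]; };

/* A device-writable buffer: guest physical address and length. */
struct toy_write_desc { uint64_t addr; uint32_t len; };

/* Mark every page overlapped by [addr, addr + len) as dirty. */
static void toy_log_write(struct toy_dirty_log *log, uint64_t addr,
                          uint64_t len)
{
    uint64_t first, last, page;

    if (len == 0)
        return;
    first = addr >> TOY_PAGE_SHIFT;
    last = (addr + len - 1) >> TOY_PAGE_SHIFT;
    assert(last < TOY_LOG_PAGES);
    for (page = first; page <= last; page++)
        log->bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}

/* Log all write descriptors of one completed request. */
static void toy_log_write_descs(struct toy_dirty_log *log,
                                const struct toy_write_desc *descs,
                                size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        toy_log_write(log, descs[i].addr, descs[i].len);
}

/* Return 1 if the page is marked dirty, 0 otherwise. */
static int toy_page_is_dirty(const struct toy_dirty_log *log, uint64_t page)
{
    return (log->bitmap[page / 8] >> (page % 8)) & 1;
}
```

In the swap-in scenario from steps 1 to 5, the pages filled by the device
would be marked by toy_log_write_descs() when the read completes, so they
are transferred even if the guest never writes to them again.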
>
> In addition, the patchset also fixes two bugs in vhost-scsi.
>
> Dongli Zhang (log descriptor, suggested by Joao Martins):
> vhost: modify vhost_log_write() for broader users
> vhost-scsi: adjust vhost_scsi_get_desc() to log vring descriptors
> vhost-scsi: cache log buffer in I/O queue vhost_scsi_cmd
> vhost-scsi: log I/O queue write descriptors
> vhost-scsi: log control queue write descriptors
> vhost-scsi: log event queue write descriptors
> vhost: add WARNING if log_num is more than limit
>
> Dongli Zhang (vhost-scsi bugfix):
> vhost-scsi: protect vq->log_used with vq->mutex
> vhost-scsi: Fix vhost_scsi_send_bad_target()
>
> drivers/vhost/net.c | 2 +-
> drivers/vhost/scsi.c | 191 +++++++++++++++++++++++++++++++++++++++------
> drivers/vhost/vhost.c | 46 ++++++++---
> drivers/vhost/vhost.h | 2 +-
> 4 files changed, 206 insertions(+), 35 deletions(-)
>
>
> base-commit: 5c8c229261f14159b54b9a32f12e5fa89d88b905
>
> Thank you very much!
>
> Dongli Zhang
>
>