[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <024fd8e2334141b688150650728699ba@huawei.com>
Date: Wed, 26 Feb 2025 08:11:17 +0000
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To: Alex Williamson <alex.williamson@...hat.com>, liulongfang
<liulongfang@...wei.com>
CC: "jgg@...dia.com" <jgg@...dia.com>, Jonathan Cameron
<jonathan.cameron@...wei.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxarm@...neuler.org" <linuxarm@...neuler.org>
Subject: RE: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error
> -----Original Message-----
> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Wednesday, February 26, 2025 12:10 AM
> To: liulongfang <liulongfang@...wei.com>
> Cc: jgg@...dia.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@...wei.com>; Jonathan Cameron
> <jonathan.cameron@...wei.com>; kvm@...r.kernel.org; linux-
> kernel@...r.kernel.org; linuxarm@...neuler.org
> Subject: Re: [PATCH v4 1/5] hisi_acc_vfio_pci: fix XQE dma address error
>
> On Tue, 25 Feb 2025 14:27:53 +0800
> Longfang Liu <liulongfang@...wei.com> wrote:
>
> > The dma addresses of EQE and AEQE are wrong after migration and
> > results in guest kernel-mode encryption services failure.
> > Comparing the definition of hardware registers, we found that
> > there was an error when the data read from the register was
> > combined into an address. Therefore, the address combination
> > sequence needs to be corrected.
> >
> > Even after fixing the above problem, we still have an issue
> > where the Guest from an old kernel can get migrated to
> > new kernel and may result in wrong data.
> >
> > In order to ensure that the address is correct after migration,
> > if an old magic number is detected, the dma address needs to be
> > updated.
> >
> > Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live
> migration")
> > Signed-off-by: Longfang Liu <liulongfang@...wei.com>
> > ---
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 40 ++++++++++++++++---
> > .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h | 14 ++++++-
> > 2 files changed, 46 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > index 451c639299eb..35316984089b 100644
> > --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
> > @@ -350,6 +350,31 @@ static int vf_qm_func_stop(struct hisi_qm *qm)
> > return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
> > }
> >
> > +static int vf_qm_version_check(struct acc_vf_data *vf_data, struct device
> *dev)
> > +{
> > + switch (vf_data->acc_magic) {
> > + case ACC_DEV_MAGIC_V2:
> > + if (vf_data->major_ver < ACC_DRV_MAJOR_VER ||
> > + vf_data->minor_ver < ACC_DRV_MINOR_VER)
> > + dev_info(dev, "migration driver version not
> match!\n");
> > + return -EINVAL;
> > + break;
>
> What's your major/minor update strategy?
>
> Note that minor_ver is a u16 and ACC_DRV_MINOR_VER is defined as 0, so
> testing less than 0 against an unsigned is useless.
>
> Arguably testing major and minor independently is pretty useless too.
>
> You're defining what a future "old" driver version will accept, which
> is very nearly anything, so any breaking change *again* requires a new
> magic, so we're accomplishing very little here.
>
> Maybe you want to consider a strategy where you'd increment the major
> number for a breaking change and minor for a compatible feature. In
> that case you'd want to verify the major_ver matches
> ACC_DRV_MAJOR_VER
> exactly and minor_ver would be more of a feature level.
Agree. I think the above check should be just major_ver != ACC_DRV_MAJOR_VER
and we can make use of minor version whenever we need a specific handling for
a feature support.
Also I think it would be good to print the version info above in case of mismatch.
>
> > + case ACC_DEV_MAGIC_V1:
> > + /* Correct dma address */
> > + vf_data->eqe_dma = vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_HIGH];
> > + vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
> > + vf_data->eqe_dma |= vf_data-
> >qm_eqc_dw[QM_XQC_ADDR_LOW];
> > + vf_data->aeqe_dma = vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_HIGH];
> > + vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
> > + vf_data->aeqe_dma |= vf_data-
> >qm_aeqc_dw[QM_XQC_ADDR_LOW];
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int vf_qm_check_match(struct hisi_acc_vf_core_device
> *hisi_acc_vdev,
> > struct hisi_acc_vf_migration_file *migf)
> > {
> > @@ -363,7 +388,8 @@ static int vf_qm_check_match(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> > if (migf->total_length < QM_MATCH_SIZE || hisi_acc_vdev-
> >match_done)
> > return 0;
> >
> > - if (vf_data->acc_magic != ACC_DEV_MAGIC) {
> > + ret = vf_qm_version_check(vf_data, dev);
> > + if (ret) {
> > dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
> > return -EINVAL;
> > }
> > @@ -418,7 +444,9 @@ static int vf_qm_get_match_data(struct
> hisi_acc_vf_core_device *hisi_acc_vdev,
> > int vf_id = hisi_acc_vdev->vf_id;
> > int ret;
> >
> > - vf_data->acc_magic = ACC_DEV_MAGIC;
> > + vf_data->acc_magic = ACC_DEV_MAGIC_V2;
> > + vf_data->major_ver = ACC_DRV_MAR;
> > + vf_data->minor_ver = ACC_DRV_MIN;
>
> Where are "MAR" and "MIN" defined? I can't see how this would even
> compile. Thanks,
Yes. Please make sure to do a compile test and run basic sanity tested even if you
think the changes are minor. Chances are there that you will miss out things like
this.
Thanks,
Shameer
Powered by blists - more mailing lists