[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <460f52a8-7a4a-96f7-b0db-299d3cfc244c@huawei.com>
Date: Sat, 10 May 2025 15:41:12 +0800
From: liulongfang <liulongfang@...wei.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>,
"alex.williamson@...hat.com" <alex.williamson@...hat.com>, "jgg@...dia.com"
<jgg@...dia.com>, Jonathan Cameron <jonathan.cameron@...wei.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linuxarm@...neuler.org" <linuxarm@...neuler.org>
Subject: Re: [PATCH v7 5/6] hisi_acc_vfio_pci: bugfix live migration function
without VF device driver
On 2025/4/15 16:45, Shameerali Kolothum Thodi wrote:
>
>
>> -----Original Message-----
>> From: liulongfang <liulongfang@...wei.com>
>> Sent: Friday, April 11, 2025 4:59 AM
>> To: alex.williamson@...hat.com; jgg@...dia.com; Shameerali Kolothum
>> Thodi <shameerali.kolothum.thodi@...wei.com>; Jonathan Cameron
>> <jonathan.cameron@...wei.com>
>> Cc: kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
>> linuxarm@...neuler.org; liulongfang <liulongfang@...wei.com>
>> Subject: [PATCH v7 5/6] hisi_acc_vfio_pci: bugfix live migration function
>> without VF device driver
>>
>> If the VF device driver is not loaded in the Guest OS and we attempt to
>> perform device data migration, the address of the migrated data will
>> be NULL.
>> The live migration recovery operation on the destination side will
>> access a null address value, which will cause access errors.
>>
>> Therefore, live migration of VMs without added VF device drivers
>> does not require device data migration.
>> In addition, when the queue address data obtained by the destination
>> is empty, device queue recovery processing will not be performed.
>>
>> Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live
>> migration")
>> Signed-off-by: Longfang Liu <liulongfang@...wei.com>
>> Reviewed-by: Shameer Kolothum
>> <shameerali.kolothum.thodi@...wei.com>
>> ---
>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 22 +++++++++++++------
>> 1 file changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> index cadc82419dca..d12a350440d3 100644
>> --- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> +++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
>> @@ -426,13 +426,6 @@ static int vf_qm_check_match(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>> return -EINVAL;
>> }
>>
>> - ret = qm_write_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state,
>> 1);
>> - if (ret) {
>> - dev_err(dev, "failed to write QM_VF_STATE\n");
>> - return ret;
>> - }
>> -
>> - hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> hisi_acc_vdev->match_done = true;
>> return 0;
>> }
>> @@ -498,6 +491,20 @@ static int vf_qm_load_data(struct
>> hisi_acc_vf_core_device *hisi_acc_vdev,
>> if (migf->total_length < sizeof(struct acc_vf_data))
>> return -EINVAL;
>>
>> + if (!vf_data->eqe_dma || !vf_data->aeqe_dma ||
>> + !vf_data->sqc_dma || !vf_data->cqc_dma) {
>> + dev_info(dev, "resume dma addr is NULL!\n");
>> + hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
>> + return 0;
>> + }
>
>
> I am still not fully understood why we need the above check. The only case as
> far as I can think of where this will happen is when the source VM Guest has
> not loaded the ACC driver. And we do take care of that already using vf_qm_state and
> checking total_length == QM_MATCH_SIZE.
>
> Have you seen this happening in any other scenario during your tests?
>
This is a problem that was discovered and fixed in previous abnormal scenario tests.
In the abnormal tests, abnormal error handling was performed and real-time migration
was executed. However, due to the lack of this check, the device became abnormal
after migration.
Thanks,
Longfang.
> Thanks,
> Shameer
>
>> +
>> + ret = qm_write_regs(qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
>> + if (ret) {
>> + dev_err(dev, "failed to write QM_VF_STATE\n");
>> + return -EINVAL;
>> + }
>> + hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
>> +
>> qm->eqe_dma = vf_data->eqe_dma;
>> qm->aeqe_dma = vf_data->aeqe_dma;
>> qm->sqc_dma = vf_data->sqc_dma;
>> @@ -1531,6 +1538,7 @@ static int hisi_acc_vfio_pci_migrn_init_dev(struct
>> vfio_device *core_vdev)
>> hisi_acc_vdev->vf_id = pci_iov_vf_id(pdev) + 1;
>> hisi_acc_vdev->pf_qm = pf_qm;
>> hisi_acc_vdev->vf_dev = pdev;
>> + hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
>> mutex_init(&hisi_acc_vdev->state_mutex);
>> mutex_init(&hisi_acc_vdev->open_mutex);
>>
>> --
>> 2.24.0
>
> .
>
Powered by blists - more mailing lists