lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ff61f675-a5a8-d1d6-a1bf-a5879914b4a9@amd.com>
Date: Wed, 9 Aug 2023 08:44:13 -0700
From: Brett Creeley <bcreeley@....com>
To: Alex Williamson <alex.williamson@...hat.com>,
 Brett Creeley <brett.creeley@....com>
Cc: kvm@...r.kernel.org, netdev@...r.kernel.org, jgg@...dia.com,
 yishaih@...dia.com, shameerali.kolothum.thodi@...wei.com,
 kevin.tian@...el.com, horms@...nel.org, shannon.nelson@....com
Subject: Re: [PATCH v14 vfio 5/8] vfio/pds: Add VFIO live migration support

On 8/8/2023 3:27 PM, Alex Williamson wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> On Mon, 7 Aug 2023 13:57:52 -0700
> Brett Creeley <brett.creeley@....com> wrote:
> ...
>> +static int
>> +pds_vfio_suspend_wait_device_cmd(struct pds_vfio_pci_device *pds_vfio)
>> +{
>> +     union pds_core_adminq_cmd cmd = {
>> +             .lm_suspend_status = {
>> +                     .opcode = PDS_LM_CMD_SUSPEND_STATUS,
>> +                     .vf_id = cpu_to_le16(pds_vfio->vf_id),
>> +             },
>> +     };
>> +     struct device *dev = pds_vfio_to_dev(pds_vfio);
>> +     union pds_core_adminq_comp comp = {};
>> +     unsigned long time_limit;
>> +     unsigned long time_start;
>> +     unsigned long time_done;
>> +     int err;
>> +
>> +     time_start = jiffies;
>> +     time_limit = time_start + HZ * SUSPEND_TIMEOUT_S;
>> +     do {
>> +             err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true);
>> +             if (err != -EAGAIN)
>> +                     break;
>> +
>> +             msleep(SUSPEND_CHECK_INTERVAL_MS);
>> +     } while (time_before(jiffies, time_limit));
>> +
>> +     time_done = jiffies;
>> +     dev_dbg(dev, "%s: vf%u: Suspend comp received in %d msecs\n", __func__,
>> +             pds_vfio->vf_id, jiffies_to_msecs(time_done - time_start));
>> +
>> +     /* Check the results */
>> +     if (time_after_eq(time_done, time_limit)) {
>> +             dev_err(dev, "%s: vf%u: Suspend comp timeout\n", __func__,
>> +                     pds_vfio->vf_id);
>> +             err = -ETIMEDOUT;
> 
> If the command completes successfully but exceeds the time limit
> this turns a success into a failure.  Is that desired?  Thanks,
> 
> Alex
> 

Yes, this is the desired behavior. Based on the testing we have done and 
downtime expectations for live migration it seems like the suspend 
operation timing out after 5 seconds is reasonable behavior. It could 
succeed, but that could also be the case if the timeout value was 10, 
20, 30, or more seconds. Even so, it seems like we don't want to keep 
spinning in the driver forever and we also don't want to further 
increase the VM downtime.

Thanks,

Brett

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ