Message-ID: <PH0PR12MB5481891053E37A79920991F6DCF9A@PH0PR12MB5481.namprd12.prod.outlook.com>
Date: Wed, 20 Sep 2023 07:10:46 +0000
From: Parav Pandit <parav@...dia.com>
To: "Zhu, Lingshan" <lingshan.zhu@...el.com>,
"Chen, Jiqian" <Jiqian.Chen@....com>,
"Michael S. Tsirkin" <mst@...hat.com>
CC: Gerd Hoffmann <kraxel@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
David Airlie <airlied@...hat.com>,
Gurchetan Singh <gurchetansingh@...omium.org>,
Chia-I Wu <olvaffe@...il.com>,
Marc-André Lureau <marcandre.lureau@...il.com>,
Robert Beckett <bob.beckett@...labora.com>,
Mikhail Golubev-Ciuchea <Mikhail.Golubev-Ciuchea@...nsynergy.com>,
"virtio-comment@...ts.oasis-open.org"
<virtio-comment@...ts.oasis-open.org>,
"virtio-dev@...ts.oasis-open.org" <virtio-dev@...ts.oasis-open.org>,
"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Stefano Stabellini <sstabellini@...nel.org>,
Roger Pau Monné <roger.pau@...rix.com>,
"Deucher, Alexander" <Alexander.Deucher@....com>,
"Koenig, Christian" <Christian.Koenig@....com>,
"Hildebrand, Stewart" <Stewart.Hildebrand@....com>,
Xenia Ragiadakou <burzalodowa@...il.com>,
"Huang, Honglei1" <Honglei1.Huang@....com>,
"Zhang, Julia" <Julia.Zhang@....com>,
"Huang, Ray" <Ray.Huang@....com>
Subject: RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1]
transport-pci: Add freeze_mode to virtio_pci_common_cfg
> From: Zhu, Lingshan <lingshan.zhu@...el.com>
> Sent: Wednesday, September 20, 2023 12:37 PM
> > The problem to overcome in [1] is, resume operation needs to be synchronous
> as it involves large part of context to resume back, and hence just
> asynchronously setting DRIVER_OK is not enough.
> > The sw must verify back that device has resumed the operation and ready to
> answer requests.
> this is not live migration, all device status and other information still stay in the
> device, no need to "resume" context, just resume running.
>
I am aware that it is not live migration. :)
"Just resuming" involves a lot of device setup work. The device implementation does not know how long a device will stay suspended.
For example, if a VM is suspended for 6 hours, the device context could have been saved to a slow disk.
Hence, on resume, the device needs to set things up again, and the driver has to verify that this is complete before accessing anything more from the device.
> Like resume from a failed LM.
> >
> > This is slightly different flow than setting the DRIVER_OK for the first time
> device initialization sequence as it does not involve large restoration.
> >
> > So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >
> > Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
>
Yes, this is why either DRIVER_OK validation by the driver or Jiqian's new synchronous register is needed.
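
A minimal sketch of the "clear SUSPEND, then verify" idea from the thread, only to illustrate why the resume has to be checked synchronously. Everything except the DRIVER_OK value is an assumption: the SUSPEND bit value, the mock device, and the helper names are hypothetical, and the current spec text quoted above does not permit a driver to clear a status bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* DRIVER_OK value as defined by the virtio spec. */
#define VIRTIO_STATUS_DRIVER_OK 0x04u
/* Hypothetical: the bit value of the proposed SUSPEND flag is an assumption. */
#define VIRTIO_STATUS_SUSPEND_HYPOTHETICAL 0x10u

/* Mock device: a status write only takes effect after a number of reads,
 * modelling a device that must restore saved context before resuming. */
struct mock_dev {
    uint8_t status;         /* status the device currently reports */
    uint8_t pending;        /* status most recently written by the driver */
    unsigned restore_polls; /* reads left before the restore completes */
};

static void dev_write_status(struct mock_dev *d, uint8_t v)
{
    d->pending = v;
}

static uint8_t dev_read_status(struct mock_dev *d)
{
    if (d->status != d->pending) {
        if (d->restore_polls > 0)
            d->restore_polls--;     /* still restoring context */
        else
            d->status = d->pending; /* restore done, new status visible */
    }
    return d->status;
}

/* Clear SUSPEND, then poll until the device itself reports that it has
 * left SUSPEND with DRIVER_OK still set. Returns false on timeout. */
static bool driver_resume(struct mock_dev *d, unsigned max_polls)
{
    assert(d != NULL);

    uint8_t want = (uint8_t)(dev_read_status(d) &
                             ~VIRTIO_STATUS_SUSPEND_HYPOTHETICAL);
    dev_write_status(d, want);

    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t s = dev_read_status(d);
        if (!(s & VIRTIO_STATUS_SUSPEND_HYPOTHETICAL) &&
            (s & VIRTIO_STATUS_DRIVER_OK))
            return true;  /* device confirmed it is ready for requests */
    }
    return false;         /* device has not resumed yet */
}
```

The point of the sketch is that the write alone proves nothing: the driver only proceeds once a read-back shows the device has finished its own restoration, however long that takes.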