[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <635f11c40e814d749ccf533f1414ba4e@huawei.com>
Date: Wed, 2 Mar 2022 09:07:38 +0000
From: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To: Jason Gunthorpe <jgg@...dia.com>,
Alex Williamson <alex.williamson@...hat.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"cohuck@...hat.com" <cohuck@...hat.com>,
"mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
"yishaih@...dia.com" <yishaih@...dia.com>,
Linuxarm <linuxarm@...wei.com>,
liulongfang <liulongfang@...wei.com>,
"Zengtao (B)" <prime.zeng@...ilicon.com>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"Wangzhou (B)" <wangzhou1@...ilicon.com>
Subject: RE: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
migration
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@...dia.com]
> Sent: 02 March 2022 00:03
> To: Alex Williamson <alex.williamson@...hat.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>;
> kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
> linux-crypto@...r.kernel.org; cohuck@...hat.com; mgurtovoy@...dia.com;
> yishaih@...dia.com; Linuxarm <linuxarm@...wei.com>; liulongfang
> <liulongfang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>;
> Jonathan Cameron <jonathan.cameron@...wei.com>; Wangzhou (B)
> <wangzhou1@...ilicon.com>
> Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
>
> On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote:
> > On Tue, 1 Mar 2022 16:39:38 -0400
> > Jason Gunthorpe <jgg@...dia.com> wrote:
> >
> > > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote:
> > > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE
> > > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE? ie.
> QEMU
> > > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even
> > > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data.
> > >
> > > The purpose of this ioctl is to help userspace guess when moving on to
> > > STOP_COPY is a good idea ie when the device has done almost all the
> > > work it is going to be able to do in PRE_COPY. ENOMSG is a similar
> > > indicator.
> > >
> > > I expect all devices to have some additional STOP_COPY trailer_data in
> > > addition to their PRE_COPY initial_data and dirty_data
> > >
> > > There is a choice to make if we report the trailer_data during
> > > PRE_COPY or not. As this is all estimates, it doesn't matter unless
> > > the trailer_data is very big.
> > >
> > > Having all devices trend toward a 0 dirty_bytes to say they are are
> > > done all the pre-copy they can do makes sense from an API
> > > perspective. If one device trends toward 10MB due to a big
> > > trailer_data and one trends toward 0 bytes, how will qemu consistently
> > > decide when best to trigger STOP_COPY? It makes the API less useful.
> > >
> > > So, I would not include trailer_data in the dirty_bytes.
> >
> > That assumes that it's possible to keep up with the device dirty
> > rate.
>
> It keeps options open so we have this choice someday.
>
> We already see that implementations are using vCPU throttling as part
> of their migration strategy, and we are seriously looking at DMA
> throttling. It is not a big leap to imagine that
> internal-state-dirtying throttling will happne someday.
>
> With throttling iterations would ratchet up the throttle until they
> reach an absolute small amount of dirty then cut over to STOP_COPY
>
> > It seems like a better approach for userspace would be to look at how
> > dirty_bytes is trending.
>
> It may be biw, but this approach doesn't care if the trailing_bytes
> are included or not, so lets leave them out and preserve the other
> operating model.
>
> > If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY
> > ioctl, it seems even more of a disconnect that when we enter the
> > STOP_COPY state, suddenly we start getting new data out of a PRECOPY
> > ioctl.
>
> Why? That amounts can go up at any time, how does it matter if it goes
> up after STOP_COPY or instantly before?
>
> > BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures
> > relative to the device FD, appending it with _MIG is too subtle for me.
> > This is also a GET operation for INFO, so I'd think for consistency
> > with the existing vfio uAPI we'd name this something like
> > VFIO_MIG_GET_PRECOPY_INFO where the structure might be named
> > vfio_precopy_info.
>
> Sure
>
> > So if we don't think this is the right approach for STOP_COPY, then why
> > are we pushing that it has any purpose outside of PRECOPY or might be
> > implemented by a non-PRECOPY driver for use in STOP_COPY?
>
> It is just simpler and more consistent to implement the math under
> this ioctl in all cases then to try and artificially restrict it.
>
> But I don't have a use case for it, so lets block it if you prefer.
>
> Shameerali will you make these adjustments to the PRE_COPY patch?
Sure. I think we can summarize the discussion as below,
- Rename the MIG_PRECOPY ioctl to VFIO_MIG_GET_PRECOPY_INFO and
structure to vfio_precopy_info.
- This ioctl is only valid in PRE_COPY state and should return -EINVAL in
other states(Update the documentation).
- No changes to the initial_bytes & dirty_bytes descriptions.
Please let me know if I missed anything.
I will address other comments on this series as well and sent out a
revised one soon.
Thanks,
Shameer
Powered by blists - more mailing lists