lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 2 Mar 2022 09:07:38 +0000
From:   Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>
To:     Jason Gunthorpe <jgg@...dia.com>,
        Alex Williamson <alex.williamson@...hat.com>
CC:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "cohuck@...hat.com" <cohuck@...hat.com>,
        "mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
        "yishaih@...dia.com" <yishaih@...dia.com>,
        Linuxarm <linuxarm@...wei.com>,
        liulongfang <liulongfang@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        "Wangzhou (B)" <wangzhou1@...ilicon.com>
Subject: RE: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
 migration



> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@...dia.com]
> Sent: 02 March 2022 00:03
> To: Alex Williamson <alex.williamson@...hat.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>;
> kvm@...r.kernel.org; linux-kernel@...r.kernel.org;
> linux-crypto@...r.kernel.org; cohuck@...hat.com; mgurtovoy@...dia.com;
> yishaih@...dia.com; Linuxarm <linuxarm@...wei.com>; liulongfang
> <liulongfang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>;
> Jonathan Cameron <jonathan.cameron@...wei.com>; Wangzhou (B)
> <wangzhou1@...ilicon.com>
> Subject: Re: [PATCH v6 09/10] hisi_acc_vfio_pci: Add support for VFIO live
> migration
> 
> On Tue, Mar 01, 2022 at 03:44:31PM -0700, Alex Williamson wrote:
> > On Tue, 1 Mar 2022 16:39:38 -0400
> > Jason Gunthorpe <jgg@...dia.com> wrote:
> >
> > > On Tue, Mar 01, 2022 at 12:30:47PM -0700, Alex Williamson wrote:
> > > > Wouldn't it make more sense if initial-bytes started at QM_MATCH_SIZE
> > > > and dirty-bytes was always sizeof(vf_data) - QM_MATCH_SIZE?  ie.
> QEMU
> > > > would know that it has sizeof(vf_data) - QM_MATCH_SIZE remaining even
> > > > while it's getting ENOMSG after reading QM_MATCH_SIZE bytes of data.
> > >
> > > The purpose of this ioctl is to help userspace guess when moving on to
> > > STOP_COPY is a good idea ie when the device has done almost all the
> > > work it is going to be able to do in PRE_COPY. ENOMSG is a similar
> > > indicator.
> > >
> > > I expect all devices to have some additional STOP_COPY trailer_data in
> > > addition to their PRE_COPY initial_data and dirty_data
> > >
> > > There is a choice to make if we report the trailer_data during
> > > PRE_COPY or not. As this is all estimates, it doesn't matter unless
> > > the trailer_data is very big.
> > >
> > > Having all devices trend toward a 0 dirty_bytes to say they are are
> > > done all the pre-copy they can do makes sense from an API
> > > perspective. If one device trends toward 10MB due to a big
> > > trailer_data and one trends toward 0 bytes, how will qemu consistently
> > > decide when best to trigger STOP_COPY? It makes the API less useful.
> > >
> > > So, I would not include trailer_data in the dirty_bytes.
> >
> > That assumes that it's possible to keep up with the device dirty
> > rate.
> 
> It keeps options open so we have this choice someday.
> 
> We already see that implementations are using vCPU throttling as part
> of their migration strategy, and we are seriously looking at DMA
> throttling. It is not a big leap to imagine that
> internal-state-dirtying throttling will happne someday.
> 
> With throttling iterations would ratchet up the throttle until they
> reach an absolute small amount of dirty then cut over to STOP_COPY
> 
> > It seems like a better approach for userspace would be to look at how
> > dirty_bytes is trending.
> 
> It may be biw, but this approach doesn't care if the trailing_bytes
> are included or not, so lets leave them out and preserve the other
> operating model.
> 
> > If we exclude STOP_COPY trailing data from the VFIO_DEVICE_MIG_PRECOPY
> > ioctl, it seems even more of a disconnect that when we enter the
> > STOP_COPY state, suddenly we start getting new data out of a PRECOPY
> > ioctl.
> 
> Why? That amounts can go up at any time, how does it matter if it goes
> up after STOP_COPY or instantly before?
> 
> > BTW, "VFIO_DEVICE" should be reserved for ioctls and data structures
> > relative to the device FD, appending it with _MIG is too subtle for me.
> > This is also a GET operation for INFO, so I'd think for consistency
> > with the existing vfio uAPI we'd name this something like
> > VFIO_MIG_GET_PRECOPY_INFO where the structure might be named
> > vfio_precopy_info.
> 
> Sure
> 
> > So if we don't think this is the right approach for STOP_COPY, then why
> > are we pushing that it has any purpose outside of PRECOPY or might be
> > implemented by a non-PRECOPY driver for use in STOP_COPY?
> 
> It is just simpler and more consistent to implement the math under
> this ioctl in all cases then to try and artificially restrict it.
> 
> But I don't have a use case for it, so lets block it if you prefer.
> 
> Shameerali will you make these adjustments to the PRE_COPY patch?

Sure. I think we can summarize the discussion as below,

 - Rename the MIG_PRECOPY ioctl to VFIO_MIG_GET_PRECOPY_INFO and
  structure to vfio_precopy_info.
 - This ioctl is only valid in PRE_COPY state and should return -EINVAL in
  other states(Update the documentation).
 - No changes to the initial_bytes & dirty_bytes descriptions.

Please let me know if I missed anything.

I will address other comments on this series as well and sent out a
revised one soon.

Thanks,
Shameer    

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ