lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 2 Feb 2022 14:04:15 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
        "mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
        Linuxarm <linuxarm@...wei.com>,
        liulongfang <liulongfang@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        yuzenghui <yuzenghui@...wei.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        "Wangzhou (B)" <wangzhou1@...ilicon.com>
Subject: Re: [RFC v2 0/4] vfio/hisilicon: add acc live migration driver

On Wed, Feb 02, 2022 at 10:30:41AM -0700, Alex Williamson wrote:
> On Wed, 2 Feb 2022 14:34:52 +0000
> Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com> wrote:
> 
> > > From: Jason Gunthorpe [mailto:jgg@...dia.com]
> > 
> > > 
> > >    I see pf_qm_state_pre_save() but didn't understand why it wanted to
> > >    send the first 32 bytes in the PRECOPY mode? It is fine, but it
> > >    will add some complexity to continue to do this.  
> > 
> > That was mainly to do a quick verification between src and dst compatibility
> > before we start saving the state. I think probably we can delay that check
> > for later.
> 
> In the v1 migration scheme, this was considered good practice.  It
> shouldn't be limited to PRECOPY, as there's no requirement to use
> PRECOPY, but the earlier in the migration process that we can trigger a
> device or data stream compatibility fault, the better.  TBH, even in
> the case where a device doesn't support live dirty tracking for a
> PRECOPY phase, using it for compatibility testing continues to seem
> like good practice.

At least with our thinking here, we'd rather the device expose an
explicit compatibility data via get/test system calls so we can build
proper infrastructure around this.

Every device will have compatibility requirements and we can build
more shared common code this way. ie qemu can ideally fetch the data
before migration starts and do an exchange with the live migration
target to see if it is OK. Orchestration can inventory the systems,
and automation can select live migration targets that can actually
work.

If it is hidden inside the migration stream it is too invisible to be
fully useful.

This is something we've been talking about here but don't have much
concrete to say for mlx5 yet.

The device still has to self-protect itself against a corrupted
migration stream impacting integrity, of course.

IIRC qemu has a nice spot to put this in the existing protocol.

Just overall, now that PRECOPY is optional, we should avoid using it
without a good reason. The driver implementation does have a cost.

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ