Message-ID: <20211117174738.GL2105516@nvidia.com>
Date:   Wed, 17 Nov 2021 13:47:38 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Cornelia Huck <cohuck@...hat.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, alex.williamson@...hat.com,
        bhelgaas@...gle.com, saeedm@...dia.com, linux-pci@...r.kernel.org,
        kvm@...r.kernel.org, netdev@...r.kernel.org, kuba@...nel.org,
        leonro@...dia.com, kwankhede@...dia.com, mgurtovoy@...dia.com,
        maorg@...dia.com
Subject: Re: vfio migration discussions (was: [PATCH V2 mlx5-next 00/14] Add
 mlx5 live migration driver)

On Wed, Nov 17, 2021 at 05:42:58PM +0100, Cornelia Huck wrote:
> Ok, here's the contents (as of 2021-11-17 16:30 UTC) of the etherpad at
> https://etherpad.opendev.org/p/VFIOMigrationDiscussions -- in the hope
> of providing a better starting point for further discussion (I know that
> discussions are still ongoing in other parts of this thread; but
> frankly, I'm getting a headache trying to follow them, and I think it
> would be beneficial to concentrate on the fundamental questions
> first...)

In my mind several of these topics now have answers:

>       * Jason proposed a new NDMA (no-dma) state that seems to match the

NDMA solves the PRI problem too, and allows dirty tracking to be
iterative. So yes to adding it to device_state rather than making it
implicit via !RUNNING.
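
A minimal sketch of what an explicit NDMA bit in device_state could
look like. All SKETCH_* names are hypothetical, and the bit position
chosen for NDMA is an assumption made for illustration; the other
values are only meant to resemble the existing RUNNING/SAVING/RESUMING
bits, not to quote the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative device_state bits; names and values belong to this sketch. */
#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)
/* Hypothetical: NDMA as its own bit. The position is an assumption. */
#define SKETCH_STATE_NDMA     (1u << 3)

#define SKETCH_STATE_MASK (SKETCH_STATE_RUNNING | SKETCH_STATE_SAVING | \
                           SKETCH_STATE_RESUMING | SKETCH_STATE_NDMA)

/*
 * With an explicit bit, the device may initiate DMA only while it is
 * RUNNING and NDMA is clear. NDMA|RUNNING is a distinct state in which
 * the device is still running but no longer issues DMA.
 */
static bool sketch_dma_allowed(uint32_t state)
{
    assert((state & ~SKETCH_STATE_MASK) == 0); /* reject unknown bits */
    return (state & SKETCH_STATE_RUNNING) && !(state & SKETCH_STATE_NDMA);
}
```

Under these assumptions, sketch_dma_allowed(SKETCH_STATE_RUNNING) is
true, while SKETCH_STATE_RUNNING | SKETCH_STATE_NDMA and plain
!RUNNING both yield false.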

>     * No definition of what HW needs to preserve when RESUMING toggles
>     off - (eg today SET_IRQS must work, what else?).

Everything in the device controlled by other kernel subsystems (IRQs,
MSI, PCI config space) must continue to work across !RUNNING and must
not be disturbed by the migration driver during RESUME.

So, a clear yes: SET_IRQS during !RUNNING must be supported.

>     * In general, what operations or accesses is the user restricted
>     from performing on the device while !RUNNING

This still needs an answer, other than the carve-out above. HNS won't
work without restrictions, for instance.

>     * PRI into the guest (guest user process SVA) has a sequencing
>     problem with RUNNING - can not migrate a vIOMMU in the middle of a
>     page fault, must stop and flush faults before stopping vCPUs

NDMA|RUNNING allows the vIOMMU to be suspended.
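
The ordering constraint from the quoted item (stop and flush faults
before stopping vCPUs) can be sketched as a simple precondition check.
The struct, the function, and the NDMA bit position are all
hypothetical and exist only for this illustration; this is not how
QEMU or any driver implements it.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_STATE_RUNNING (1u << 0)
#define SKETCH_STATE_NDMA    (1u << 3) /* hypothetical bit position */

/* Hypothetical view of the state a VMM would track for one device. */
struct sketch_vm {
    uint32_t device_state;        /* current device_state bits */
    unsigned pending_page_faults; /* PRI faults not yet flushed */
};

/*
 * vCPUs may be stopped only after the device has stopped issuing DMA
 * (NDMA set while still RUNNING) and all outstanding page faults have
 * been flushed, so the vIOMMU is never captured mid-fault.
 */
static bool sketch_may_stop_vcpus(const struct sketch_vm *vm)
{
    bool dma_quiesced = (vm->device_state & SKETCH_STATE_NDMA) != 0;

    return dma_quiesced && vm->pending_page_faults == 0;
}
```

In this model a VMM would first set NDMA|RUNNING, wait until
pending_page_faults reaches zero, and only then stop vCPUs.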

> The uAPI could benefit from some more detailed documentation
> (e.g. how to use it, what to do in edge cases, ...) outside of the
> header file.

We have an internal draft of this now.

> Trying to use the mlx5 support currently on the list has unearthed
> some problems in QEMU <please summarize :)>

If the kernel does anything odd, QEMU calls abort().

Performance is bad; Yishai sent a patch.

Jason
