netdev - RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio

Message-ID: <BN9PR11MB5433435CAAAB23EAE085C3128C929@BN9PR11MB5433.namprd11.prod.outlook.com>
Date:   Tue, 9 Nov 2021 00:58:26 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Jason Gunthorpe <jgg@...dia.com>
CC:     Alex Williamson <alex.williamson@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>,
        Yishai Hadas <yishaih@...dia.com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "saeedm@...dia.com" <saeedm@...dia.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "leonro@...dia.com" <leonro@...dia.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
        "maorg@...dia.com" <maorg@...dia.com>,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Monday, November 8, 2021 8:36 PM
> 
> On Mon, Nov 08, 2021 at 08:53:20AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Tuesday, October 26, 2021 11:19 PM
> > >
> > > On Tue, Oct 26, 2021 at 08:42:12AM -0600, Alex Williamson wrote:
> > >
> > > > > This is also why I don't like it being so transparent as it is
> > > > > > something userspace needs to care about - especially if the HW
> > > > > > cannot support such a thing, if we intend to allow that.
> > > >
> > > > Userspace does need to care, but userspace's concern over this should
> > > > not be able to compromise the platform and therefore making VF
> > > > assignment more susceptible to fatal error conditions to comply with a
> > > > migration uAPI is troublesome for me.
> > >
> > > It is an interesting scenario.
> > >
> > > I think it points that we are not implementing this fully properly.
> > >
> > > The !RUNNING state should be like your reset efforts.
> > >
> > > All access to the MMIO memories from userspace should be revoked
> > > during !RUNNING
> >
> > This assumes that vCPUs must be stopped before !RUNNING is entered
> > in the virtualization case, which is true today.
> >
> > But it may not hold when talking about guest SVA and I/O page fault [1].
> > The problem is that the pending requests may trigger I/O page faults
> > on guest page tables. W/o running vCPUs to handle those faults, the
> > quiesce command cannot complete draining the pending requests
> > if the device doesn't support preempt-on-fault (at least it's the case for
> > some Intel and Huawei devices, possibly true for most initial SVA
> > implementations).
> 
> It cannot be ordered any other way.
> 
> vCPUs must be stopped first, then the PCI devices must be stopped
> after, otherwise the vCPU can touch a stopped device while handling
> a fault which is unreasonable.
> 
> However, migrating a pending IOMMU fault does seem unreasonable as well.
> 
> The NDMA state can potentially solve this:
> 
>   RUNNING | VCPU RUNNING - Normal
>   NDMA | RUNNING | VCPU RUNNING - Halt and flush DMA, and thus all faults
>   NDMA | RUNNING - Halt all MMIO access

Should this be two steps?

NDMA | RUNNING - vCPUs stop accessing the device
NDMA - halt all MMIO access by revoking the mappings

>   0 - Halted everything

Yes, adding a new state sounds better than reordering the vCPU/device
stop sequence.
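
To make the ordering concrete, here is a minimal sketch in C of the stop
sequence as discussed in this thread, with the NDMA | RUNNING step split in
two. The flag names and helper functions are hypothetical and chosen only for
illustration; they are not the VFIO uAPI, and "VCPU RUNNING" is treated here
as something the VMM tracks rather than a device state bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags for illustration only; not the VFIO uAPI. */
#define ST_RUNNING      (1u << 0) /* device operates and accepts MMIO */
#define ST_NDMA         (1u << 1) /* device must not initiate DMA */
#define ST_VCPU_RUNNING (1u << 2) /* tracked by the VMM, not the device */

/* Stop sequence from the discussion, with NDMA | RUNNING split in two. */
static const unsigned int stop_seq[] = {
	ST_RUNNING | ST_VCPU_RUNNING,           /* normal operation */
	ST_NDMA | ST_RUNNING | ST_VCPU_RUNNING, /* halt and flush DMA and faults */
	ST_NDMA | ST_RUNNING,                   /* vCPUs stopped */
	ST_NDMA,                                /* MMIO mappings revoked */
	0,                                      /* everything halted */
};
#define STOP_SEQ_LEN (sizeof(stop_seq) / sizeof(stop_seq[0]))

/* True if moving from 'from' to 'to' respects the ordering argued above. */
static bool stop_step_ok(unsigned int from, unsigned int to)
{
	unsigned int cleared = from & ~to;

	/* vCPUs may stop only once DMA is halted, so no fault is left pending. */
	if ((cleared & ST_VCPU_RUNNING) && !(from & ST_NDMA))
		return false;
	/* MMIO may be revoked only after the vCPUs have stopped. */
	if ((cleared & ST_RUNNING) && (from & ST_VCPU_RUNNING))
		return false;
	return true;
}

/* Check every consecutive pair of states in a sequence of length n. */
static bool stop_seq_ok(const unsigned int *seq, size_t n)
{
	for (size_t i = 0; i + 1 < n; i++)
		if (!stop_step_ok(seq[i], seq[i + 1]))
			return false;
	return true;
}
```

Under these assumptions, stop_seq_ok(stop_seq, STOP_SEQ_LEN) holds, while a
step that stops the vCPUs before NDMA is set, or revokes MMIO while the vCPUs
are still running, is rejected.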

> 
> Though this may be more disruptive to the vCPUs as they could spin on
> DMA/interrupts that will not come.

It's inevitable regardless of how we define the migration states. The
actual impact depends on how long 'Halt and flush DMA' takes.

Thanks
Kevin