netdev - Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YXgV6ehhsSlydiEl@work-vm>
Date:   Tue, 26 Oct 2021 15:51:21 +0100
From:   "Dr. David Alan Gilbert" <dgilbert@...hat.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Yishai Hadas <yishaih@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        Cornelia Huck <cohuck@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

* Alex Williamson (alex.williamson@...hat.com) wrote:
> On Mon, 25 Oct 2021 19:47:29 +0100
> "Dr. David Alan Gilbert" <dgilbert@...hat.com> wrote:
> 
> > * Alex Williamson (alex.williamson@...hat.com) wrote:
> > > On Mon, 25 Oct 2021 17:34:01 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@...hat.com> wrote:
> > >   
> > > > * Alex Williamson (alex.williamson@...hat.com) wrote:  
> > > > > [Cc +dgilbert, +cohuck]
> > > > > 
> > > > > On Wed, 20 Oct 2021 11:28:04 +0300
> > > > > Yishai Hadas <yishaih@...dia.com> wrote:
> > > > >     

<snip>

> > > In a way.  We're essentially recognizing that we cannot stop a single
> > > device in isolation of others that might participate in peer-to-peer
> > > DMA with that device, so we need to make a pass to quiesce each device
> > > before we can ask the device to fully stop.  This new device state bit
> > > is meant to be that quiescent point, devices can accept incoming DMA
> > > but should cease to generate any.  Once all device are quiesced then we
> > > can safely stop them.  
> > 
> > It may need some further refinement; for example in that quiesed state
> > do counters still tick? will a NIC still respond to packets that don't
> > get forwarded to the host?
> 
> I'd think no, but I imagine it's largely device specific to what extent
> a device can be fully halted yet minimally handle incoming DMA.

That's what worries me; we're adding a new state here as we understand
more about trying to implement a device; but it seems that we need to
nail something down as to what the state means.

> > Note I still think you need a way to know when you have actually reached
> > these states; setting a bit in a register is asking nicely for a device
> > to go into a state - has it got there?
> 
> It's more than asking nicely, we define the device_state bits as
> synchronous, the device needs to enter the state before returning from
> the write operation or return an errno.

I don't see how it can be synchronous in practice; can it really wait to
complete if it has to take many cycles to finish off an inflight DMA
before it transitions?

> > > > Now, you could be a *little* more sloppy; you could allow a device carry
> > > > on doing stuff purely with it's own internal state up until the point
> > > > it needs to serialise; but that would have to be strictly internal state
> > > > only - if it can change any other devices state (or issue an interrupt,
> > > > change RAM etc) then you get into ordering issues on the serialisation
> > > > of multiple devices.  
> > > 
> > > Yep, that's the proposal that doesn't require a uAPI change, we loosen
> > > the definition of stopped to mean the device can no longer generate DMA
> > > or interrupts and all internal processing outside or responding to
> > > incoming DMA should halt (essentially the same as the new quiescent
> > > state above).  Once all devices are in this state, there should be no
> > > incoming DMA and we can safely collect per device migration data.  If
> > > state changes occur beyond the point in time where userspace has
> > > initiated the collection of migration data, drivers have options for
> > > generating errors when userspace consumes that data.  
> > 
> > How do you know that last device has actually gone into that state?
> 
> Each device cannot, the burden is on the user to make sure all devices
> are stopped before proceeding to read migration data.

Yeh this really ties to the previous question; if it's synchronous
you're OK.

> > Also be careful; it feels much more delicate where something might
> > accidentally start a transaction.
> 
> This sounds like a discussion of theoretically broken drivers.  Like
> the above device_state, drivers still have a synchronization point when
> the user reads the pending_bytes field to initiate retrieving the
> device state.  If the implementation requires the device to be fully
> stopped to snapshot the device state to provide to the user, this is
> where that would happen.  Thanks,

Yes, but I worry that some ways of definining it are harder to get right
in drivers, so less likely to be theoretical.

Dave

> Alex
> 
-- 
Dr. David Alan Gilbert / dgilbert@...hat.com / Manchester, UK