[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211020150709.7cff2066.alex.williamson@redhat.com>
Date: Wed, 20 Oct 2021 15:07:09 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
"Dr. David Alan Gilbert" <dgilbert@...hat.com>,
Cornelia Huck <cohuck@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
for mlx5 devices
On Wed, 20 Oct 2021 15:59:19 -0300
Jason Gunthorpe <jgg@...dia.com> wrote:
> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>
> > I'm wondering if we're imposing extra requirements on the !_RUNNING
> > state that don't need to be there. For example, if we can assume that
> > all devices within a userspace context are !_RUNNING before any of the
> > devices begin to retrieve final state, then clearing of the _RUNNING
> > bit becomes the device quiesce point and the beginning of reading
> > device data is the point at which the device state is frozen and
> > serialized. No new states required and essentially works with a slight
> > rearrangement of the callbacks in this series. Why can't we do that?
>
> It sounds worth checking carefully. I didn't come up with a major
> counter scenario.
>
> We would need to specifically define which user action triggers the
> device to freeze and serialize. Reading pending_bytes I suppose?
The first read of pending_bytes after clearing the _RUNNING bit would
be the logical place to do this since that's what we define as the start
of the cycle for reading the device state.
"Freezing" the device is a valid implementation, but I don't think it's
strictly required per the uAPI. For instance there's no requirement
that pending_bytes is reduced by data_size on each iteratio; we
specifically only define that the state is complete when the user reads
a pending_bytes value of zero. So a driver could restart the device
state if the device continues to change (though it's debatable whether
triggering an -errno on the next migration region access might be a
more supportable approach to enforce that userspace has quiesced
external access).
> Since freeze is a device command we need to be able to report failure
> and to move the state to error, that feels bad hidden inside reading
> pending_bytes.
That seems like the wrong model. Reading pending_bytes can return
-errno should an error occur during freeze/serialize, but device_state
is never implicitly modified. Upon encountering an error reading
pending_bytes, userspace would abort the migration and move the
device_state to a new value. This is the point at which the driver
would return another -errno to indicate an unrecoverable internal
error, or honor the requested new state.
> > Maybe a clarification of the uAPI spec is sufficient to achieve this,
> > ex. !_RUNNING devices may still update their internal state machine
> > based on external access. Userspace is expected to quiesce all external
> > access prior to initiating the retrieval of the final device state from
> > the data section of the migration region. Failure to do so may result
> > in inconsistent device state or optionally the device driver may induce
> > a fault if a quiescent state is not maintained.
>
> But on the other hand this seem so subtle and tricky way to design a
> uAPI that devices and users are unlikely to implement it completely
> correctly.
I think it's only tricky if device_state is implicitly modified by
other actions, stopping all devices before collecting the state of any
of them seems like a reasonable requirement. It's the same requirement
we'd add with an NDMA bit.
> IMHO the point of the state is to control the current behavior of the
> device - yes we can control behavior on other operations, but it feels
> obfuscated.
It is, don't do that. Fail the migration region operation, fail the
next device state transition if the internal error is irrecoverable.
> Especially when the userspace that needs to know about this isn't even
> deployed yet, let's just do it cleanly?
That's an option, but again I'm wondering if we're imposing a
requirement on the !_RUNNING state that doesn't really exist and a
clean solution exists already. Thanks,
Alex
Powered by blists - more mailing lists