netdev - Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio


Message-ID: <87o87isovr.fsf@redhat.com>
Date:   Thu, 21 Oct 2021 11:34:00 +0200
From:   Cornelia Huck <cohuck@...hat.com>
To:     Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

On Wed, Oct 20 2021, Alex Williamson <alex.williamson@...hat.com> wrote:

> On Wed, 20 Oct 2021 15:59:19 -0300
> Jason Gunthorpe <jgg@...dia.com> wrote:
>
>> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>> 
>> > I'm wondering if we're imposing extra requirements on the !_RUNNING
>> > state that don't need to be there.  For example, if we can assume that
>> > all devices within a userspace context are !_RUNNING before any of the
>> > devices begin to retrieve final state, then clearing of the _RUNNING
>> > bit becomes the device quiesce point and the beginning of reading
>> > device data is the point at which the device state is frozen and
>> > serialized.  No new states required and essentially works with a slight
>> > rearrangement of the callbacks in this series.  Why can't we do that?  
>> 
>> It sounds worth checking carefully. I didn't come up with a major
>> counter scenario.
>> 
>> We would need to specifically define which user action triggers the
>> device to freeze and serialize. Reading pending_bytes I suppose?
>
> The first read of pending_bytes after clearing the _RUNNING bit would
> be the logical place to do this since that's what we define as the start
> of the cycle for reading the device state.
>
> "Freezing" the device is a valid implementation, but I don't think it's
> strictly required per the uAPI.  For instance there's no requirement
> that pending_bytes is reduced by data_size on each iteration; we
> specifically only define that the state is complete when the user reads
> a pending_bytes value of zero.  So a driver could restart the device
> state if the device continues to change (though it's debatable whether
> triggering an -errno on the next migration region access might be a
> more supportable approach to enforce that userspace has quiesced
> external access).

Hm, not so sure. From my reading of the uAPI, transitioning from
pre-copy to stop-and-copy (i.e. clearing _RUNNING) implies that we
freeze the device (at least, that's how I interpret "On state transition
from pre-copy to stop-and-copy, the driver must stop the device, save
the device state and send it to the user application through the
migration region.")

Maybe the uAPI is simply not yet clear enough.
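
For illustration, the ordering Alex proposes (clearing _RUNNING quiesces
the device, the first pending_bytes read afterwards freezes and serializes
it, and the state is complete once pending_bytes reads zero) could be
sketched roughly as below. This is a self-contained mock, not the real
uAPI or the mlx5 driver: the mock_* names, the MOCK_* constants, and the
chunk sizes are all assumptions made up for the example, and the mock
exposes no pre-copy data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the migration device_state bits. */
#define MOCK_STATE_RUNNING (1u << 0)
#define MOCK_STATE_SAVING  (1u << 1)

#define MOCK_STATE_LEN 10u /* size of the serialized device state */
#define MOCK_CHUNK_LEN 4u  /* most bytes exposed per iteration */

struct mock_dev {
	uint32_t device_state;
	int frozen;                   /* set once the state is serialized */
	uint8_t live[MOCK_STATE_LEN]; /* stands in for the hardware state */
	uint8_t snap[MOCK_STATE_LEN]; /* serialized copy handed to the user */
	uint64_t sent;                /* bytes already read by the user */
};

/* User write to device_state; clearing _RUNNING is the quiesce point. */
static void mock_set_state(struct mock_dev *d, uint32_t state)
{
	d->device_state = state;
}

/*
 * The first pending_bytes read with _RUNNING clear and _SAVING set
 * freezes and serializes the device. Later reads report what is left.
 */
static uint64_t mock_read_pending_bytes(struct mock_dev *d)
{
	if (!(d->device_state & MOCK_STATE_RUNNING) &&
	    (d->device_state & MOCK_STATE_SAVING) && !d->frozen) {
		memcpy(d->snap, d->live, MOCK_STATE_LEN);
		d->frozen = 1;
		d->sent = 0;
	}
	return d->frozen ? MOCK_STATE_LEN - d->sent : 0;
}

/* Copy the next chunk into buf and return its size (the "data_size"). */
static uint64_t mock_read_data(struct mock_dev *d, uint8_t *buf)
{
	uint64_t left, n;

	assert(d->frozen); /* data is only valid after serialization */
	left = MOCK_STATE_LEN - d->sent;
	n = left < MOCK_CHUNK_LEN ? left : MOCK_CHUNK_LEN;
	memcpy(buf, d->snap + d->sent, n);
	d->sent += n;
	return n;
}

/*
 * Userspace side: stop the device, then loop until pending_bytes reads
 * zero. The loop never assumes pending_bytes drops by data_size.
 */
static uint64_t mock_save(struct mock_dev *d, uint8_t *out)
{
	uint64_t total = 0;

	mock_set_state(d, MOCK_STATE_SAVING); /* _RUNNING cleared */
	while (mock_read_pending_bytes(d) != 0)
		total += mock_read_data(d, out + total);
	return total;
}
```

A caller would fill `live`, start with `device_state = MOCK_STATE_RUNNING`,
and call `mock_save()` with a buffer of at least MOCK_STATE_LEN bytes.
Under Cornelia's reading of the uAPI, the freeze would instead happen
inside `mock_set_state()` when _RUNNING is cleared; the user-side loop
would look the same either way.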