lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87o87isovr.fsf@redhat.com>
Date:   Thu, 21 Oct 2021 11:34:00 +0200
From:   Cornelia Huck <cohuck@...hat.com>
To:     Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

On Wed, Oct 20 2021, Alex Williamson <alex.williamson@...hat.com> wrote:

> On Wed, 20 Oct 2021 15:59:19 -0300
> Jason Gunthorpe <jgg@...dia.com> wrote:
>
>> On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:
>> 
>> > I'm wondering if we're imposing extra requirements on the !_RUNNING
>> > state that don't need to be there.  For example, if we can assume that
>> > all devices within a userspace context are !_RUNNING before any of the
>> > devices begin to retrieve final state, then clearing of the _RUNNING
>> > bit becomes the device quiesce point and the beginning of reading
>> > device data is the point at which the device state is frozen and
>> > serialized.  No new states required and essentially works with a slight
>> > rearrangement of the callbacks in this series.  Why can't we do that?  
>> 
>> It sounds worth checking carefully. I didn't come up with a major
>> counter scenario.
>> 
>> We would need to specifically define which user action triggers the
>> device to freeze and serialize. Reading pending_bytes I suppose?
>
> The first read of pending_bytes after clearing the _RUNNING bit would
> be the logical place to do this since that's what we define as the start
> of the cycle for reading the device state.
>
> "Freezing" the device is a valid implementation, but I don't think it's
> strictly required per the uAPI.  For instance there's no requirement
> that pending_bytes is reduced by data_size on each iteratio; we
> specifically only define that the state is complete when the user reads
> a pending_bytes value of zero.  So a driver could restart the device
> state if the device continues to change (though it's debatable whether
> triggering an -errno on the next migration region access might be a
> more supportable approach to enforce that userspace has quiesced
> external access).

Hm, not so sure. From my reading of the uAPI, transitioning from
pre-copy to stop-and-copy (i.e. clearing _RUNNING) implies that we
freeze the device (at least, that's how I interpret "On state transition
from pre-copy to stop-and-copy, the driver must stop the device, save
the device state and send it to the user application through the
migration region.")

Maybe the uAPI is simply not yet clear enough.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ