lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211020185919.GH2744544@nvidia.com>
Date:   Wed, 20 Oct 2021 15:59:19 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Cornelia Huck <cohuck@...hat.com>
Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
 for mlx5 devices

On Wed, Oct 20, 2021 at 10:52:30AM -0600, Alex Williamson wrote:

> I'm wondering if we're imposing extra requirements on the !_RUNNING
> state that don't need to be there.  For example, if we can assume that
> all devices within a userspace context are !_RUNNING before any of the
> devices begin to retrieve final state, then clearing of the _RUNNING
> bit becomes the device quiesce point and the beginning of reading
> device data is the point at which the device state is frozen and
> serialized.  No new states required and essentially works with a slight
> rearrangement of the callbacks in this series.  Why can't we do that?

It sounds worth checking carefully. I didn't come up with a major
counter scenario.

We would need to specifically define which user action triggers the
device to freeze and serialize. Reading pending_bytes I suppose?

Since freeze is a device command we need to be able to report failure
and to move the state to error, that feels bad hidden inside reading
pending_bytes.

> Maybe a clarification of the uAPI spec is sufficient to achieve this,
> ex. !_RUNNING devices may still update their internal state machine
> based on external access.  Userspace is expected to quiesce all external
> access prior to initiating the retrieval of the final device state from
> the data section of the migration region.  Failure to do so may result
> in inconsistent device state or optionally the device driver may induce
> a fault if a quiescent state is not maintained.

But on the other hand this seem so subtle and tricky way to design a
uAPI that devices and users are unlikely to implement it completely
correctly.

IMHO the point of the state is to control the current behavior of the
device - yes we can control behavior on other operations, but it feels
obfuscated.

Especially when the userspace that needs to know about this isn't even
deployed yet, let's just do it cleanly?

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ