lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB527618B02E6BAF6EF03D2E0C8C3C9@BN9PR11MB5276.namprd11.prod.outlook.com>
Date:   Wed, 23 Feb 2022 02:02:07 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Alex Williamson <alex.williamson@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>
CC:     Yishai Hadas <yishaih@...dia.com>,
        "bhelgaas@...gle.com" <bhelgaas@...gle.com>,
        "saeedm@...dia.com" <saeedm@...dia.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "leonro@...dia.com" <leonro@...dia.com>,
        "kwankhede@...dia.com" <kwankhede@...dia.com>,
        "mgurtovoy@...dia.com" <mgurtovoy@...dia.com>,
        "maorg@...dia.com" <maorg@...dia.com>,
        "cohuck@...hat.com" <cohuck@...hat.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "shameerali.kolothum.thodi@...wei.com" 
        <shameerali.kolothum.thodi@...wei.com>
Subject: RE: [PATCH V8 mlx5-next 09/15] vfio: Define device migration protocol
 v2

> From: Alex Williamson <alex.williamson@...hat.com>
> Sent: Wednesday, February 23, 2022 9:10 AM
> > > > + * The kernel migration driver must fully transition the device to the
> new state
> > > > + * value before the operation returns to the user.
> > >
> > > The above statement certainly doesn't preclude asynchronous
> > > availability of data on the stream FD, but it does demand that the
> > > device state transition itself is synchronous and can cannot be
> > > shortcut.  If the state transition itself exceeds migration SLAs, we're
> > > in a pickle.  Thanks,
> >
> > Even if the commands were async, it is not easy to believe a device
> > can instantaneously abort an arc when a timer hits and return to full
> > operation. For instance, mlx5 can't do this.
> >
> > The vCPU cannot be restarted to try to meet the SLA until a command
> > going back to RUNNING returns.
> >
> > If we want to have a SLA feature it feels better to pass in the
> > deadline time as part of the set state ioctl and the driver can then
> > internally do something appropriate and not have to figure out how to
> > juggle an external abort. The driver would be expected to return fully
> > completed from STOP or return back to RUNNING before the deadline.
> >
> > For instance mlx5 could possibly implement this by checking the
> > migration size and doing some maths before deciding if it should
> > commit to its unabortable device command.
> >
> > I have a feeling supporting SLA means devices are going to have to
> > report latencies for various arcs and work in a more classical
> > realtime deadline oriented way overall. Estimating the transfer
> > latency and size is another factor too.
> >
> > Overall, this SLA topic looks quite big to me, and I think a full
> > solution will come with many facets. We are also quite interested in
> > dirty rate limiting, for instance.
> 
> So if/when we were to support this, we might use a different SET_STATE
> feature ioctl that allows the user to specify a deadline and we'd use
> feature probing or a flag on the migration feature for userspace to
> discover this?  I'd be ok with that, I just want to make sure we have
> agreeable options to support it.  Thanks,
> 

Or use a different device_feature ioctl to allow setting deadline 
for different arcs before changing device state and then reuse
existing SET_STATE semantics with the migration driver doing
estimation underlyingly based on pre-configured constraints...

Thanks
Kevin

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ