lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Feb 2022 09:48:11 -0700
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com,
        ashok.raj@...el.com, kevin.tian@...el.com,
        shameerali.kolothum.thodi@...wei.com
Subject: Re: [PATCH V7 mlx5-next 14/15] vfio/mlx5: Use its own PCI
 reset_done error handler

On Tue, 8 Feb 2022 22:39:18 -0400
Jason Gunthorpe <jgg@...dia.com> wrote:

> On Tue, Feb 08, 2022 at 05:08:01PM -0700, Alex Williamson wrote:
> > > @@ -477,10 +499,34 @@ static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,
> > >  
> > >  	mutex_lock(&mvdev->state_mutex);
> > >  	*curr_state = mvdev->mig_state;
> > > -	mutex_unlock(&mvdev->state_mutex);
> > > +	mlx5vf_state_mutex_unlock(mvdev);
> > >  	return 0;  
> > 
> > I still can't see why it wouldn't be a both fairly trivial to implement
> > and a usability improvement if the unlock wrapper returned -EAGAIN on a
> > deferred reset so we could avoid returning a stale state to the user
> > and a dead fd in the former case.  Thanks,  
> 
> It simply is not useful - again, we always resolve this race that
> should never happen as though the two events happened consecutively,
> which is what would normally happen if we could use a simple mutex. We
> do not need to add any more complexity to deal with this already
> troublesome thing..

So walk me through how this works with QEMU, it's easy to hand-wave
userspace race and move on, but device reset can be triggered by guest
behavior while migration is supposed to be transparent to the guest.
These are essentially asynchronous threads where we're imposing a
synchronization point or lots of double checking in userspace whether
the device actually entered the state we think it did and if the
returned FD is usable.

Specifically, I suspect we can trigger this race if the VM reboots as
we're initiating a migration in the STOP_COPY phase, but that's maybe
less interesting if we expect the VM to be halted before the device
state is stepped.  More interesting might be how a PRE_COPY transition
works relative to asynchronous VM resets triggering device resets.  Are
we serializing all access to reset vs this DEVICE_FEATURE op or are we
resorting to double checking the device state, and how do we plan to
re-initiate migration states if a VM reset occurs during migration?
Thanks,

Alex

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ