Date:   Tue, 19 Oct 2021 16:10:25 -0300
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com
Subject: Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done
 error handler

On Tue, Oct 19, 2021 at 12:55:13PM -0600, Alex Williamson wrote:

> > +static void mlx5vf_reset_work_handler(struct work_struct *work)
> > +{
> > +	struct mlx5vf_pci_core_device *mvdev =
> > +		container_of(work, struct mlx5vf_pci_core_device, work);
> > +
> > +	mutex_lock(&mvdev->state_mutex);
> > +	mlx5vf_reset_mig_state(mvdev);
> 
> I see this calls mlx5vf_reset_vhca_state() but how does that unfreeze
> and unquiesce the device as necessary to get back to _RUNNING?

FLR of the function does it.

This is the same flow as when userspace attaches the vfio migration,
freezes the device, and then closes the FD. The FLR puts everything in
the device right, and the next open will see a functional, unfrozen,
blank device.

> > +	mvdev->vmig.vfio_dev_state = VFIO_DEVICE_STATE_RUNNING;
> > +	mutex_unlock(&mvdev->state_mutex);
> > +}
> > +
> > +static void mlx5vf_pci_aer_reset_done(struct pci_dev *pdev)
> > +{
> > +	struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
> > +
> > +	if (!mvdev->migrate_cap)
> > +		return;
> > +
> > +	schedule_work(&mvdev->work);
> 
> This seems troublesome, how long does userspace poll the device state
> after reset to get back to _RUNNING?  Seems we at least need a
> flush_work() call when userspace reads the device state.  Thanks,

The locking here is very troubled, because the higher VFIO layers hold
locks across reset and use those same locks together with the mm_lock.

The delay is a good point :(

The other algorithm that can rescue this is to defer the cleanup work
to the mutex unlock, in whichever context happens to get to it:

reset_done:
   spin_lock(spin)
   deferred_reset = true
   if (!mutex_trylock(&state_mutex))
      spin_unlock(spin)
      return
   spin_unlock(spin)

   state_mutex_unlock()

state_mutex_unlock:
 again:
   spin_lock(spin)
   if (deferred_reset)
      deferred_reset = false
      spin_unlock(spin)
      do_post_reset
      goto again
   mutex_unlock(&state_mutex)
   spin_unlock(spin)

and call state_mutex_unlock() in all unlock cases.

It is a much more complicated algorithm than the work.
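
To make the deferred-unlock idea concrete, here is a minimal userspace sketch, not the driver code. It assumes pthread mutexes standing in for both the kernel mutex and the spinlock, and every name in it (struct mig_dev, post_resets, demo_contended_reset) is hypothetical. It also clears the flag before running the cleanup, which the "goto again" loop needs in order to terminate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the mlx5 structures. */
struct mig_dev {
	pthread_mutex_t state_mutex;	/* protects the migration state */
	pthread_mutex_t reset_lock;	/* stands in for the kernel spinlock */
	bool deferred_reset;		/* set by reset_done, consumed on unlock */
	int post_resets;		/* counts cleanup runs, for illustration */
};

static void mig_dev_init(struct mig_dev *d)
{
	pthread_mutex_init(&d->state_mutex, NULL);
	pthread_mutex_init(&d->reset_lock, NULL);
	d->deferred_reset = false;
	d->post_resets = 0;
}

/* Placeholder for the real post-reset cleanup; runs with state_mutex held. */
static void do_post_reset(struct mig_dev *d)
{
	d->post_resets++;
}

/* Every holder of state_mutex must release it through this helper. */
static void state_mutex_unlock(struct mig_dev *d)
{
again:
	pthread_mutex_lock(&d->reset_lock);
	if (d->deferred_reset) {
		/* Consume the request before dropping reset_lock. */
		d->deferred_reset = false;
		pthread_mutex_unlock(&d->reset_lock);
		do_post_reset(d);
		goto again;	/* a new reset may have arrived meanwhile */
	}
	pthread_mutex_unlock(&d->state_mutex);
	pthread_mutex_unlock(&d->reset_lock);
}

/* Called from the reset path; never blocks on state_mutex. */
static void reset_done(struct mig_dev *d)
{
	pthread_mutex_lock(&d->reset_lock);
	d->deferred_reset = true;
	if (pthread_mutex_trylock(&d->state_mutex) != 0) {
		/* The current holder will run the cleanup when it unlocks. */
		pthread_mutex_unlock(&d->reset_lock);
		return;
	}
	pthread_mutex_unlock(&d->reset_lock);
	state_mutex_unlock(d);
}

/* Usage: a reset arriving while the mutex is held is handled at unlock. */
static int demo_contended_reset(void)
{
	struct mig_dev d;

	mig_dev_init(&d);
	pthread_mutex_lock(&d.state_mutex);
	reset_done(&d);			/* trylock fails, cleanup is deferred */
	assert(d.post_resets == 0);
	state_mutex_unlock(&d);		/* runs the cleanup, then unlocks */
	return d.post_resets;
}
```

Because the flag is set and tested under the same lock that covers the final mutex_unlock, a reset either sees the mutex free and does the cleanup itself, or leaves the flag for a holder that has not yet passed its check.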

Yishai, this should also have had a comment explaining why it is
needed, as nobody is going to guess that an ABBA deadlock on mm_lock is
the reason.

Jason
