lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <i4vq5amjds2xoqablojiwgplrp4uw7xvhk2esibytx25mesekc@q6ajjaexw2pu>
Date: Mon, 24 Nov 2025 22:44:06 +0100
From: Michał Winiarski <michal.winiarski@...el.com>
To: Alex Williamson <alex@...zbot.org>
CC: Jason Gunthorpe <jgg@...pe.ca>, Kevin Tian <kevin.tian@...el.com>, "Yishai
 Hadas" <yishaih@...dia.com>, Longfang Liu <liulongfang@...wei.com>, "Shameer
 Kolothum" <skolothumtho@...dia.com>, Brett Creeley <brett.creeley@....com>,
	Giovanni Cabiddu <giovanni.cabiddu@...el.com>, <kvm@...r.kernel.org>,
	<qat-linux@...el.com>, <virtualization@...ts.linux.dev>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/6] vfio: Introduce .migration_reset_state() callback

On Fri, Nov 21, 2025 at 11:16:03AM -0700, Alex Williamson wrote:
> On Thu, 20 Nov 2025 13:36:42 +0100
> Michał Winiarski <michal.winiarski@...el.com> wrote:
> 
> > Resetting the migration device state is typically delegated to PCI
> > .reset_done() callback.
> > With VFIO, reset is usually called under vdev->memory_lock, which causes
> > lockdep to report a following circular locking dependency scenario:
> > 
> > 0: set_device_state
> > driver->state_mutex -> migf->lock
> > 1: data_read
> > migf->lock -> mm->mmap_lock
> > 2: vfio_pin_dma
> > mm->mmap_lock -> vdev->memory_lock
> > 3: vfio_pci_ioctl_reset
> > vdev->memory_lock -> driver->state_mutex
> > 
> > Introduce a .migration_reset_state() callback called outside of
> > vdev->memory_lock to break the dependency chain.
> > 
> > Signed-off-by: Michał Winiarski <michal.winiarski@...el.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 25 ++++++++++++++++++++++---
> >  include/linux/vfio.h             |  4 ++++
> >  2 files changed, 26 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index 7dcf5439dedc9..d919636558ec8 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -553,6 +553,16 @@ int vfio_pci_core_enable(struct vfio_pci_core_device *vdev)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_pci_core_enable);
> >  
> > +static void vfio_pci_dev_migration_reset_state(struct vfio_pci_core_device *vdev)
> > +{
> > +	lockdep_assert_not_held(&vdev->memory_lock);
> > +
> > +	if (!vdev->vdev.mig_ops->migration_reset_state)
> 
> mig_ops itself is generally NULL.

Yeah, I clearly didn't test it with plain vfio_pci.
I'll fix it in v2.

> 
> > +		return;
> > +
> > +	vdev->vdev.mig_ops->migration_reset_state(&vdev->vdev);
> > +}
> > +
> >  void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> >  {
> >  	struct pci_dev *pdev = vdev->pdev;
> > @@ -662,8 +672,10 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
> >  	 * overwrite the previously restored configuration information.
> >  	 */
> >  	if (vdev->reset_works && pci_dev_trylock(pdev)) {
> > -		if (!__pci_reset_function_locked(pdev))
> > +		if (!__pci_reset_function_locked(pdev)) {
> >  			vdev->needs_reset = false;
> > +			vfio_pci_dev_migration_reset_state(vdev);
> > +		}
> >  		pci_dev_unlock(pdev);
> >  	}
> >  
> > @@ -1230,6 +1242,8 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> >  	ret = pci_try_reset_function(vdev->pdev);
> >  	up_write(&vdev->memory_lock);
> >  
> > +	vfio_pci_dev_migration_reset_state(vdev);
> > +
> >  	return ret;
> >  }
> >  
> > @@ -2129,6 +2143,7 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
> >  	if (vdev->vdev.mig_ops) {
> >  		if (!(vdev->vdev.mig_ops->migration_get_state &&
> >  		      vdev->vdev.mig_ops->migration_set_state &&
> > +		      vdev->vdev.mig_ops->migration_reset_state &&
> 
> For bisection purposes it would be better to enforce this after all the
> drivers are converted.

Makes sense. I'll split this patch.

> 
> >  		      vdev->vdev.mig_ops->migration_get_data_size) ||
> >  		    !(vdev->vdev.migration_flags & VFIO_MIGRATION_STOP_COPY))
> >  			return -EINVAL;
> > @@ -2486,8 +2501,10 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> >  
> >  err_undo:
> >  	list_for_each_entry_from_reverse(vdev, &dev_set->device_list,
> > -					 vdev.dev_set_list)
> > +					 vdev.dev_set_list) {
> >  		up_write(&vdev->memory_lock);
> > +		vfio_pci_dev_migration_reset_state(vdev);
> > +	}
> >  
> >  	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
> >  		pm_runtime_put(&vdev->pdev->dev);
> > @@ -2543,8 +2560,10 @@ static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
> >  		reset_done = true;
> >  
> >  	list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list) {
> > -		if (reset_done)
> > +		if (reset_done) {
> >  			cur->needs_reset = false;
> > +			vfio_pci_dev_migration_reset_state(cur);
> > +		}
> 
> This and the core_disable path above are only called in the
> close/open-error path.  Do we really need this behavior there?  We
> might need separate reconciliation vs .reset_done for these.

Some drivers do the migration state reset explicitly in their, some rely
on reset (and .reset_done) being invoked.
We can remove it from here if we just call migration_reset_state
consistently in each driver.

> 
> As Kevin also noted, we're missing the non-ioctl reset paths.  This
> approach seems a bit error prone.  I wonder if instead we need a
> counterpart of vfio_pci_zap_and_down_write_memory_lock(), ie.
> vfio_pci_up_write_memory_lock_from_reset().  An equal mouthful, but
> scopes the problem to be more manageable at memory_lock release.

And it also needs to be called conditionally on certain paths.
Which means that it will take an extra argument (whether prior reset was
successful).
I'll try this approach in v2.

Thanks,
-Michał

> Thanks,
> 
> Alex
> 
> 
> >  
> >  		if (!disable_idle_d3)
> >  			pm_runtime_put(&cur->pdev->dev);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index eb563f538dee5..36aab2df40700 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -213,6 +213,9 @@ static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> >   * @migration_get_state: Optional callback to get the migration state for
> >   *         devices that support migration. It's mandatory for
> >   *         VFIO_DEVICE_FEATURE_MIGRATION migration support.
> > + * @migration_reset_state: Optional callback to reset the migration state for
> > + *         devices that support migration. It's mandatory for
> > + *         VFIO_DEVICE_FEATURE_MIGRATION migration support.
> >   * @migration_get_data_size: Optional callback to get the estimated data
> >   *          length that will be required to complete stop copy. It's mandatory for
> >   *          VFIO_DEVICE_FEATURE_MIGRATION migration support.
> > @@ -223,6 +226,7 @@ struct vfio_migration_ops {
> >  		enum vfio_device_mig_state new_state);
> >  	int (*migration_get_state)(struct vfio_device *device,
> >  				   enum vfio_device_mig_state *curr_state);
> > +	void (*migration_reset_state)(struct vfio_device *device);
> >  	int (*migration_get_data_size)(struct vfio_device *device,
> >  				       unsigned long *stop_copy_length);
> >  };
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ