lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220201185321.GM1786498@nvidia.com>
Date:   Tue, 1 Feb 2022 14:53:21 -0400
From:   Jason Gunthorpe <jgg@...dia.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Yishai Hadas <yishaih@...dia.com>, bhelgaas@...gle.com,
        saeedm@...dia.com, linux-pci@...r.kernel.org, kvm@...r.kernel.org,
        netdev@...r.kernel.org, kuba@...nel.org, leonro@...dia.com,
        kwankhede@...dia.com, mgurtovoy@...dia.com, maorg@...dia.com
Subject: Re: [PATCH V6 mlx5-next 09/15] vfio: Extend the device migration
 protocol with RUNNING_P2P

On Tue, Feb 01, 2022 at 11:31:44AM -0700, Alex Williamson wrote:
> > +	bool have_p2p = device->migration_flags & VFIO_MIGRATION_P2P;
> > +
> >  	if (cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
> >  	    new_fsm >= ARRAY_SIZE(vfio_from_fsm_table))
> >  		return VFIO_DEVICE_STATE_ERROR;
> >  
> > -	return vfio_from_fsm_table[cur_fsm][new_fsm];
> > +	if (!have_p2p && (new_fsm == VFIO_DEVICE_STATE_RUNNING_P2P ||
> > +			  cur_fsm == VFIO_DEVICE_STATE_RUNNING_P2P))
> > +		return VFIO_DEVICE_STATE_ERROR;
> 
> new_fsm is provided by the user, we pass set_state.device_state
> directly to .migration_set_state.  We should do bounds checking and
> compatibility testing on the end state in the core so that we can

This is the core :)

> return an appropriate -EINVAL and -ENOSUPP respectively, otherwise
> we're giving userspace a path to put the device into ERROR state, which
> we claim is not allowed.

Userspace can never put the device into error. As the function comment
says VFIO_DEVICE_STATE_ERROR is returned to indicate the arc is not
permitted. The driver is required to reflect that back as an errno
like mlx5 shows:

+		next_state = vfio_mig_get_next_state(vdev, mvdev->mig_state,
+						     new_state);
+		if (next_state == VFIO_DEVICE_STATE_ERROR) {
+			res = ERR_PTR(-EINVAL);
+			break;
+		}

We never get the driver into error, userspaces gets an EINVAL and no
change to the device state.

It is organized this way because the driver controls the locking for
its current state and thus the core code caller along the ioctl path
cannot validate the arc before passing it to the driver. The code is
shared by having the driver callback to the core to validate the
entire fsm arc under its lock.

The driver ends up with a small while loop that will probably be copy
and pasted to each driver. As I said, I'm interested to lift this up
as well but I need to better understand the locking needs of the other
driver implementations first, or we need your patch series to use the
inode for zap to land to eliminate the complicated locking in the
first place..

> Testing cur_fsm is more an internal consistency check, maybe those
> should be WARN_ON.

Sure
 
> > +
> > +	cur_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> > +	if (!have_p2p) {
> > +		while (cur_fsm == VFIO_DEVICE_STATE_RUNNING_P2P)
> > +			cur_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
> > +	}
> 
> Perhaps this could be generalized with something like:

Oh, that table could probably do both tests, if the bit isn't set it
is an invalid cur/next_fsm as well..

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ