lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220331022852.7f4d86a5.pasic@linux.ibm.com>
Date:   Thu, 31 Mar 2022 02:28:52 +0200
From:   Halil Pasic <pasic@...ux.ibm.com>
To:     Tony Krowiak <akrowiak@...ux.ibm.com>
Cc:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, jjherne@...ux.ibm.com, freude@...ux.ibm.com,
        borntraeger@...ibm.com, cohuck@...hat.com, mjrosato@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        fiuczy@...ux.ibm.com, Halil Pasic <pasic@...ux.ibm.com>
Subject: Re: [PATCH v18 17/18] s390/Docs: new doc describing lock usage by
 the vfio_ap device driver

On Mon, 14 Feb 2022 19:50:39 -0500
Tony Krowiak <akrowiak@...ux.ibm.com> wrote:

> Introduces a new document describing the locks used by the vfio_ap device
> driver and how to use them so as to avoid lockdep reports and deadlock
> situations.
> 
> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
> ---
>  Documentation/s390/vfio-ap-locking.rst | 389 +++++++++++++++++++++++++
>  1 file changed, 389 insertions(+)
>  create mode 100644 Documentation/s390/vfio-ap-locking.rst
> 
> diff --git a/Documentation/s390/vfio-ap-locking.rst b/Documentation/s390/vfio-ap-locking.rst
> new file mode 100644
> index 000000000000..10abbb6d6089
> --- /dev/null
> +++ b/Documentation/s390/vfio-ap-locking.rst
> @@ -0,0 +1,389 @@
> +======================
> +VFIO AP Locks Overview
> +======================
> +This document describes the locks that are pertinent to the secure operation
> +of the vfio_ap device driver. Throughout this document, the following variables
> +will be used to denote instances of the structures herein described:
> +
> +struct ap_matrix_dev *matrix_dev;
> +struct ap_matrix_mdev *matrix_mdev;
> +struct kvm *kvm;
> +
> +The Matrix Devices Lock (drivers/s390/crypto/vfio_ap_private.h)
> +--------------------------------------------------------------
> +
> +struct ap_matrix_dev {
> +	...
> +	struct list_head mdev_list;
> +	struct mutex mdevs_lock;
> +	...
> +}
> +
> +The Matrix Devices Lock (matrix_dev->mdevs_lock) is implemented as a global
> +mutex contained within the single instance of struct ap_matrix_dev. This lock

s/single instance of/singleton object of type/

> +controls access to all fields contained within each matrix_mdev instance under
> +the control of the vfio_ap device driver (matrix_dev->mdev_list).

Are there matrix_mdev instances not under the control of the vfio_ap
device driver?

(MARK 1)

> This lock must
> +be held while reading from, writing to or using the data from a field contained
> +within a matrix_mdev instance representing one of the vfio_ap device driver's
> +mediated devices.

This makes it look like for example struct vfio_ap_queue objects are out
of scope.

> +
> +The KVM Lock (include/linux/kvm_host.h)
> +---------------------------------------
> +
> +struct kvm {
> +	...
> +	struct mutex lock;
> +	...
> +}
> +
> +The KVM Lock (kvm->lock) controls access to the state data for a KVM guest. This
> +lock must be held by the vfio_ap device driver while one or more AP adapters,
> +domains or control domains are being plugged into or unplugged from the guest.
> +
> +The vfio_ap device driver registers a function to be notified when the pointer
> +to the kvm instance has been set. The KVM pointer is passed to the handler by
> +the notifier and is stored in the in the matrix_mdev instance
> +(matrix_mdev->kvm = kvm) containing the state of the mediated device that has
> +been passed through to the KVM guest.
> +
> +The Guests Lock (drivers/s390/crypto/vfio_ap_private.h)
> +-----------------------------------------------------------
> +
> +struct ap_matrix_dev {
> +	...
> +	struct list_head mdev_list;
> +	struct mutex guests_lock;
> +	...
> +}
> +
> +The Guests Lock (matrix_dev->guests_lock) controls access to the
> +matrix_mdev instances (matrix_dev->mdev_list) that represent mediated devices
> +that hold the state for the mediated devices that have been passed through to a

Didn't say that access to fields of matrix_mdev instances is controlled
by the matrix_dev->mdevs lock at (MARK 1)? How do the two statements
mesh?

> +KVM guest. This lock must be held:
> +
> +1. To control access to the KVM pointer (matrix_mdev->kvm) while the vfio_ap
> +   device driver is using it to plug/unplug AP devices passed through to the KVM
> +   guest.
> +
> +2. To add matrix_mdev instances to or remove them from matrix_dev->mdev_list.
> +   This is necessary to ensure the proper locking order when the list is perused
> +   to find an ap_matrix_mdev instance for the purpose of plugging/unplugging
> +   AP devices passed through to a KVM guest.
> +
> +   For example, when a queue device is removed from the vfio_ap device driver,
> +   if the adapter is passed through to a KVM guest, it will have to be
> +   unplugged. In order to figure out whether the adapter is passed through,
> +   the matrix_mdev object to which the queue is assigned will have to be
> +   found. The KVM pointer (matrix_mdev->kvm) can then be used to determine if
> +   the mediated device is passed through (matrix_mdev->kvm != NULL) and if so,
> +   to unplug the adapter.
> +
> +It is not necessary to take the Guests Lock to access the KVM pointer if the
> +pointer is not used to plug/unplug devices passed through to the KVM guest;
> +however, in this case, the Matrix Devices Lock (matrix_dev->mdevs_lock) must be
> +held in order to access the KVM pointer since it set and cleared under the
> +protection of the Matrix Devices Lock. A case in point is the function that
> +handles interception of the PQAP(AQIC) instruction sub-function. This handler
> +needs to access the KVM pointer only for the purposes of setting or clearing IRQ
> +resources, so only the matrix_dev->mdevs_lock needs to be held.
> +

It is very unclear what this lock is actually protecting, and when does
it need to be taken.

> +The PQAP Hook Lock (arch/s390/include/asm/kvm_host.h)
> +-----------------------------------------------------
> +
> +typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
> +
> +struct kvm_s390_crypto {
> +	...
> +	struct rw_semaphore pqap_hook_rwsem;
> +	crypto_hook *pqap_hook;
> +	...
> +};
> +
> +The PQAP Hook Lock is a r/w semaphore that controls access to the function
> +pointer of the handler (*kvm->arch.crypto.pqap_hook) to invoke when the
> +PQAP(AQIC) instruction sub-function is intercepted by the host. The lock must be
> +held in write mode when pqap_hook value is set, and in read mode when the
> +pqap_hook function is called.
> +
> +Locking Order
> +-------------
> +
> +If the various locks are not taken in the proper order, it could potentially
> +result in a lockdep splat.

Just in a lockdep splat or in a deadlock?

> The proper order for taking locks depends upon
> +the operation taking place,

That sounds very fishy! The whole point of having a locking order is
preventing deadlocks if everybody sticks to the locking order. If there
are exceptions, i.e. if we violate the locking order, we risk deadlocks.

> but in general the Guests Lock
> +(matrix_dev->guests_lock) must be taken outside of the KVM Lock (kvm->lock)
> +which in turn must be taken outside of the Matrix Devices Lock
> +(matrix_dev->mdevs_lock).
> +
> +The following describes the various operations under which the various locks are
> +taken, the purpose for taking them and the order in which they must be taken.

It looks a little odd that you describe all the operations here,
especially in prosa.

But hey the good thing is, none seem to violate the lock order.


> +
> +* Operations: Setting or clearing the KVM pointer (matrix_mdev->kvm):
> +
> +  1. PQAP Hook Lock (kvm->arch.crypto.pqap_hook_rwsem):
> +
> +	This semaphore must be held in write mode while setting or clearing the
> +	reference to the function pointer (kvm->arch.crypt.pqap_hook) to call
> +	when the PQAP(AQIC) instruction sub-function is intercepted by the host.
> +	The function pointer is set when the KVM pointer is being set and
> +	cleared when the KVM pointer is being cleared.
> +
> +  2.Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held while accessing the KVM pointer
> +	(matrix_mdev->kvm) to plug/unplug AP devices passed through to the
> +	KVM guest
> +
> +  3. KVM Lock (kvm->lock):
> +
> +	This mutex must be held while the AP devices passed through to the KVM
> +	guest are plugged/unplugged.
> +
> +  4. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held to prevent access to the matrix_mdev state
> +	while writing/reading state values during the operation.
> +
> +* Operations: Assign or unassign an adapter, domain or control domain of a
> +	      mediated device under the control of the vfio_ap device driver:
> +
> +  1. Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held while accessing the KVM pointer
> +	(matrix_dev->kvm) to plug/unplug AP devices passed through to the
> +	KVM guest as a result of the assignment/unassignment operation.
> +	Assignment of an AP device may result in additional queue devices
> +	or control domains being plugged into the guest. Similarly, unassignment
> +	may result in unplugging queue devices or control domains from the
> +	guest
> +
> +  3. KVM Lock (matrix_mdev->kvm->lock):
> +
> +	This mutex must be held while the AP devices passed through to the KVM
> +	guest are plugged in or unplugged.
> +
> +  4. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held to prevent access to the matrix_mdev state
> +	while writing/reading state values during the operation. For example, to
> +	determine which AP devices need to be plugged/unplugged, the lock
> +	must be held to prevent other operations from changing the data used
> +	to construct the guest's AP configuration.
> +
> +* Operations: Probe or remove an AP queue device:
> +
> +  When a queue device is bound to the vfio_ap device driver, the driver's probe
> +  callback is invoked. Similarly, when a queue device is unbound from the
> +  driver it's remove callback is invoked. The probe and remove functions will
> +  take locks in the following order:
> +
> +  1. Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held for the duration of this operation.
> +
> +	At the time of the operation, the vfio_ap device driver will only have
> +	the APQN of the queue being probed or removed, so the
> +	matrix_dev->mdevs_list must be perused to locate the matrix_mdev
> +	instance to which the queue is assigned. The Guests Lock must be held
> +	during this time to prevent the list from being changed while processing
> +	the probe/remove.
> +
> +	Once the matrix_mdev is found, the operation must determine whether the
> +	mediated device is passed through to a guest (matrix_mdev->kvm != NULL),
> +	then use the KVM pointer to perform the plug/unplug operation. Here
> +	again, the lock must be held to prevent other operations from accessing
> +	the KVM pointer for the same purpose.
> +
> +  2. KVM Lock (kvm->lock):
> +
> +	This mutex must be held while the AP devices passed through to the KVM
> +	guest are plugged in or unplugged to prevent other operations from
> +	accessing the guest's state while it is in flux.
> +
> +  3. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held to prevent access to the matrix_mdev state
> +	while writing/reading state values during the operation, such as the
> +	masks used to construct the KVM guest's AP configuration.
> +
> +* Operations: Probe or remove a mediated device:
> +
> +  1. Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held while adding the matrix_mdev to the
> +	matrix_dev->mdev_list during the probe operation or when removing it
> +	from the list during the remove operation. This is to prevent access by
> +	other functions that must traverse the list to find a matrix_mdev for
> +	the purpose of plugging/unplugging AP devices passed through to a KVM
> +	guest (i.e., probe/remove queue callbacks), while the list is being
> +	modified.
> +
> +  2. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held to prevent access to the matrix_mdev state
> +	while writing/reading state values during the probe or remove operations
> +	such as initializing the hashtable of queue devices
> +	(matrix_mdev->qtable.queues) assigned to the matrix_mdev.
> +
> +* Operation: Handle interception of the PQAP(AQIC) instruction sub-function:
> +
> +  1. PQAP Hook Lock (kvm->arch.crypto.pqap_hook_rwsem)
> +
> +	This semaphore must be held in read mode while retrieving the function
> +	pointer (kvm->arch.crypto.pqap_hook) and executing the function that
> +	handles the interception of the PQAP(AQIC) instruction sub-function by
> +	the host.
> +
> +  2. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held to prevent access to the matrix_mdev state
> +	while writing/reading state values during the execution of the
> +	PQAP(AQIC) instruction sub-function interception handler. For example,
> +	the handler must iterate over the matrix_mdev->qtable.queues hashtable
> +	to find the vfio_ap_queue object representing the queue for which
> +	interrupts are being enabled or disabled.
> +
> +  Note: It is not necessary to take the Guests Lock (matrix_dev->guests_lock)
> +	or the KVM Lock (matrix_mdev->kvm->lock) because the KVM pointer
> +	will not be accessed to plug/unplug AP devices passed through to the
> +	guest; it will only be used to allocate or free resources for processing
> +	interrupts.
> +
> +* Operation: Handle AP configuration changed notification:
> +
> +  The vfio_ap device driver registers a callback function to be notified when
> +  the AP bus detects that the host's AP configuration has changed. This can
> +  occur due to the addition or removal of AP adapters, domains or control
> +  domains via an SE or HMC connected to a DPM enabled LPAR. The objective of the
> +  handler is to remove the queues no longer accessible via the host in bulk
> +  rather than one queue at a time via the driver's queue device remove callback.
> +  The locks and the order in which they must be taken by this operation are:
> +
> + 1. Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held for the duration of the operation to:
> +
> +	* Iterate over the matrix_dev->mdev_list to find each matrix_mdev from
> +	  which a queue device to be removed is assigned and prevent other
> +	  operations from modifying the list while processing the affected
> +	  matrix_mdev instances.
> +
> +	* Prevent other operations from acquiring access to the KVM pointer in
> +	  each affected matrix_mdev instance (matrix_mdev->kvm) for the purpose
> +	  of plugging/unplugging AP devices passed through to the KVM guest via
> +	  that instance.
> +
> +2. KVM Lock (kvm->lock):
> +
> +	This mutex must be held for each affected matrix_mdev instance while
> +	the AP devices passed through to the KVM guest are unplugged to prevent
> +	other operations from accessing the guest's state while it is in flux.
> +
> +	Note: This lock must be re-acquired for each matrix_mdev instance.
> +
> +  3. Matrix Devices Lock (matrix_dev->mdevs_lock)
> +
> +	This lock must be held for each affected matrix_mdev to prevent access
> +	to the matrix_mdev state while writing/reading state values during the
> +	operation, such as the masks used to construct the KVM guest's AP
> +	configuration.
> +
> +	Note: This lock must be re-acquired for each matrix_mdev instance.
> +
> +Operation: Handle AP bus scan complete notification:
> +
> +  The vfio_ap device driver registers a callback function to be notified when
> +  the AP bus scan completes after detecting the addition or removal of AP
> +  adapters, domains or control domains. The objective of the handler is t
> +  add the new queues accessible via the host in bulk rather than one queue
> +  at a time via the driver's queue device probe callback. The locks and the
> +  order in which they must be taken by this operation are:
> +
> +  1. Guests Lock (matrix_dev->guests_lock):
> +
> +	This mutex must be held for the duration of the operation to:
> +
> +	* Iterate over the matrix_dev->mdev_list to find each matrix_mdev to
> +	  which a queue device added is assigned and prevent other operations
> +	  from modifying the list while processing each affected matrix_mdev
> +	  object.
> +
> +	* Prevent other operations from acquiring access to the KVM pointer in
> +	  each affected matrix_mdev instance (matrix_mdev->kvm) for the purpose
> +	  of plugging/unplugging AP devices passed through to the KVM guest via
> +	  that instance.
> +
> +  2. KVM Lock (kvm->lock):
> +
> +	This mutex must be held for each affected matrix_mdev instance while
> +	the AP devices passed through to the KVM guest are plugged in to prevent
> +	other operations from accessing the guest's state while it is in flux.
> +
> +	Note: This lock must be re-acquired for each matrix_mdev instance.
> +
> +  3. Matrix Devices Lock (matrix_dev->mdevs_lock):
> +
> +	This lock must be held for each affected matrix_mdev to prevent access
> +	to the matrix_mdev state while writing/reading state values during the
> +	operation, such as the masks used to construct the KVM guest's AP
> +	configuration.
> +
> +	Note: This lock must be re-acquired for each matrix_mdev instance.
> +
> +Operation: Handle resource in use query:
> +
> +  The vfio_ap device driver registers a callback function with the AP bus to be
> +  called when changes to the bus's sysfs /sys/bus/ap/apmask or
> +  /sys/bus/ap/aqmask attributes would result in one or more AP queue devices
> +  getting unbound from the vfio_ap device driver to verify none of them are in
> +  use by the driver (i.e., assigned to a matrix_mdev instance). If this function
> +  is called while an adapter or domain is also being assigned to a mediated
> +  device, this could result in a deadlock; for example:
> +
> +  1. A system administrator assigns an adapter to a mediated device under the
> +     control of the vfio_ap device driver. The driver will need to first take
> +     the matrix_dev->guests_lock to potentially hot plug the adapter into
> +     the KVM guest.
> +  2. At the same time, a system administrator sets a bit in the sysfs
> +     /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus
> +     must:
> +     a. Take the ap_perms_mutex lock to update the object storing the values
> +        for the /sys/bus/ap/ap_mask attribute.
> +     b. Call the vfio_ap device driver's in-use callback to verify that no
> +        queues now being reserved for the default zcrypt drivers are
> +        in use by the vfio_ap device driver. To do the verification, the in-use
> +        callback function takes the matrix_dev->guests_lock, but has to wait
> +        because it is already held by the operation in 1 above.
> +  3. The vfio_ap device driver calls an AP bus function to verify that the
> +     new queues resulting from the assignment of the adapter in step 1 are
> +     not reserved for the default zcrypt device driver. This AP bus function
> +     tries to take the ap_perms_mutex lock but gets stuck waiting for the
> +     it due to step 2a above.
> +
> +    Consequently, we have the following deadlock situation:
> +
> +    matrix_dev->guests_lock locked (1)
> +    ap_perms_mutex lock locked (2a)
> +    Waiting for matrix_dev->lock (2b) which is currently held (1)
> +    Waiting for ap_perms_mutex lock (3) which is currently held (2a)
> +
> +  To prevent the deadlock scenario, the in_use operation will take the
> +  required locks using the mutex_trylock() function and if the lock can not be
> +  acquired will terminate and return -EBUSY to indicate the driver is busy
> +  processing another request.
> +
> +  The locks required to respond to the handle resource in use query and the
> +  order in which they must be taken are:
> +
> +  1. Guests Lock (matrix_dev->guests_lock):
> +
> +  This mutex must be held for the duration of the operation to iterate over the
> +  matrix_dev->mdev_list to determine whether any of the queues to be unbound
> +  are assigned to a matrix_mdev instance.
> +
> +  2. Matrix Devices Lock (matrix_dev->mdevs_lock):
> +
> +  This mutex must be held for the duration of the operation to ensure that the
> +  AP configuration of each matrix_mdev instance does not change while verifying
> +  that none of the queue devices to be removed from the vfio_ap driver are
> +  assigned to it.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ