lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon,  4 Apr 2022 18:10:38 -0400
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org
Cc:     jjherne@...ux.ibm.com, freude@...ux.ibm.com,
        borntraeger@...ibm.com, cohuck@...hat.com, mjrosato@...ux.ibm.com,
        pasic@...ux.ibm.com, alex.williamson@...hat.com,
        kwankhede@...dia.com, fiuczy@...ux.ibm.com
Subject: [PATCH v19 19/20] s390/Docs: new doc describing lock usage by the vfio_ap device driver

Introduces a new document describing the locks used by the vfio_ap device
driver and how to use them so as to avoid lockdep reports and deadlock
situations.

Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
---
 Documentation/s390/vfio-ap-locking.rst | 389 +++++++++++++++++++++++++
 1 file changed, 389 insertions(+)
 create mode 100644 Documentation/s390/vfio-ap-locking.rst

diff --git a/Documentation/s390/vfio-ap-locking.rst b/Documentation/s390/vfio-ap-locking.rst
new file mode 100644
index 000000000000..ba5db6689f14
--- /dev/null
+++ b/Documentation/s390/vfio-ap-locking.rst
@@ -0,0 +1,389 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+VFIO AP Locks Overview
+======================
+This document describes the locks that are pertinent to the secure operation
+of the vfio_ap device driver. Throughout this document, the following variables
+will be used to denote instances of the structures herein described:
+
+struct ap_matrix_dev *matrix_dev;
+struct ap_matrix_mdev *matrix_mdev;
+struct kvm *kvm;
+
+The Matrix Devices Lock (drivers/s390/crypto/vfio_ap_private.h)
+--------------------------------------------------------------
+
+struct ap_matrix_dev {
+	...
+	struct list_head mdev_list;
+	struct mutex mdevs_lock;
+	...
+}
+
+The Matrix Devices Lock (matrix_dev->mdevs_lock) is implemented as a global
+mutex contained within the single object of struct ap_matrix_dev. This lock
+controls access to all fields contained within each matrix_mdev
+(matrix_dev->mdev_list). This lock must be held while reading from, writing to
+or using the data from a field contained within a matrix_mdev instance
+representing one of the vfio_ap device driver's mediated devices.
+
+The KVM Lock (include/linux/kvm_host.h)
+---------------------------------------
+
+struct kvm {
+	...
+	struct mutex lock;
+	...
+}
+
+The KVM Lock (kvm->lock) controls access to the state data for a KVM guest. This
+lock must be held by the vfio_ap device driver while one or more AP adapters,
+domains or control domains are being plugged into or unplugged from the guest.
+
+The vfio_ap device driver registers a function to be notified when the pointer
+to the kvm instance has been set. The KVM pointer is passed to the handler by
+the notifier and is stored in the in the matrix_mdev instance
+(matrix_mdev->kvm = kvm) containing the state of the mediated device that has
+been passed through to the KVM guest.
+
+The Guests Lock (drivers/s390/crypto/vfio_ap_private.h)
+-----------------------------------------------------------
+
+struct ap_matrix_dev {
+	...
+	struct list_head mdev_list;
+	struct mutex guests_lock;
+	...
+}
+
+The Guests Lock (matrix_dev->guests_lock) controls access to the
+matrix_mdev instances (matrix_dev->mdev_list) that represent mediated devices
+that hold the state for the mediated devices that have been passed through to a
+KVM guest. This lock must be held:
+
+1. To control access to the KVM pointer (matrix_mdev->kvm) while the vfio_ap
+   device driver is using it to plug/unplug AP devices passed through to the KVM
+   guest.
+
+2. To add matrix_mdev instances to or remove them from matrix_dev->mdev_list.
+   This is necessary to ensure the proper locking order when the list is perused
+   to find an ap_matrix_mdev instance for the purpose of plugging/unplugging
+   AP devices passed through to a KVM guest.
+
+   For example, when a queue device is removed from the vfio_ap device driver,
+   if the adapter is passed through to a KVM guest, it will have to be
+   unplugged. In order to figure out whether the adapter is passed through,
+   the matrix_mdev object to which the queue is assigned will have to be
+   found. The KVM pointer (matrix_mdev->kvm) can then be used to determine if
+   the mediated device is passed through (matrix_mdev->kvm != NULL) and if so,
+   to unplug the adapter.
+
+It is not necessary to take the Guests Lock to access the KVM pointer if the
+pointer is not used to plug/unplug devices passed through to the KVM guest;
+however, in this case, the Matrix Devices Lock (matrix_dev->mdevs_lock) must be
+held in order to access the KVM pointer since it set and cleared under the
+protection of the Matrix Devices Lock. A case in point is the function that
+handles interception of the PQAP(AQIC) instruction sub-function. This handler
+needs to access the KVM pointer only for the purposes of setting or clearing IRQ
+resources, so only the matrix_dev->mdevs_lock needs to be held.
+
+The PQAP Hook Lock (arch/s390/include/asm/kvm_host.h)
+-----------------------------------------------------
+
+typedef int (*crypto_hook)(struct kvm_vcpu *vcpu);
+
+struct kvm_s390_crypto {
+	...
+	struct rw_semaphore pqap_hook_rwsem;
+	crypto_hook *pqap_hook;
+	...
+};
+
+The PQAP Hook Lock is a r/w semaphore that controls access to the function
+pointer of the handler (*kvm->arch.crypto.pqap_hook) to invoke when the
+PQAP(AQIC) instruction sub-function is intercepted by the host. The lock must be
+held in write mode when pqap_hook value is set, and in read mode when the
+pqap_hook function is called.
+
+Locking Order
+-------------
+
+If the various locks are not taken in the proper order, it could potentially
+result in a deadlock or lockdep splat. In general the Guests Lock
+(matrix_dev->guests_lock) must be taken outside of the KVM Lock (kvm->lock)
+which in turn must be taken outside of the Matrix Devices Lock
+(matrix_dev->mdevs_lock).
+
+The following describes the various operations under which the various locks are
+taken, the purpose for taking them and the order in which they must be taken.
+
+* Operations: Setting or clearing the KVM pointer (matrix_mdev->kvm):
+
+  1. PQAP Hook Lock (kvm->arch.crypto.pqap_hook_rwsem):
+
+	This semaphore must be held in write mode while setting or clearing the
+	reference to the function pointer (kvm->arch.crypt.pqap_hook) to call
+	when the PQAP(AQIC) instruction sub-function is intercepted by the host.
+	The function pointer is set when the KVM pointer is being set and
+	cleared when the KVM pointer is being cleared.
+
+  2.Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held while accessing the KVM pointer
+	(matrix_mdev->kvm) to plug/unplug AP devices passed through to the
+	KVM guest
+
+  3. KVM Lock (kvm->lock):
+
+	This mutex must be held while the AP devices passed through to the KVM
+	guest are plugged/unplugged.
+
+  4. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held to prevent access to the matrix_mdev state
+	while writing/reading state values during the operation.
+
+* Operations: Assign or unassign an adapter, domain or control domain of a
+	      mediated device under the control of the vfio_ap device driver:
+
+  1. Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held while accessing the KVM pointer
+	(matrix_dev->kvm) to plug/unplug AP devices passed through to the
+	KVM guest as a result of the assignment/unassignment operation.
+	Assignment of an AP device may result in additional queue devices
+	or control domains being plugged into the guest. Similarly, unassignment
+	may result in unplugging queue devices or control domains from the
+	guest
+
+  3. KVM Lock (matrix_mdev->kvm->lock):
+
+	This mutex must be held while the AP devices passed through to the KVM
+	guest are plugged in or unplugged.
+
+  4. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held to prevent access to the matrix_mdev state
+	while writing/reading state values during the operation. For example, to
+	determine which AP devices need to be plugged/unplugged, the lock
+	must be held to prevent other operations from changing the data used
+	to construct the guest's AP configuration.
+
+* Operations: Probe or remove an AP queue device:
+
+  When a queue device is bound to the vfio_ap device driver, the driver's probe
+  callback is invoked. Similarly, when a queue device is unbound from the
+  driver it's remove callback is invoked. The probe and remove functions will
+  take locks in the following order:
+
+  1. Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held for the duration of this operation.
+
+	At the time of the operation, the vfio_ap device driver will only have
+	the APQN of the queue being probed or removed, so the
+	matrix_dev->mdevs_list must be perused to locate the matrix_mdev
+	instance to which the queue is assigned. The Guests Lock must be held
+	during this time to prevent the list from being changed while processing
+	the probe/remove.
+
+	Once the matrix_mdev is found, the operation must determine whether the
+	mediated device is passed through to a guest (matrix_mdev->kvm != NULL),
+	then use the KVM pointer to perform the plug/unplug operation. Here
+	again, the lock must be held to prevent other operations from accessing
+	the KVM pointer for the same purpose.
+
+  2. KVM Lock (kvm->lock):
+
+	This mutex must be held while the AP devices passed through to the KVM
+	guest are plugged in or unplugged to prevent other operations from
+	accessing the guest's state while it is in flux.
+
+  3. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held to prevent access to the matrix_mdev state
+	while writing/reading state values during the operation, such as the
+	masks used to construct the KVM guest's AP configuration.
+
+* Operations: Probe or remove a mediated device:
+
+  1. Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held while adding the matrix_mdev to the
+	matrix_dev->mdev_list during the probe operation or when removing it
+	from the list during the remove operation. This is to prevent access by
+	other functions that must traverse the list to find a matrix_mdev for
+	the purpose of plugging/unplugging AP devices passed through to a KVM
+	guest (i.e., probe/remove queue callbacks), while the list is being
+	modified.
+
+  2. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held to prevent access to the matrix_mdev state
+	while writing/reading state values during the probe or remove operations
+	such as initializing the hashtable of queue devices
+	(matrix_mdev->qtable.queues) assigned to the matrix_mdev.
+
+* Operation: Handle interception of the PQAP(AQIC) instruction sub-function:
+
+  1. PQAP Hook Lock (kvm->arch.crypto.pqap_hook_rwsem)
+
+	This semaphore must be held in read mode while retrieving the function
+	pointer (kvm->arch.crypto.pqap_hook) and executing the function that
+	handles the interception of the PQAP(AQIC) instruction sub-function by
+	the host.
+
+  2. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held to prevent access to the matrix_mdev state
+	while writing/reading state values during the execution of the
+	PQAP(AQIC) instruction sub-function interception handler. For example,
+	the handler must iterate over the matrix_mdev->qtable.queues hashtable
+	to find the vfio_ap_queue object representing the queue for which
+	interrupts are being enabled or disabled.
+
+  Note: It is not necessary to take the Guests Lock (matrix_dev->guests_lock)
+	or the KVM Lock (matrix_mdev->kvm->lock) because the KVM pointer
+	will not be accessed to plug/unplug AP devices passed through to the
+	guest; it will only be used to allocate or free resources for processing
+	interrupts.
+
+* Operation: Handle AP configuration changed notification:
+
+  The vfio_ap device driver registers a callback function to be notified when
+  the AP bus detects that the host's AP configuration has changed. This can
+  occur due to the addition or removal of AP adapters, domains or control
+  domains via an SE or HMC connected to a DPM enabled LPAR. The objective of the
+  handler is to remove the queues no longer accessible via the host in bulk
+  rather than one queue at a time via the driver's queue device remove callback.
+  The locks and the order in which they must be taken by this operation are:
+
+ 1. Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held for the duration of the operation to:
+
+	* Iterate over the matrix_dev->mdev_list to find each matrix_mdev from
+	  which a queue device to be removed is assigned and prevent other
+	  operations from modifying the list while processing the affected
+	  matrix_mdev instances.
+
+	* Prevent other operations from acquiring access to the KVM pointer in
+	  each affected matrix_mdev instance (matrix_mdev->kvm) for the purpose
+	  of plugging/unplugging AP devices passed through to the KVM guest via
+	  that instance.
+
+2. KVM Lock (kvm->lock):
+
+	This mutex must be held for each affected matrix_mdev instance while
+	the AP devices passed through to the KVM guest are unplugged to prevent
+	other operations from accessing the guest's state while it is in flux.
+
+	Note: This lock must be re-acquired for each matrix_mdev instance.
+
+  3. Matrix Devices Lock (matrix_dev->mdevs_lock)
+
+	This lock must be held for each affected matrix_mdev to prevent access
+	to the matrix_mdev state while writing/reading state values during the
+	operation, such as the masks used to construct the KVM guest's AP
+	configuration.
+
+	Note: This lock must be re-acquired for each matrix_mdev instance.
+
+Operation: Handle AP bus scan complete notification:
+
+  The vfio_ap device driver registers a callback function to be notified when
+  the AP bus scan completes after detecting the addition or removal of AP
+  adapters, domains or control domains. The objective of the handler is t
+  add the new queues accessible via the host in bulk rather than one queue
+  at a time via the driver's queue device probe callback. The locks and the
+  order in which they must be taken by this operation are:
+
+  1. Guests Lock (matrix_dev->guests_lock):
+
+	This mutex must be held for the duration of the operation to:
+
+	* Iterate over the matrix_dev->mdev_list to find each matrix_mdev to
+	  which a queue device added is assigned and prevent other operations
+	  from modifying the list while processing each affected matrix_mdev
+	  object.
+
+	* Prevent other operations from acquiring access to the KVM pointer in
+	  each affected matrix_mdev instance (matrix_mdev->kvm) for the purpose
+	  of plugging/unplugging AP devices passed through to the KVM guest via
+	  that instance.
+
+  2. KVM Lock (kvm->lock):
+
+	This mutex must be held for each affected matrix_mdev instance while
+	the AP devices passed through to the KVM guest are plugged in to prevent
+	other operations from accessing the guest's state while it is in flux.
+
+	Note: This lock must be re-acquired for each matrix_mdev instance.
+
+  3. Matrix Devices Lock (matrix_dev->mdevs_lock):
+
+	This lock must be held for each affected matrix_mdev to prevent access
+	to the matrix_mdev state while writing/reading state values during the
+	operation, such as the masks used to construct the KVM guest's AP
+	configuration.
+
+	Note: This lock must be re-acquired for each matrix_mdev instance.
+
+Operation: Handle resource in use query:
+
+  The vfio_ap device driver registers a callback function with the AP bus to be
+  called when changes to the bus's sysfs /sys/bus/ap/apmask or
+  /sys/bus/ap/aqmask attributes would result in one or more AP queue devices
+  getting unbound from the vfio_ap device driver to verify none of them are in
+  use by the driver (i.e., assigned to a matrix_mdev instance). If this function
+  is called while an adapter or domain is also being assigned to a mediated
+  device, this could result in a deadlock; for example:
+
+  1. A system administrator assigns an adapter to a mediated device under the
+     control of the vfio_ap device driver. The driver will need to first take
+     the matrix_dev->guests_lock to potentially hot plug the adapter into
+     the KVM guest.
+  2. At the same time, a system administrator sets a bit in the sysfs
+     /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus
+     must:
+     a. Take the ap_perms_mutex lock to update the object storing the values
+        for the /sys/bus/ap/ap_mask attribute.
+     b. Call the vfio_ap device driver's in-use callback to verify that no
+        queues now being reserved for the default zcrypt drivers are
+        in use by the vfio_ap device driver. To do the verification, the in-use
+        callback function takes the matrix_dev->guests_lock, but has to wait
+        because it is already held by the operation in 1 above.
+  3. The vfio_ap device driver calls an AP bus function to verify that the
+     new queues resulting from the assignment of the adapter in step 1 are
+     not reserved for the default zcrypt device driver. This AP bus function
+     tries to take the ap_perms_mutex lock but gets stuck waiting for the
+     it due to step 2a above.
+
+    Consequently, we have the following deadlock situation:
+
+    matrix_dev->guests_lock locked (1)
+    ap_perms_mutex lock locked (2a)
+    Waiting for matrix_dev->lock (2b) which is currently held (1)
+    Waiting for ap_perms_mutex lock (3) which is currently held (2a)
+
+  To prevent the deadlock scenario, the in_use operation will take the
+  required locks using the mutex_trylock() function and if the lock can not be
+  acquired will terminate and return -EBUSY to indicate the driver is busy
+  processing another request.
+
+  The locks required to respond to the handle resource in use query and the
+  order in which they must be taken are:
+
+  1. Guests Lock (matrix_dev->guests_lock):
+
+  This mutex must be held for the duration of the operation to iterate over the
+  matrix_dev->mdev_list to determine whether any of the queues to be unbound
+  are assigned to a matrix_mdev instance.
+
+  2. Matrix Devices Lock (matrix_dev->mdevs_lock):
+
+  This mutex must be held for the duration of the operation to ensure that the
+  AP configuration of each matrix_mdev instance does not change while verifying
+  that none of the queue devices to be removed from the vfio_ap driver are
+  assigned to it.
-- 
2.31.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ