[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6a574d7e-c390-3764-a57c-23c7653f2f69@linux.ibm.com>
Date: Thu, 2 Jun 2022 14:16:25 -0400
From: "Jason J. Herne" <jjherne@...ux.ibm.com>
To: Tony Krowiak <akrowiak@...ux.ibm.com>, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: freude@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
mjrosato@...ux.ibm.com, pasic@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com
Subject: Re: [PATCH v19 15/20] s390/vfio-ap: implement in-use callback for
vfio_ap driver
On 4/4/22 18:10, Tony Krowiak wrote:
> Let's implement the callback to indicate when an APQN
> is in use by the vfio_ap device driver. The callback is
> invoked whenever a change to the apmask or aqmask would
> result in one or more queue devices being removed from the driver. The
> vfio_ap device driver will indicate a resource is in use
> if the APQN of any of the queue devices to be removed are assigned to
> any of the matrix mdevs under the driver's control.
>
> There is potential for a deadlock condition between the
> matrix_dev->guests_lock used to lock the guest during assignment of
> adapters and domains and the ap_perms_mutex locked by the AP bus when
> changes are made to the sysfs apmask/aqmask attributes.
>
> The AP Perms lock controls access to the objects that store the adapter
> numbers (ap_perms) and domain numbers (aq_perms) for the sysfs
> /sys/bus/ap/apmask and /sys/bus/ap/aqmask attributes. These attributes
> identify which queues are reserved for the zcrypt default device drivers.
> Before allowing a bit to be removed from either mask, the AP bus must check
> with the vfio_ap device driver to verify that none of the queues are
> assigned to any of its mediated devices.
>
> The apmask/aqmask attributes can be written or read at any time from
> userspace, so care must be taken to prevent a deadlock with asynchronous
> operations that might be taking place in the vfio_ap device driver. For
> example, consider the following:
>
> 1. A system administrator assigns an adapter to a mediated device under the
> control of the vfio_ap device driver. The driver will need to first take
> the matrix_dev->guests_lock to potentially hot plug the adapter into
> the KVM guest.
> 2. At the same time, a system administrator sets a bit in the sysfs
> /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus
> must:
> a. Take the ap_perms_mutex lock to update the object storing the values
> for the /sys/bus/ap/ap_mask attribute.
> b. Call the vfio_ap device driver's in-use callback to verify that the
> queues now being reserved for the default zcrypt drivers are not
> assigned to a mediated device owned by the vfio_ap device driver. To
> do the verification, the in-use callback function takes the
> matrix_dev->guests_lock, but has to wait because it is already held
> by the operation in 1 above.
> 3. The vfio_ap device driver calls an AP bus function to verify that the
> new queues resulting from the assignment of the adapter in step 1 are
> not reserved for the default zcrypt device driver. This AP bus function
> tries to take the ap_perms_mutex lock but gets stuck waiting for the
> waiting for the lock due to step 2a above.
>
> Consequently, we have the following deadlock situation:
>
> matrix_dev->guests_lock locked (1)
> ap_perms_mutex lock locked (2a)
> Waiting for matrix_dev->gusts_lock (2b) which is currently held (1)
> Waiting for ap_perms_mutex lock (3) which is currently held (2a)
>
> To prevent this deadlock scenario, the function called in step 3 will no
> longer take the ap_perms_mutex lock and require the caller to take the
> lock. The lock will be the first taken by the adapter/domain assignment
> functions in the vfio_ap device driver to maintain the proper locking
> order.
>
> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
> ---
> drivers/s390/crypto/ap_bus.c | 31 ++++++++----
> drivers/s390/crypto/vfio_ap_drv.c | 1 +
> drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++
> drivers/s390/crypto/vfio_ap_private.h | 2 +
> 4 files changed, 94 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index fdf16cb70881..f48552e900a2 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -817,6 +817,17 @@ static void ap_bus_revise_bindings(void)
> bus_for_each_dev(&ap_bus_type, NULL, NULL, __ap_revise_reserved);
> }
>
> +/**
> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether an APQN c is reserved
> + * for the host drivers or not.
> + * @card: the APID of the adapter card to check
> + * @queue: the APQI of the queue to check
> + *
> + * Note: the ap_perms_mutex must be locked by the caller of this function.
> + *
> + * Return: an int specifying whether the APQN is reserved for the host (1) or
> + * not (0)
> + */
Comment's function name does not match real function name. Also APQN "c", from
description does not appear to exist.
> int ap_owned_by_def_drv(int card, int queue)
> {
> int rc = 0;
> @@ -824,25 +835,31 @@ int ap_owned_by_def_drv(int card, int queue)
> if (card < 0 || card >= AP_DEVICES || queue < 0 || queue >= AP_DOMAINS)
> return -EINVAL;
>
> - mutex_lock(&ap_perms_mutex);
> -
> if (test_bit_inv(card, ap_perms.apm)
> && test_bit_inv(queue, ap_perms.aqm))
> rc = 1;
>
> - mutex_unlock(&ap_perms_mutex);
> -
> return rc;
> }
> EXPORT_SYMBOL(ap_owned_by_def_drv);
>
> +/**
> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether every APQN contained in
> + * a set is reserved for the host drivers
> + * or not.
> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check
> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check
> + *
> + * Note: the ap_perms_mutex must be locked by the caller of this function.
> + *
> + * Return: an int specifying whether each APQN is reserved for the host (1) or
> + * not (0)
> + */
> int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
> unsigned long *aqm)
> {
> int card, queue, rc = 0;
>
> - mutex_lock(&ap_perms_mutex);
> -
> for (card = 0; !rc && card < AP_DEVICES; card++)
> if (test_bit_inv(card, apm) &&
> test_bit_inv(card, ap_perms.apm))
> @@ -851,8 +868,6 @@ int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
> test_bit_inv(queue, ap_perms.aqm))
> rc = 1;
>
> - mutex_unlock(&ap_perms_mutex);
> -
> return rc;
> }
> EXPORT_SYMBOL(ap_apqn_in_matrix_owned_by_def_drv);
> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
> index c258e5f7fdfc..2c3084589347 100644
> --- a/drivers/s390/crypto/vfio_ap_drv.c
> +++ b/drivers/s390/crypto/vfio_ap_drv.c
> @@ -107,6 +107,7 @@ static const struct attribute_group vfio_queue_attr_group = {
> static struct ap_driver vfio_ap_drv = {
> .probe = vfio_ap_mdev_probe_queue,
> .remove = vfio_ap_mdev_remove_queue,
> + .in_use = vfio_ap_mdev_resource_in_use,
> .ids = ap_queue_ids,
> };
>
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 49ed54dc9e05..3ece2cd9f1e7 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -902,6 +902,21 @@ static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
> return 0;
> }
>
> +/**
> + * vfio_ap_mdev_validate_masks - verify that the APQNs assigned to the mdev are
> + * not reserved for the default zcrypt driver and
> + * are not assigned to another mdev.
> + *
> + * @matrix_mdev: the mdev to which the APQNs being validated are assigned.
> + *
> + * Return: One of the following values:
> + * o the error returned from the ap_apqn_in_matrix_owned_by_def_drv() function,
> + * most likely -EBUSY indicating the ap_perms_mutex lock is already held.
> + * o EADDRNOTAVAIL if an APQN assigned to @matrix_mdev is reserved for the
> + * zcrypt default driver.
> + * o EADDRINUSE if an APQN assigned to @matrix_mdev is assigned to another mdev
> + * o A zero indicating validation succeeded.
> + */
> static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev)
> {
> if (ap_apqn_in_matrix_owned_by_def_drv(matrix_mdev->matrix.apm,
> @@ -951,6 +966,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
> * An APQN derived from the cross product of the APID being assigned
> * and the APQIs previously assigned is being used by another mediated
> * matrix device
> + *
> + * 5. -EAGAIN
> + * A lock required to validate the mdev's AP configuration could not
> + * be obtained.
> */
> static ssize_t assign_adapter_store(struct device *dev,
> struct device_attribute *attr,
> @@ -961,6 +980,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> DECLARE_BITMAP(apm_delta, AP_DEVICES);
> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>
> + mutex_lock(&ap_perms_mutex);
> get_update_locks_for_mdev(matrix_mdev);
>
> ret = kstrtoul(buf, 0, &apid);
> @@ -991,6 +1011,7 @@ static ssize_t assign_adapter_store(struct device *dev,
> ret = count;
> done:
> release_update_locks_for_mdev(matrix_mdev);
> + mutex_unlock(&ap_perms_mutex);
>
> return ret;
> }
> @@ -1144,6 +1165,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev *matrix_mdev,
> * An APQN derived from the cross product of the APQI being assigned
> * and the APIDs previously assigned is being used by another mediated
> * matrix device
> + *
> + * 5. -EAGAIN
> + * The lock required to validate the mdev's AP configuration could not
> + * be obtained.
> */
> static ssize_t assign_domain_store(struct device *dev,
> struct device_attribute *attr,
> @@ -1154,6 +1179,7 @@ static ssize_t assign_domain_store(struct device *dev,
> DECLARE_BITMAP(aqm_delta, AP_DOMAINS);
> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>
> + mutex_lock(&ap_perms_mutex);
> get_update_locks_for_mdev(matrix_mdev);
>
> ret = kstrtoul(buf, 0, &apqi);
> @@ -1184,6 +1210,7 @@ static ssize_t assign_domain_store(struct device *dev,
> ret = count;
> done:
> release_update_locks_for_mdev(matrix_mdev);
> + mutex_unlock(&ap_perms_mutex);
>
> return ret;
> }
> @@ -1868,3 +1895,44 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
> kfree(q);
> release_update_locks_for_mdev(matrix_mdev);
> }
> +
> +/**
> + * vfio_ap_mdev_resource_in_use: check whether any of a set of APQNs is
> + * assigned to a mediated device under the control
> + * of the vfio_ap device driver.
> + *
> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check.
> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check.
> + *
> + * This function is invoked by the AP bus when changes to the apmask/aqmask
> + * attributes will result in giving control of the queue devices specified via
> + * @apm and @aqm to the default zcrypt device driver. Prior to calling this
> + * function, the AP bus locks the ap_perms_mutex. If this function is called
> + * while an adapter or domain is being assigned to a mediated device, the
> + * assignment operations will take the matrix_dev->guests_lock and
> + * matrix_dev->mdevs_lock then call the ap_apqn_in_matrix_owned_by_def_drv
> + * function, which also locks the ap_perms_mutex. This could result in a
> + * deadlock.
> + *
> + * To avoid a deadlock, this function will verify that the
> + * matrix_dev->guests_lock and matrix_dev->mdevs_lock are not currently held and
> + * will return -EBUSY if the locks can not be obtained.
This part of the comment does not seem to match reality. Unless I'm missing
something? Did you mean to call mutex_trylock? Or is the comment stale?
> + * Return:
> + * * -EBUSY if the locks required by this function are already locked.
> + * * -EADDRINUSE if one or more of the APQNs specified via @apm/@aqm are
> + * assigned to a mediated device under the control of the vfio_ap
> + * device driver.
> + */
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm)
> +{
> + int ret;
> +
> + mutex_lock(&matrix_dev->guests_lock);
> + mutex_lock(&matrix_dev->mdevs_lock);
> + ret = vfio_ap_mdev_verify_no_sharing(apm, aqm);
> + mutex_unlock(&matrix_dev->mdevs_lock);
> + mutex_unlock(&matrix_dev->guests_lock);
> +
> + return ret;
> +}
> diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
> index 6d4ca39f783b..cbffa0bd01da 100644
> --- a/drivers/s390/crypto/vfio_ap_private.h
> +++ b/drivers/s390/crypto/vfio_ap_private.h
> @@ -147,4 +147,6 @@ void vfio_ap_mdev_unregister(void);
> int vfio_ap_mdev_probe_queue(struct ap_device *queue);
> void vfio_ap_mdev_remove_queue(struct ap_device *queue);
>
> +int vfio_ap_mdev_resource_in_use(unsigned long *apm, unsigned long *aqm);
> +
> #endif /* _VFIO_AP_PRIVATE_H_ */
--
-- Jason J. Herne (jjherne@...ux.ibm.com)
Powered by blists - more mailing lists