[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <043e8669-145c-df44-74c4-a3023cb6b159@linux.ibm.com>
Date: Thu, 2 Jun 2022 16:21:03 -0400
From: "Jason J. Herne" <jjherne@...ux.ibm.com>
To: Tony Krowiak <akrowiak@...ux.ibm.com>, linux-s390@...r.kernel.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: freude@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
mjrosato@...ux.ibm.com, pasic@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com
Subject: Re: [PATCH v19 15/20] s390/vfio-ap: implement in-use callback for
vfio_ap driver
On 6/2/22 15:19, Tony Krowiak wrote:
>
>
> On 6/2/22 2:16 PM, Jason J. Herne wrote:
>> On 4/4/22 18:10, Tony Krowiak wrote:
>>> Let's implement the callback to indicate when an APQN
>>> is in use by the vfio_ap device driver. The callback is
>>> invoked whenever a change to the apmask or aqmask would
>>> result in one or more queue devices being removed from the driver. The
>>> vfio_ap device driver will indicate a resource is in use
>>> if the APQN of any of the queue devices to be removed are assigned to
>>> any of the matrix mdevs under the driver's control.
>>>
>>> There is potential for a deadlock condition between the
>>> matrix_dev->guests_lock used to lock the guest during assignment of
>>> adapters and domains and the ap_perms_mutex locked by the AP bus when
>>> changes are made to the sysfs apmask/aqmask attributes.
>>>
>>> The AP Perms lock controls access to the objects that store the adapter
>>> numbers (ap_perms) and domain numbers (aq_perms) for the sysfs
>>> /sys/bus/ap/apmask and /sys/bus/ap/aqmask attributes. These attributes
>>> identify which queues are reserved for the zcrypt default device drivers.
>>> Before allowing a bit to be removed from either mask, the AP bus must check
>>> with the vfio_ap device driver to verify that none of the queues are
>>> assigned to any of its mediated devices.
>>>
>>> The apmask/aqmask attributes can be written or read at any time from
>>> userspace, so care must be taken to prevent a deadlock with asynchronous
>>> operations that might be taking place in the vfio_ap device driver. For
>>> example, consider the following:
>>>
>>> 1. A system administrator assigns an adapter to a mediated device under the
>>> control of the vfio_ap device driver. The driver will need to first take
>>> the matrix_dev->guests_lock to potentially hot plug the adapter into
>>> the KVM guest.
>>> 2. At the same time, a system administrator sets a bit in the sysfs
>>> /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus
>>> must:
>>> a. Take the ap_perms_mutex lock to update the object storing the values
>>> for the /sys/bus/ap/ap_mask attribute.
>>> b. Call the vfio_ap device driver's in-use callback to verify that the
>>> queues now being reserved for the default zcrypt drivers are not
>>> assigned to a mediated device owned by the vfio_ap device driver. To
>>> do the verification, the in-use callback function takes the
>>> matrix_dev->guests_lock, but has to wait because it is already held
>>> by the operation in 1 above.
>>> 3. The vfio_ap device driver calls an AP bus function to verify that the
>>> new queues resulting from the assignment of the adapter in step 1 are
>>> not reserved for the default zcrypt device driver. This AP bus function
>>> tries to take the ap_perms_mutex lock but gets stuck waiting for the
>>> waiting for the lock due to step 2a above.
>>>
>>> Consequently, we have the following deadlock situation:
>>>
>>> matrix_dev->guests_lock locked (1)
>>> ap_perms_mutex lock locked (2a)
>>> Waiting for matrix_dev->gusts_lock (2b) which is currently held (1)
>>> Waiting for ap_perms_mutex lock (3) which is currently held (2a)
>>>
>>> To prevent this deadlock scenario, the function called in step 3 will no
>>> longer take the ap_perms_mutex lock and require the caller to take the
>>> lock. The lock will be the first taken by the adapter/domain assignment
>>> functions in the vfio_ap device driver to maintain the proper locking
>>> order.
>>>
>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>> ---
>>> drivers/s390/crypto/ap_bus.c | 31 ++++++++----
>>> drivers/s390/crypto/vfio_ap_drv.c | 1 +
>>> drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++
>>> drivers/s390/crypto/vfio_ap_private.h | 2 +
>>> 4 files changed, 94 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>>> index fdf16cb70881..f48552e900a2 100644
>>> --- a/drivers/s390/crypto/ap_bus.c
>>> +++ b/drivers/s390/crypto/ap_bus.c
>>> @@ -817,6 +817,17 @@ static void ap_bus_revise_bindings(void)
>>> bus_for_each_dev(&ap_bus_type, NULL, NULL, __ap_revise_reserved);
>>> }
>>> +/**
>>> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether an APQN c is reserved
>>> + * for the host drivers or not.
>>> + * @card: the APID of the adapter card to check
>>> + * @queue: the APQI of the queue to check
>>> + *
>>> + * Note: the ap_perms_mutex must be locked by the caller of this function.
>>> + *
>>> + * Return: an int specifying whether the APQN is reserved for the host (1) or
>>> + * not (0)
>>> + */
>>
>> Comment's function name does not match real function name. Also APQN "c", from
>> description does not appear to exist.
>
> No, it definitely does not match and the 'c' is an extraneous typo. I'll fix both.
>
>>
>>
>>
>>> int ap_owned_by_def_drv(int card, int queue)
>>> {
>>> int rc = 0;
>>> @@ -824,25 +835,31 @@ int ap_owned_by_def_drv(int card, int queue)
>>> if (card < 0 || card >= AP_DEVICES || queue < 0 || queue >= AP_DOMAINS)
>>> return -EINVAL;
>>> - mutex_lock(&ap_perms_mutex);
>>> -
>>> if (test_bit_inv(card, ap_perms.apm)
>>> && test_bit_inv(queue, ap_perms.aqm))
>>> rc = 1;
>>> - mutex_unlock(&ap_perms_mutex);
>>> -
>>> return rc;
>>> }
>>> EXPORT_SYMBOL(ap_owned_by_def_drv);
>>> +/**
>>> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether every APQN contained in
>>> + * a set is reserved for the host drivers
>>> + * or not.
>>> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check
>>> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check
>>> + *
>>> + * Note: the ap_perms_mutex must be locked by the caller of this function.
>>> + *
>>> + * Return: an int specifying whether each APQN is reserved for the host (1) or
>>> + * not (0)
>>> + */
>>> int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
>>> unsigned long *aqm)
>>> {
>>> int card, queue, rc = 0;
>>> - mutex_lock(&ap_perms_mutex);
>>> -
>>> for (card = 0; !rc && card < AP_DEVICES; card++)
>>> if (test_bit_inv(card, apm) &&
>>> test_bit_inv(card, ap_perms.apm))
>>> @@ -851,8 +868,6 @@ int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
>>> test_bit_inv(queue, ap_perms.aqm))
>>> rc = 1;
>>> - mutex_unlock(&ap_perms_mutex);
>>> -
>>> return rc;
>>> }
>>> EXPORT_SYMBOL(ap_apqn_in_matrix_owned_by_def_drv);
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>> index c258e5f7fdfc..2c3084589347 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -107,6 +107,7 @@ static const struct attribute_group vfio_queue_attr_group = {
>>> static struct ap_driver vfio_ap_drv = {
>>> .probe = vfio_ap_mdev_probe_queue,
>>> .remove = vfio_ap_mdev_remove_queue,
>>> + .in_use = vfio_ap_mdev_resource_in_use,
>>> .ids = ap_queue_ids,
>>> };
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 49ed54dc9e05..3ece2cd9f1e7 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -902,6 +902,21 @@ static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
>>> return 0;
>>> }
>>> +/**
>>> + * vfio_ap_mdev_validate_masks - verify that the APQNs assigned to the mdev are
>>> + * not reserved for the default zcrypt driver and
>>> + * are not assigned to another mdev.
>>> + *
>>> + * @matrix_mdev: the mdev to which the APQNs being validated are assigned.
>>> + *
>>> + * Return: One of the following values:
>>> + * o the error returned from the ap_apqn_in_matrix_owned_by_def_drv() function,
>>> + * most likely -EBUSY indicating the ap_perms_mutex lock is already held.
>>> + * o EADDRNOTAVAIL if an APQN assigned to @matrix_mdev is reserved for the
>>> + * zcrypt default driver.
>>> + * o EADDRINUSE if an APQN assigned to @matrix_mdev is assigned to another mdev
>>> + * o A zero indicating validation succeeded.
>>> + */
>>> static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev)
>>> {
>>> if (ap_apqn_in_matrix_owned_by_def_drv(matrix_mdev->matrix.apm,
>>> @@ -951,6 +966,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev
>>> *matrix_mdev,
>>> * An APQN derived from the cross product of the APID being assigned
>>> * and the APQIs previously assigned is being used by another mediated
>>> * matrix device
>>> + *
>>> + * 5. -EAGAIN
>>> + * A lock required to validate the mdev's AP configuration could not
>>> + * be obtained.
>>> */
>>> static ssize_t assign_adapter_store(struct device *dev,
>>> struct device_attribute *attr,
>>> @@ -961,6 +980,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>> DECLARE_BITMAP(apm_delta, AP_DEVICES);
>>> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>>> + mutex_lock(&ap_perms_mutex);
>>> get_update_locks_for_mdev(matrix_mdev);
>>> ret = kstrtoul(buf, 0, &apid);
>>> @@ -991,6 +1011,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>> ret = count;
>>> done:
>>> release_update_locks_for_mdev(matrix_mdev);
>>> + mutex_unlock(&ap_perms_mutex);
>>> return ret;
>>> }
>>> @@ -1144,6 +1165,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev
>>> *matrix_mdev,
>>> * An APQN derived from the cross product of the APQI being assigned
>>> * and the APIDs previously assigned is being used by another mediated
>>> * matrix device
>>> + *
>>> + * 5. -EAGAIN
>>> + * The lock required to validate the mdev's AP configuration could not
>>> + * be obtained.
>>> */
>>> static ssize_t assign_domain_store(struct device *dev,
>>> struct device_attribute *attr,
>>> @@ -1154,6 +1179,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>> DECLARE_BITMAP(aqm_delta, AP_DOMAINS);
>>> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>>> + mutex_lock(&ap_perms_mutex);
>>> get_update_locks_for_mdev(matrix_mdev);
>>> ret = kstrtoul(buf, 0, &apqi);
>>> @@ -1184,6 +1210,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>> ret = count;
>>> done:
>>> release_update_locks_for_mdev(matrix_mdev);
>>> + mutex_unlock(&ap_perms_mutex);
>>> return ret;
>>> }
>>> @@ -1868,3 +1895,44 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>> kfree(q);
>>> release_update_locks_for_mdev(matrix_mdev);
>>> }
>>> +
>>> +/**
>>> + * vfio_ap_mdev_resource_in_use: check whether any of a set of APQNs is
>>> + * assigned to a mediated device under the control
>>> + * of the vfio_ap device driver.
>>> + *
>>> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check.
>>> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check.
>>> + *
>>> + * This function is invoked by the AP bus when changes to the apmask/aqmask
>>> + * attributes will result in giving control of the queue devices specified via
>>> + * @apm and @aqm to the default zcrypt device driver. Prior to calling this
>>> + * function, the AP bus locks the ap_perms_mutex. If this function is called
>>> + * while an adapter or domain is being assigned to a mediated device, the
>>> + * assignment operations will take the matrix_dev->guests_lock and
>>> + * matrix_dev->mdevs_lock then call the ap_apqn_in_matrix_owned_by_def_drv
>>> + * function, which also locks the ap_perms_mutex. This could result in a
>>> + * deadlock.
>>> + *
>>> + * To avoid a deadlock, this function will verify that the
>>> + * matrix_dev->guests_lock and matrix_dev->mdevs_lock are not currently held and
>>> + * will return -EBUSY if the locks can not be obtained.
>>
>> This part of the comment does not seem to match reality. Unless I'm missing
>> something? Did you mean to call mutex_trylock? Or is the comment stale?
>
> The comment was written prior to the changes introduced to avoid the locking
> dependency (i.e., taking the ap_perms_mutex lock by the assignment functions
> which I believe was your suggestion). I shall remove the comment.
With both changes made...
Reviewed-by: Jason J. Herne <jjherne@...ux.ibm.com>
--
-- Jason J. Herne (jjherne@...ux.ibm.com)
Powered by blists - more mailing lists