linux-kernel - Re: [PATCH v18 10/18] s390/vfio-ap: allow hot plug/unplug of AP devices when assigned/unassigned

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <77da51e0-e978-c049-7cc0-1574d98df9aa@linux.ibm.com>
Date:   Fri, 18 Mar 2022 13:30:44 -0400
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     jjherne@...ux.ibm.com, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     freude@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
        mjrosato@...ux.ibm.com, pasic@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        fiuczy@...ux.ibm.com
Subject: Re: [PATCH v18 10/18] s390/vfio-ap: allow hot plug/unplug of AP
 devices when assigned/unassigned



On 3/14/22 09:17, Jason J. Herne wrote:
> On 3/11/22 11:07, Tony Krowiak wrote:
>>
>>
>> On 3/11/22 09:26, Jason J. Herne wrote:
>>> On 2/14/22 19:50, Tony Krowiak wrote:
>>>> Let's allow adapters, domains and control domains to be hot plugged
>>>> into and hot unplugged from a KVM guest using a matrix mdev when an
>>>> adapter, domain or control domain is assigned to or unassigned from
>>>> the matrix mdev.
>>>>
>>>> Whenever an assignment or unassignment of an adapter, domain or 
>>>> control
>>>> domain is performed, the AP configuration assigned to the matrix
>>>> mediated device will be filtered and assigned to the AP control block
>>>> (APCB) that supplies the AP configuration to the guest so that no
>>>> adapter, domain or control domain that is not in the host's AP
>>>> configuration nor any APQN that does not reference a queue device 
>>>> bound
>>>> to the vfio_ap device driver is assigned.
>>>>
>>>> After updating the APCB, if the mdev is in use by a KVM guest, it is
>>>> hot plugged into the guest to dynamically provide access to the 
>>>> adapters,
>>>> domains and control domains provided via the newly refreshed APCB.
>>>>
>>>> Keep in mind that the matrix_dev->guests_lock must be taken outside 
>>>> of the
>>>> matrix_mdev->kvm->lock which in turn must be taken outside of the
>>>> matrix_dev->mdevs_lock in order to avoid circular lock dependencies 
>>>> (i.e.,
>>>> a lockdep splat).Consequently, the locking order for hot plugging the
>>>> guest's APCB must be:
>>>>
>>>> matrix_dev->guests_lock => matrix_mdev->kvm->lock => 
>>>> matrix_dev->mdevs_lock
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>>> ---
>>>>   drivers/s390/crypto/vfio_ap_ops.c | 198 
>>>> +++++++++++++++++++-----------
>>>>   1 file changed, 125 insertions(+), 73 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 623a4b38676d..4c382cd3afc7 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -317,10 +317,25 @@ static void vfio_ap_matrix_init(struct 
>>>> ap_config_info *info,
>>>>       matrix->adm_max = info->apxa ? info->Nd : 15;
>>>>   }
>>>>   -static void vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev 
>>>> *matrix_mdev)
>>>> +static void vfio_ap_mdev_hotplug_apcb(struct ap_matrix_mdev 
>>>> *matrix_mdev)
>>>>   {
>>>> +    if (matrix_mdev->kvm)
>>>> +        kvm_arch_crypto_set_masks(matrix_mdev->kvm,
>>>> +                      matrix_mdev->shadow_apcb.apm,
>>>> +                      matrix_mdev->shadow_apcb.aqm,
>>>> +                      matrix_mdev->shadow_apcb.adm);
>>>> +}
>>>
>>> This function updates a kvm guest's apcb. So let's rename it to
>>> vfio_ap_update_apcb(). 
>>
>> The idea was to indicate that the AP adapters, domains and control
>> domains configured in the shadow APCB are being hot plugged into
>> a running guest. Having said that, I can see your point. I'm not 
>> married to
>> the function name, but I would prefer to go with
>> 'vfio_ap_update_guest_apcb()' to distinguish between the shadow and
>> the real apcb.
>>
>>> You can also call this function in vfio_ap_mdev_set_kvm,
>>> instead of duplicating the code to call kvm_arch_crypto_set_masks().
>>
>> The reason I didn't do that is because we've already verified the
>> matrix_mdev->kvm in kvm_arch_crypto_set_masks(). I'm not sure what
>> it buys us, but I'm not adverse to making the change.
>>
>
> It avoids code duplication which makes the driver smaller, and slightly
> easier to read. It also reduces rework effort if/when mask handling ever
> changes.

Good points.

>
>>>
>>>
>>>
>>>
>>>> +static bool vfio_ap_mdev_filter_cdoms(struct ap_matrix_mdev 
>>>> *matrix_mdev)
>>>> +{
>>>> +    DECLARE_BITMAP(shadow_adm, AP_DOMAINS);
>>>> +
>>>> +    bitmap_copy(shadow_adm, matrix_mdev->shadow_apcb.adm, 
>>>> AP_DOMAINS);
>>>>       bitmap_and(matrix_mdev->shadow_apcb.adm, 
>>>> matrix_mdev->matrix.adm,
>>>>              (unsigned long *)matrix_dev->info.adm, AP_DOMAINS);
>>>> +
>>>> +    return !bitmap_equal(shadow_adm, matrix_mdev->shadow_apcb.adm,
>>>> +                 AP_DOMAINS);
>>>>   }
>>>
>>> your variable, shadow_adm, should be named original_adm. Since it 
>>> represents
>>> the original value before filtering. This makes the intent much more 
>>> clear.
>>> Same goes for the vars in vfio_ap_mdev_filter_matrix().
>>
>> Makes sense, but I think I'll go with prev_shadow_apm, 
>> prev_shadow_aqm and
>> prev_shadow_adm. That seems more accurate since these are not the 
>> original
>> copies of the bitmaps, but copies of the previous versions prior to 
>> filtering.
>
> That works for me :) Thanks! In general, I like to avoid generic 
> variable names
> like "mask" or "thing" whenever possible. Especially if I'm dealing 
> with multiple
> instances of the same type of data within the same scope. Giving each 
> variable a
> specific name can really help de-obfuscate the code.

Agreed. It has been done.

>
>>>
>>> ...
>>>> +/**
>>>> + * vfio_ap_mdev_get_locks - acquire the locks required to 
>>>> assign/unassign AP
>>>> + *                adapters, domains and control domains for an 
>>>> mdev in
>>>> + *                the proper locking order.
>>>> + *
>>>> + * @matrix_mdev: the matrix mediated device object
>>>> + */
>>>> +static void vfio_ap_mdev_get_locks(struct ap_matrix_mdev 
>>>> *matrix_mdev)
>>>> +{
>>>> +    /* Lock the mutex required to access the KVM guest's state */
>>>> +    mutex_lock(&matrix_dev->guests_lock);
>>>> +
>>>> +    /* If a KVM guest is running, lock the mutex required to 
>>>> plug/unplug the
>>>> +     * AP devices passed through to the guest
>>>> +     */
>>>> +    if (matrix_mdev->kvm)
>>>> +        mutex_lock(&matrix_mdev->kvm->lock);
>>>> +
>>>> +    /* The lock required to access the mdev's state */
>>>> +    mutex_lock(&matrix_dev->mdevs_lock);
>>>> +}
>>>
>>> Simplifying the cdoe, and removing duplication by moving the locking 
>>> code to a
>>> function is probably a good thing. But I don't feel like this 
>>> belongs to this
>>> particular patch. In general, a patch should only do one thing, and 
>>> ideally that
>>> one thing should be as small as reasonably possible. This makes the 
>>> patch easier
>>> to read and to review.
>>>
>>> I feel like, as much as possible, you should refactor the locking in 
>>> a series
>>> of patches that are all kept together. Ideally, they would be a 
>>> patch series
>>> completely separate from dynamic ap. After all, this series is 
>>> already at 18
>>> patches. :)
>>
>> I'm going to have to disagree, this locking scheme makes no sense 
>> outside of
>> this series. It is only necessary because we now update a guest's APCB
>> whenever an adapter, domain or control domain is assigned or unassigned,
>> when a queue device is probed or removed and when the vfio_ap driver is
>> notified that the host's AP configuration has changed.
>>
>> Prior to this series, a guest's APCB was updated only when the vfio_ap
>> driver was notified that the KVM pointer was set or cleared, so it was
>> only necessary to ensure the kvm->lock is taken before the 
>> matrix_dev->lock
>> in the functions that handle the VFIO_GROUP_NOTIFY_SET_KVM group
>> notification event. Prior to this, a patch series to introduce the 
>> matrix_dev->guests lock
>> would make no sense because it is not needed to enforce the locking 
>> order in those
>> functions listed in the previous paragraph because we didn't update 
>> the guest's
>> APCB in those functions.
>>
>
> I don't understand the lock code enough to argue a whole lot here :) 
> But I do still
> think, at the very least, that your refactoring of the locking into 
> get_locks/put_locks
> functions really does belong in a separate patch. Refactoring is not 
> directly related to
> the hotplug/unplug. Also, this is not a minor refactor. This 
> refactoring touches the code
> all over the place and really just adds noise to this patch. That 
> noise makes it harder
> to review.

Okay, this is a little different than what you were asking for in your 
previous review
comment in which you suggested the locking patches should be in a series 
separate
from dynamic ap. This I can get on board with; however, we are now 
talking about
increasing the number of patches in this series.

>
>
>>> ...
>>>>   /**
>>>>    * assign_adapter_store - parses the APID from @buf and sets the
>>>>    * corresponding bit in the mediated matrix device's APM
>>>> @@ -649,17 +723,9 @@ static ssize_t assign_adapter_store(struct 
>>>> device *dev,
>>>>       int ret;
>>>>       unsigned long apid;
>>>>       DECLARE_BITMAP(apm, AP_DEVICES);
>>>> -
>>>>       struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>>>>   -    mutex_lock(&matrix_dev->guests_lock);
>>>> -    mutex_lock(&matrix_dev->mdevs_lock);
>>>> -
>>>> -    /* If the KVM guest is running, disallow assignment of adapter */
>>>> -    if (matrix_mdev->kvm) {
>>>> -        ret = -EBUSY;
>>>> -        goto done;
>>>> -    }
>>>> +    vfio_ap_mdev_get_locks(matrix_mdev);
>>>>         ret = kstrtoul(buf, 0, &apid);
>>>>       if (ret)
>>>> @@ -671,8 +737,6 @@ static ssize_t assign_adapter_store(struct 
>>>> device *dev,
>>>>       }
>>>>         set_bit_inv(apid, matrix_mdev->matrix.apm);
>>>> -    memset(apm, 0, sizeof(apm));
>>>> -    set_bit_inv(apid, apm);
>>>>         ret = vfio_ap_mdev_validate_masks(matrix_mdev);
>>>
>>> It looks like you moved the memset() and set_bit_inv() to be closer 
>>> to where
>>> "apm" is used, namely, the call to vfio_ap_mdev_filter_matrix(). Any 
>>> reason you
>>> cannot move it down under the call to vfio_ap_mdev_link_adapter()? 
>>> That would
>>> get it even closer to where it is used.
>>
>> I didn't move it to be closer to where it is used, I moved it because 
>> it was not
>> necessary to do the memset/set_bit_inv when not necessary to do so. 
>> Having
>> said that, it can definitely be moved after the 
>> vfio_ap_mdev_link_adapter().
>>
>>>
>>> Also, I think renaming apm to apm_delta or apm_diff makes sense 
>>> here. After all,
>>> it is the difference between the original apm, and the new apm. The 
>>> new apm
>>> has an extra bit for the newly added adapter. Do I have that right? 
>>> If so, I
>>> think renaming the variable will make the code clearer.
>>
>> The purpose of this bitmap is to limit the filtering to the new APID 
>> being assigned
>> because there is no need to do filtering of adapters already 
>> assigned; so, it is not
>> really a new apm per se. It might be more accurate to call it 
>> new_apid or new_apids,
>> although there will only be one bit set in the bitmap.
>
> My main concern here was generic variables names. The "new_apm" will 
> have exactly one
> new bit set. That bit is the delta (or the difference) between the 
> previously existing apm, and the new apm, which will be the result of 
> adding in whatever the "apid" bit is. Therefore, it really is a delta, 
> right? This was the basis for my suggestion of the
> name. Its really not the "new" apm... its the difference between the 
> old and new.

I overlooked the names you suggested - i.e., apm_delta/apm_diff - and 
for some reason
beamed in on "new apm" as if you were suggesting that as the name of the 
variable.
Of the two names, I prefer apm_delta, so I will go with that.

>
>
>