Message-ID: <1ff0ff82-f0e6-96d7-7e9a-f46a4957813c@linux.ibm.com>
Date: Thu, 18 Aug 2022 15:02:52 -0400
From: Anthony Krowiak <akrowiak@...ux.ibm.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, jjherne@...ux.ibm.com, borntraeger@...ibm.com,
cohuck@...hat.com, mjrosato@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/2] s390/vfio-ap: fix hang during removal of mdev
after duplicate assignment
On 8/18/22 10:12 AM, Halil Pasic wrote:
> On Thu, 18 Aug 2022 09:26:05 -0400
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
> Subject: s390/vfio-ap: fix hang during removal of mdev after duplicate
> assignment
>
> It would have made sense to do it this way in the first place, even
> if the link code were to take care of the duplicates. It did not really
> make sense to do the whole filtering biz and everything else.
No, it did not; however, nobody caught it in review either. In fact,
this probably should have been done prior to hot plug.
> Maybe we
> should spin the short description and the rest of the commit message so
> it reflects the code more.
I'm not sure what you mean here. Are you suggesting that the first two
paragraphs should be eliminated?
>
>
>> When the same adapter or domain is assigned more than one time prior to
>> removing the matrix mdev to which it is assigned, the remove operation
>> will hang. The reason is that the same vfio_ap_queue objects with an
>> APQN containing the APID of the adapter or APQI of the domain being
>> assigned will get added to the hashtable that holds them multiple times.
>> This results in the pprev and next pointers of the hlist_node (mdev_qnode
>> field in the vfio_ap_queue object) pointing to the queue object itself.
>> This causes an interminable loop when the mdev is removed and the queue
>> table is iterated to reset the queues.
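For anyone following along, the self-reference can be reproduced outside
the kernel. The sketch below is a simplified userspace model, not the
driver code: hlist_add_head() is modeled on the kernel helper of the same
name with the WRITE_ONCE() annotations dropped (as far as I can tell,
hash_add() ends up there), and walk_terminates() is a hypothetical helper
added only for illustration. Adding the same node to a bucket a second
time makes its next pointer refer to the node itself, so an unbounded walk
of that bucket never reaches NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Modeled on the kernel's hlist_add_head(), without WRITE_ONCE(). */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first;

	assert(n && h);
	first = h->first;

	/* If n is already the first entry, this sets n->next = n. */
	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/*
 * Hypothetical helper: walk the bucket, giving up after 'limit' steps.
 * Returns true if the end of the list was reached, false if the walk
 * had to be cut off (which is what a cycle looks like).
 */
static bool walk_terminates(const struct hlist_head *h, unsigned int limit)
{
	const struct hlist_node *pos = h->first;

	while (pos && limit--)
		pos = pos->next;
	return pos == NULL;
}
```

With an empty bucket and a single node q, the first
hlist_add_head(&q, &bucket) leaves walk_terminates(&bucket, 16) true.
After a second, identical call, q.next == &q and the bounded walk is cut
off; the unbounded iteration at mdev removal has no such limit.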
>>
>> To fix this problem, the assignment operation is bypassed when assigning
>> an adapter or domain if it is already assigned to the matrix mdev.
>>
>> Since it is not necessary to assign a resource already assigned or to
>> unassign a resource that has not been assigned, this patch will bypass
>> all assignment/unassignment operations for an adapter, domain or
>> control domain under these circumstances.
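The guard itself is simple; a minimal userspace sketch of the idea
follows. Everything in it is hypothetical and for illustration only:
struct id_mask, MASK_BITS, the mask_*() helpers and the
link_calls/unlink_calls counters are not driver names. The helpers number
bits starting from the most significant bit of the first byte, which is
meant to mirror the *_inv bitops on s390, and the counters stand in for
the work that must not be repeated.

```c
#include <assert.h>
#include <stdbool.h>

#define MASK_BITS 256	/* hypothetical mask size for this sketch */

struct id_mask {
	unsigned char bytes[MASK_BITS / 8];
};

/* Bit 0 is the most significant bit of bytes[0]. */
static bool mask_test(const struct id_mask *m, unsigned int id)
{
	assert(id < MASK_BITS);
	return (m->bytes[id / 8] & (0x80u >> (id % 8))) != 0;
}

static void mask_set(struct id_mask *m, unsigned int id)
{
	assert(id < MASK_BITS);
	m->bytes[id / 8] |= (unsigned char)(0x80u >> (id % 8));
}

static void mask_clear(struct id_mask *m, unsigned int id)
{
	assert(id < MASK_BITS);
	m->bytes[id / 8] &= (unsigned char)~(0x80u >> (id % 8));
}

/* Returns true only if the assignment work was actually performed. */
static bool assign_id(struct id_mask *m, unsigned int id,
		      unsigned int *link_calls)
{
	/* Out of range or already assigned: nothing to do. */
	if (id >= MASK_BITS || mask_test(m, id))
		return false;

	mask_set(m, id);
	(*link_calls)++;	/* stands in for linking the queues */
	return true;
}

/* Returns true only if the unassignment work was actually performed. */
static bool unassign_id(struct id_mask *m, unsigned int id,
			unsigned int *unlink_calls)
{
	/* Out of range or not assigned: nothing to do. */
	if (id >= MASK_BITS || !mask_test(m, id))
		return false;

	mask_clear(m, id);
	(*unlink_calls)++;	/* stands in for unlinking the queues */
	return true;
}
```

Calling assign_id() twice with the same id bumps link_calls only once,
and likewise for unassign_id(). Unlike this sketch, the early exit in the
patch still returns count, so a duplicate write to the sysfs attribute is
reported as success rather than as an error.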
>>
>> Cc: stable@...r.kernel.org
>> Fixes: 771e387d5e79 ("s390/vfio-ap: manage link between queue struct and matrix mdev")
> Not 11cb2419fafe ("s390/vfio-ap: manage link between queue struct and
> matrix mdev")
>
> Is my repo borked?
>
>
>> Reported-by: Matthew Rosato <mjrosato@...ux.ibm.com>
>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 30 ++++++++++++++++++++++++++++++
>> 1 file changed, 30 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 6c8c41fac4e1..ee82207b4e60 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -984,6 +984,11 @@ static ssize_t assign_adapter_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (test_bit_inv(apid, matrix_mdev->matrix.apm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>
>>  	ret = vfio_ap_mdev_validate_masks(matrix_mdev);
>> @@ -1109,6 +1114,11 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>>  	vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
>>  	ret = count;
>> @@ -1183,6 +1193,11 @@ static ssize_t assign_domain_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>
>>  	ret = vfio_ap_mdev_validate_masks(matrix_mdev);
>> @@ -1286,6 +1301,11 @@ static ssize_t unassign_domain_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (!test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>>  	vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
>>  	ret = count;
>> @@ -1329,6 +1349,11 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (test_bit_inv(id, matrix_mdev->matrix.adm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	/* Set the bit in the ADM (bitmask) corresponding to the AP control
>>  	 * domain number (id). The bits in the mask, from most significant to
>>  	 * least significant, correspond to IDs 0 up to the one less than the
>> @@ -1378,6 +1403,11 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>  		goto done;
>>  	}
>>
>> +	if (!test_bit_inv(domid, matrix_mdev->matrix.adm)) {
>> +		ret = count;
>> +		goto done;
>> +	}
>> +
>>  	clear_bit_inv(domid, matrix_mdev->matrix.adm);
>>
>>  	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {