linux-kernel - Re: [PATCH v2 1/7] s390: vfio-ap: wait for queue empty on queue reset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <2549bc2f-4e5a-cc24-b976-f771a7243cc8@linux.ibm.com>
Date:   Tue, 7 May 2019 11:12:22 -0400
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     pmorel@...ux.ibm.com, linux-s390@...r.kernel.org,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:     freude@...ux.ibm.com, borntraeger@...ibm.com, cohuck@...hat.com,
        frankja@...ux.ibm.com, david@...hat.com, schwidefsky@...ibm.com,
        heiko.carstens@...ibm.com, pasic@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com
Subject: Re: [PATCH v2 1/7] s390: vfio-ap: wait for queue empty on queue reset

On 5/7/19 4:10 AM, Pierre Morel wrote:
> On 06/05/2019 21:37, Tony Krowiak wrote:
>> On 5/6/19 2:41 AM, Pierre Morel wrote:
>>> On 03/05/2019 23:14, Tony Krowiak wrote:
>>>> Refactors the AP queue reset function to wait until the queue is empty
>>>> after the PQAP(ZAPQ) instruction is executed to zero out the queue as
>>>> required by the AP architecture.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>>> ---
>>>>   drivers/s390/crypto/vfio_ap_ops.c | 35 
>>>> ++++++++++++++++++++++++++++++++---
>>>>   1 file changed, 32 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
>>>> b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 900b9cf20ca5..b88a2a2ba075 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -271,6 +271,32 @@ static int 
>>>> vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
>>>>       return 0;
>>>>   }
>>>> +static void vfio_ap_mdev_wait_for_qempty(unsigned long apid, 
>>>> unsigned long apqi)
>>>> +{
>>>> +    struct ap_queue_status status;
>>>> +    ap_qid_t qid = AP_MKQID(apid, apqi);
>>>> +    int retry = 5;
>>>> +
>>>> +    do {
>>>> +        status = ap_tapq(qid, NULL);
>>>> +        switch (status.response_code) {
>>>> +        case AP_RESPONSE_NORMAL:
>>>> +            if (status.queue_empty)
>>>> +                return;
>>>> +            msleep(20);
>>>
>>> NIT:     Fall through ?
>>
>> Yes
>>
>>>
>>>> +            break;
>>>> +        case AP_RESPONSE_RESET_IN_PROGRESS:
>>>> +        case AP_RESPONSE_BUSY:
>>>> +            msleep(20);
>>>> +            break;
>>>> +        default:
>>>> +            pr_warn("%s: tapq err %02x: %04lx.%02lx may not be 
>>>> empty\n",
>>>> +                __func__, status.response_code, apid, apqi);
>>>
>>> I do not thing the warning sentence is appropriate:
>>> The only possible errors here are if the AP is not available due to 
>>> AP checkstop, deconfigured AP or invalid APQN.
>>
>> Right you are! I'll work on a new message.
>>
>>>
>>>
>>>> +            return;
>>>> +        }
>>>> +    } while (--retry);
>>>> +}
>>>> +
>>>>   /**
>>>>    * assign_adapter_store
>>>>    *
>>>> @@ -790,15 +816,18 @@ static int vfio_ap_mdev_group_notifier(struct 
>>>> notifier_block *nb,
>>>>       return NOTIFY_OK;
>>>>   }
>>>> -static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int 
>>>> apqi,
>>>> -                    unsigned int retry)
>>>> +int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi)
>>>>   {
>>>>       struct ap_queue_status status;
>>>> +    int retry = 5;
>>>>       do {
>>>>           status = ap_zapq(AP_MKQID(apid, apqi));
>>>>           switch (status.response_code) {
>>>>           case AP_RESPONSE_NORMAL:
>>>> +            vfio_ap_mdev_wait_for_qempty(apid, apqi);
>>>> +            return 0;
>>>> +        case AP_RESPONSE_DECONFIGURED:
>>>
>>> Since you modify the switch, you can return for all the following cases:
>>> AP_RESPONSE_DECONFIGURE
>>> ..._CHECKSTOP
>>> ..._INVALID_APQN
>>>
>>>
>>> And you should wait for qempty on AP_RESET_IN_PROGRESS along with 
>>> AP_RESPONSE_NORMAL
>>
>> If a queue reset is in progress, we retry the zapq. Are you saying we
>> should wait for qempty then reissue the zapq?
> 
> 
> Yes, I fear that if we reissue the zapq while RESET is in progress we 
> could fall in a loop depending on the reset hardware time and the 
> software retry .

I already did this in the forthcoming v4 series.

> 
>>
>>>
>>>>               return 0;
>>>>           case AP_RESPONSE_RESET_IN_PROGRESS:
>>>>           case AP_RESPONSE_BUSY:
>>>
>>> While at modifying this function, the AP_RESPONSE_BUSY is not a valid 
>>> code for ZAPQ, you can remove this.
>>
>> Okay
>>
>>>
>>>> @@ -824,7 +853,7 @@ static int vfio_ap_mdev_reset_queues(struct 
>>>> mdev_device *mdev)
>>>>                    matrix_mdev->matrix.apm_max + 1) {
>>>>           for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>>>>                        matrix_mdev->matrix.aqm_max + 1) {
>>>> -            ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>>> +            ret = vfio_ap_mdev_reset_queue(apid, apqi);
>>>
>>> IMHO, since you are at changing this call, passing the apqn as 
>>> parameter would be a good simplification.
>>
>> Okay.
> 
> Sorry, I should have add: NIT.
> 
>>
>>>
>>>
>>>
>>>>               /*
>>>>                * Regardless whether a queue turns out to be busy, or
>>>>                * is not operational, we need to continue resetting
>>>
>>> Depends on why the reset failed, but this is out of scope.
>>
>> I'm not sure what you mean by out of scope here, but you do make a valid
>> point. If the response code for the zapq is AP_RESPONSE_DECONFIGURED,
>> there is probably no sense in continuing to reset queues for that
>> particular adapter. I'll consider a change here.
> 
> Yes, this was the point, but I consider this as a enhancement, trying a 
> reset on bad queues AFAIK do no arm.

I included the enhancement in the forthcoming v4 series.

> 
>>
>>>
>>>>
>>>
>>>
>>
> 
>