Message-ID: <65834705-347c-1e8d-f33f-b64bc2501b37@linux.ibm.com>
Date: Mon, 30 Nov 2020 19:18:30 -0500
From: Tony Krowiak <akrowiak@...ux.ibm.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
cohuck@...hat.com, mjrosato@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP
resources using mdev device
On 11/30/20 6:32 PM, Halil Pasic wrote:
> On Mon, 30 Nov 2020 14:36:10 -0500
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>>
>> On 11/28/20 8:52 PM, Halil Pasic wrote:
> [..]
>>>> * Unassign domain from mdev's matrix:
>>>>
>>>> The domain will be hot unplugged from the KVM guest if it is
>>>> assigned to the guest's matrix.
>>>>
>>>> * Assign a control domain:
>>>>
>>>> The control domain will be hot plugged into the KVM guest if it is not
>>>> already assigned to the guest's APCB. The AP architecture ensures that a
>>>> guest only gets access to a control domain if it is in the host's AP
>>>> configuration, so there is no risk in hot plugging it; if it is not yet
>>>> in the host configuration, it will automatically become available to the
>>>> guest once it is added.
>>>>
>>>> * Unassign a control domain:
>>>>
>>>> The control domain will be hot unplugged from the KVM guest if it is
>>>> assigned to the guest's APCB.
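The two control-domain rules above can be illustrated with a minimal standalone model. This is not driver code: the model_* names, the bool array standing in for the MSB0 ADM bitmap, and the commit counter (standing in for the commit that takes the vCPUs out of SIE) are all hypothetical. As the commit message argues, no host-configuration check is done on assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed model size; not taken from the vfio_ap driver. */
#define MODEL_DOMAINS 256

/* Hypothetical stand-in for the control-domain part of the shadow APCB. */
struct model_apcb {
	bool adm[MODEL_DOMAINS];  /* control domains visible to the guest */
	unsigned int commits;     /* how often the APCB was pushed to the guest */
};

/*
 * Stand-in for committing the shadow APCB. In the real driver this is
 * what takes the vCPUs out of SIE; here it only counts.
 */
static void model_commit(struct model_apcb *apcb)
{
	apcb->commits++;
}

/* Assign: hot plug only if the control domain is not yet in the APCB. */
static void model_assign_control_domain(struct model_apcb *apcb,
					unsigned int dom)
{
	assert(dom < MODEL_DOMAINS);
	if (apcb->adm[dom])
		return;
	apcb->adm[dom] = true;
	model_commit(apcb);
}

/* Unassign: hot unplug only if the control domain is in the APCB. */
static void model_unassign_control_domain(struct model_apcb *apcb,
					  unsigned int dom)
{
	assert(dom < MODEL_DOMAINS);
	if (!apcb->adm[dom])
		return;
	apcb->adm[dom] = false;
	model_commit(apcb);
}
```

Repeating an assign or unassign is a no-op, so only real changes reach the guest.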
>>> This is where things start getting tricky. E.g., do we need to revise
>>> filtering after an unassign? (For example, an assign_adapter X didn't
>>> change the shadow because queue XY was missing, but now we unplug domain
>>> Y. Should adapter X pop up? I guess it should.)
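The scenario in the question can be sketched as a full re-filter of the shadow masks after every unassign. This is a hypothetical model, not the v12 code: it assumes the policy "drop an adapter if any of its (adapter, domain) queues is not bound to vfio_ap", uses bool arrays instead of MSB0 bitmaps, and the model_* names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_APIDS 16   /* assumption: small sizes keep the model readable */
#define MODEL_APQIS 16

struct model_mdev {
	bool apm[MODEL_APIDS];                 /* adapters assigned to the mdev */
	bool aqm[MODEL_APQIS];                 /* domains assigned to the mdev */
	bool bound[MODEL_APIDS][MODEL_APQIS];  /* queue bound to vfio_ap? */
	bool shadow_apm[MODEL_APIDS];          /* adapters given to the guest */
	bool shadow_aqm[MODEL_APQIS];          /* domains given to the guest */
};

/*
 * Recompute the shadow masks from scratch: an assigned adapter is passed
 * through only if every queue it forms with an assigned domain is bound.
 * Filtering the adapter (rather than the domain) is an assumed policy.
 */
static void model_refilter(struct model_mdev *m)
{
	memcpy(m->shadow_aqm, m->aqm, sizeof(m->aqm));
	for (int apid = 0; apid < MODEL_APIDS; apid++) {
		bool ok = m->apm[apid];

		for (int apqi = 0; ok && apqi < MODEL_APQIS; apqi++)
			if (m->aqm[apqi] && !m->bound[apid][apqi])
				ok = false;
		m->shadow_apm[apid] = ok;
	}
}

/* Unassign a domain and re-run filtering instead of only clearing a bit. */
static void model_unassign_domain(struct model_mdev *m, int apqi)
{
	assert(apqi >= 0 && apqi < MODEL_APQIS);
	m->aqm[apqi] = false;
	model_refilter(m);
}
```

With adapter X = 3 and domains 1 and 7, where only queue (3, 1) is bound, adapter 3 is filtered; unassigning domain 7 lets it pop up again.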
>> I suppose that makes sense, at the expense of making the code
>> more complex. It is essentially what we had in the prior version,
>> which used the same filtering code for assignment as well as for
>> host AP configuration changes.
>>
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, even though
> there is no longer any need to filter it, does not feel right. On the
> other hand, I'm afraid I'm complaining in circles.
>
>>>
>>>> Note: Now that hot plug/unplug is implemented, there is the possibility
>>>> that an assignment/unassignment of an adapter, domain or control
>>>> domain could be initiated while the guest is starting, so the
>>>> matrix device lock will be taken for the group notification callback
>>>> that initializes the guest's APCB when the KVM pointer is made
>>>> available to the vfio_ap device driver.
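The locking described in the note can be sketched as two paths serialized by one mutex: the sysfs assignment path and the notifier path that receives the KVM pointer and initializes the APCB. Everything here is a hypothetical userspace model (pthread mutex in place of the matrix device lock, a single adapter bit in place of the full masks); it only shows that an assignment racing with guest start either happens before the APCB is initialized (and is picked up by it) or after (and is hot plugged).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the vfio_ap driver's real structures. */
struct model_kvm { int id; };

struct model_matrix_mdev {
	struct model_kvm *kvm;  /* NULL until the notifier delivers it */
	bool apm_assigned;      /* one adapter bit, to keep the model tiny */
	bool shadow_apm;        /* what the guest's APCB is built from */
	unsigned int commits;
};

/* Models the matrix device lock shared by sysfs writes and the notifier. */
static pthread_mutex_t model_matrix_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_commit(struct model_matrix_mdev *m)
{
	if (m->kvm)             /* no guest yet: nothing to commit */
		m->commits++;
}

/* sysfs assign_adapter path: update the matrix and hot plug if a guest exists. */
static void model_assign_adapter(struct model_matrix_mdev *m)
{
	pthread_mutex_lock(&model_matrix_lock);
	m->apm_assigned = true;
	if (m->kvm && !m->shadow_apm) {
		m->shadow_apm = true;
		model_commit(m);
	}
	pthread_mutex_unlock(&model_matrix_lock);
}

/* Notifier path: the KVM pointer arrives; initialize the APCB under the same lock. */
static void model_set_kvm(struct model_matrix_mdev *m, struct model_kvm *kvm)
{
	assert(kvm != NULL);
	pthread_mutex_lock(&model_matrix_lock);
	m->kvm = kvm;
	m->shadow_apm = m->apm_assigned;
	model_commit(m);
	pthread_mutex_unlock(&model_matrix_lock);
}
```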
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>>> ---
>>>> drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>>>> 1 file changed, 159 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 586ec5776693..4f96b7861607 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>>>> }
>>>> }
>>>>
>>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>> + unsigned long apid)
>>>> +{
>>>> + unsigned long apqi, apqn;
>>>> + unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>> +
>>>> + /*
>>>> + * If the APID is already assigned to the guest's shadow APCB, there is
>>>> + * no need to assign it.
>>>> + */
>>>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>> + return false;
>>>> +
>>>> + /*
>>>> + * If no domains have yet been assigned to the shadow APCB and one or
>>>> + * more domains have been assigned to the matrix mdev, then use
>>>> + * the domains assigned to the matrix mdev; otherwise, there is nothing
>>>> + * to assign to the shadow APCB.
>>>> + */
>>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>>>> + if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>>>> + return false;
>>>> +
>>>> + aqm = matrix_mdev->matrix.aqm;
>>>> + }
>>>> +
>>>> + /* Make sure all APQNs are bound to the vfio_ap driver */
>>>> + for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>> + apqn = AP_MKQID(apid, apqi);
>>>> +
>>>> + if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>> + return false;
>>>> + }
>>>> +
>>>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> + /*
>>>> + * If we verified APQNs using the domains assigned to the matrix mdev,
>>>> + * then copy the APQIs of those domains into the guest's APCB
>>>> + */
>>>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> + bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>>>> + matrix_mdev->matrix.aqm, AP_DOMAINS);
>>>> +
>>>> + return true;
>>>> +}
>>> What is the rationale behind the shadow aqm empty special handling?
>> The rationale was to avoid unnecessarily taking the vCPUs
>> out of SIE in order to update the guest's APCB. For example,
>> suppose the guest is started without access to any APQNs (i.e.,
>> all matrix and shadow_apcb masks are zeros). Now suppose the
>> administrator starts assigning AP resources to the mdev, beginning
>> with adapters 1 through 100. The code below would return
>> true, indicating the shadow_apcb was updated. Consequently,
>> the calling code would commit the changes to the guest's
>> APCB. The problem is that in order to update the guest's
>> vCPUs, they have to be taken out of SIE, yet the guest would
>> not get access to the adapter, since no domains have yet been
>> assigned to the APCB. Doing this 100 times (once for each of
>> adapters 1-100) is probably a bad idea.
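The cost argument can be checked with a small standalone model that counts commits for the "adapters 1 through 100, no domains" scenario. It is a sketch, not driver code: bool arrays replace the MSB0 bitmaps, model_queue_bound() is a hypothetical helper that assumes every queue is bound, and model_assign_apid_v12() only mirrors the structure of the quoted vfio_ap_assign_apid_to_apcb().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_APIDS 256   /* assumed model sizes */
#define MODEL_APQIS 256

/* Hypothetical, simplified mdev state; bool arrays instead of MSB0 bitmaps. */
struct model_mdev {
	bool matrix_apm[MODEL_APIDS];
	bool matrix_aqm[MODEL_APQIS];
	bool shadow_apm[MODEL_APIDS];
	bool shadow_aqm[MODEL_APQIS];
	unsigned int commits;   /* each commit would take the vCPUs out of SIE */
};

static bool model_empty(const bool *mask, int n)
{
	for (int i = 0; i < n; i++)
		if (mask[i])
			return false;
	return true;
}

/* Assumption: every queue is bound to vfio_ap, so the check always passes. */
static bool model_queue_bound(int apid, int apqi)
{
	(void)apid;
	(void)apqi;
	return true;
}

/* "Always report a change" variant: commits even with no domains. */
static bool model_assign_apid_naive(struct model_mdev *m, int apid)
{
	if (m->shadow_apm[apid])
		return false;
	m->shadow_apm[apid] = true;
	return true;
}

/* Mirrors the quoted v12 logic, including the empty-aqm special case. */
static bool model_assign_apid_v12(struct model_mdev *m, int apid)
{
	const bool *aqm = m->shadow_aqm;

	if (m->shadow_apm[apid])
		return false;
	if (model_empty(m->shadow_aqm, MODEL_APQIS)) {
		if (model_empty(m->matrix_aqm, MODEL_APQIS))
			return false;
		aqm = m->matrix_aqm;
	}
	for (int apqi = 0; apqi < MODEL_APQIS; apqi++)
		if (aqm[apqi] && !model_queue_bound(apid, apqi))
			return false;
	m->shadow_apm[apid] = true;
	if (model_empty(m->shadow_aqm, MODEL_APQIS))
		memcpy(m->shadow_aqm, m->matrix_aqm, sizeof(m->shadow_aqm));
	return true;
}

/* Models assign_adapter followed by the hot plug/commit step. */
static void model_assign_adapter(struct model_mdev *m, int apid, bool v12)
{
	bool changed;

	assert(apid >= 0 && apid < MODEL_APIDS);
	m->matrix_apm[apid] = true;
	changed = v12 ? model_assign_apid_v12(m, apid)
		      : model_assign_apid_naive(m, apid);
	if (changed)
		m->commits++;
}
```

In this model the naive variant commits 100 times while the v12 variant commits none, matching the reasoning above.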
>>
> Not yanking the VCPUs out of SIE does make a lot of sense. At least
> I understand your motivation now. I will think some more about this,
> but in the meanwhile, please try to answer one more question (see
> below).
>
>>> I.e., why not simply:
>>>
>>>
>>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>> unsigned long apid)
>>> {
>>> unsigned long apqi, apqn;
>>> unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>
>>> /*
>>> * If the APID is already assigned to the guest's shadow APCB, there is
>>> * no need to assign it.
>>> */
>>> if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>> return false;
>>>
>>> /* Make sure all APQNs are bound to the vfio_ap driver */
>>> for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>> apqn = AP_MKQID(apid, apqi);
>>>
>>> if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>> return false;
>>> }
>>>
>>> set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>
>>> return true;
> Would
> s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)/
> do the trick?
>
> I mean, if it is empty, then we would not commit the APCB, so we would
> not take the vCPUs out of SIE; see below.
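The suggested one-line change can be modeled the same way. In this hypothetical sketch the adapter bit is still recorded in the shadow APM, but a change is reported (and thus committed) only when the shadow AQM is non-empty, i.e. only when the guest actually gains queues. model_assign_apqi() is an assumed mirror image for domains, not part of the quoted patch, and all queues are assumed bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_APIDS 256   /* assumed model sizes */
#define MODEL_APQIS 256

struct model_mdev {
	bool shadow_apm[MODEL_APIDS];
	bool shadow_aqm[MODEL_APQIS];
	unsigned int commits;   /* each commit would take the vCPUs out of SIE */
};

static void model_init(struct model_mdev *m)
{
	memset(m, 0, sizeof(*m));
}

static bool model_empty(const bool *mask, int n)
{
	for (int i = 0; i < n; i++)
		if (mask[i])
			return false;
	return true;
}

/* Assumption: every queue is bound to vfio_ap. */
static bool model_queue_bound(int apid, int apqi)
{
	(void)apid;
	(void)apqi;
	return true;
}

/* Suggested variant: record the adapter, report a change only if domains exist. */
static bool model_assign_apid(struct model_mdev *m, int apid)
{
	assert(apid >= 0 && apid < MODEL_APIDS);
	if (m->shadow_apm[apid])
		return false;
	for (int apqi = 0; apqi < MODEL_APQIS; apqi++)
		if (m->shadow_aqm[apqi] && !model_queue_bound(apid, apqi))
			return false;
	m->shadow_apm[apid] = true;
	return !model_empty(m->shadow_aqm, MODEL_APQIS);
}

/* Hypothetical mirror image for domains. */
static bool model_assign_apqi(struct model_mdev *m, int apqi)
{
	assert(apqi >= 0 && apqi < MODEL_APQIS);
	if (m->shadow_aqm[apqi])
		return false;
	for (int apid = 0; apid < MODEL_APIDS; apid++)
		if (m->shadow_apm[apid] && !model_queue_bound(apid, apqi))
			return false;
	m->shadow_aqm[apqi] = true;
	return !model_empty(m->shadow_apm, MODEL_APIDS);
}

/* Stand-in for the hot plug step: commit only if something changed. */
static void model_hot_plug(struct model_mdev *m, bool changed)
{
	if (changed)
		m->commits++;
}
```

Assigning adapters 1-100 first produces no commits; the first domain then triggers exactly one.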
At first glance I'd say yes, it does the trick, but I need to consider
all possible scenarios. For example, it will work fine when someone
assigns either all of the adapters or all of the domains first, and then
assigns the other.
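One way to start on "all possible scenarios" is a brute-force check over every assignment order in a tiny configuration. This hypothetical sketch tries all 24 orders of assigning adapters 0-1 and domains 0-1 with the suggested variant (plus an assumed mirror image for domains) and checks that no commit happens while either shadow mask is empty and that every order ends with the full shadow APCB. It assumes all queues are bound and does not cover unbound queues or unassignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_APIDS 4   /* tiny sizes: only adapters 0-1, domains 0-1 are used */
#define MODEL_APQIS 4

struct model_mdev {
	bool shadow_apm[MODEL_APIDS];
	bool shadow_aqm[MODEL_APQIS];
	unsigned int commits;
};

static bool model_empty(const bool *mask, int n)
{
	for (int i = 0; i < n; i++)
		if (mask[i])
			return false;
	return true;
}

/* Suggested variant; all queues assumed bound, so no binding loop. */
static bool model_assign_apid(struct model_mdev *m, int apid)
{
	assert(apid >= 0 && apid < MODEL_APIDS);
	if (m->shadow_apm[apid])
		return false;
	m->shadow_apm[apid] = true;
	return !model_empty(m->shadow_aqm, MODEL_APQIS);
}

/* Hypothetical mirror image for domains. */
static bool model_assign_apqi(struct model_mdev *m, int apqi)
{
	assert(apqi >= 0 && apqi < MODEL_APQIS);
	if (m->shadow_aqm[apqi])
		return false;
	m->shadow_aqm[apqi] = true;
	return !model_empty(m->shadow_apm, MODEL_APIDS);
}

/*
 * Ops 0 and 1 assign adapters 0 and 1; ops 2 and 3 assign domains 0 and 1.
 * Returns the number of orders checked, or -1 if some order commits while
 * a shadow mask is empty or ends with an incomplete shadow APCB.
 */
static int model_check_all_orders(void)
{
	int checked = 0;

	for (int a = 0; a < 4; a++)
		for (int b = 0; b < 4; b++)
			for (int c = 0; c < 4; c++)
				for (int d = 0; d < 4; d++) {
					const int ops[4] = { a, b, c, d };
					struct model_mdev m;

					if (a == b || a == c || a == d ||
					    b == c || b == d || c == d)
						continue;   /* not a permutation */
					memset(&m, 0, sizeof(m));
					for (int i = 0; i < 4; i++) {
						bool changed = ops[i] < 2 ?
							model_assign_apid(&m, ops[i]) :
							model_assign_apqi(&m, ops[i] - 2);

						if (!changed)
							continue;
						if (model_empty(m.shadow_apm, MODEL_APIDS) ||
						    model_empty(m.shadow_aqm, MODEL_APQIS))
							return -1;
						m.commits++;
					}
					if (!(m.shadow_apm[0] && m.shadow_apm[1] &&
					      m.shadow_aqm[0] && m.shadow_aqm[1]))
						return -1;
					checked++;
				}
	return checked;
}
```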
>
>>>> +
>>>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>>>> + unsigned long apid)
>>>> +{
>>>> + if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>>>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>>> +}
>>>> +
> [..]
>
> Regards,
> Halil