[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65834705-347c-1e8d-f33f-b64bc2501b37@linux.ibm.com>
Date:   Mon, 30 Nov 2020 19:18:30 -0500
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
        cohuck@...hat.com, mjrosato@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
        hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v12 12/17] s390/vfio-ap: allow hot plug/unplug of AP
 resources using mdev device
On 11/30/20 6:32 PM, Halil Pasic wrote:
> On Mon, 30 Nov 2020 14:36:10 -0500
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>>
>> On 11/28/20 8:52 PM, Halil Pasic wrote:
> [..]
>>>> * Unassign adapter from mdev's matrix:
>>>>
>>>>     The domain will be hot unplugged from the KVM guest if it is
>>>>     assigned to the guest's matrix.
>>>>
>>>> * Assign a control domain:
>>>>
>>>>     The control domain will be hot plugged into the KVM guest if it is not
>>>>     assigned to the guest's APCB. The AP architecture ensures a guest will
>>>>     only get access to the control domain if it is in the host's AP
>>>>     configuration, so there is no risk in hot plugging it; however, it will
>>>>     become automatically available to the guest when it is added to the host
>>>>     configuration.
>>>>
>>>> * Unassign a control domain:
>>>>
>>>>     The control domain will be hot unplugged from the KVM guest if it is
>>>>     assigned to the guest's APCB.
>>> This is where things start getting tricky. E.g. do we need to revise
>>> filtering after an unassign? (For example an assign_adapter X didn't
>>> change the shadow, because queue XY was missing, but now we unplug domain
>>> Y. Should the adapter X pop up? I guess it should.)
>> I suppose that makes sense at the expense of making the code
>> more complex. It is essentially what we had in the prior version
>> which used the same filtering code for assignment as well as
>> host AP configuration changes.
>>
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, but there
> is no need to filter it does not feel right. On the other hand, I'm
> afraid I'm complaining in circles.
>
>>>
>>>> Note: Now that hot plug/unplug is implemented, there is the possibility
>>>>         that an assignment/unassignment of an adapter, domain or control
>>>>         domain could be initiated while the guest is starting, so the
>>>>         matrix device lock will be taken for the group notification callback
>>>>         that initializes the guest's APCB when the KVM pointer is made
>>>>         available to the vfio_ap device driver.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>>>>    1 file changed, 159 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 586ec5776693..4f96b7861607 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>>>>    	}
>>>>    }
>>>>    
>>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>> +					unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>> +
>>>> +	/*
>>>> +	 * If the APID is already assigned to the guest's shadow APCB, there is
>>>> +	 * no need to assign it.
>>>> +	 */
>>>> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>> +		return false;
>>>> +
>>>> +	/*
>>>> +	 * If no domains have yet been assigned to the shadow APCB and one or
>>>> +	 * more domains have been assigned to the matrix mdev, then use
>>>> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
>>>> +	 * to assign to the shadow APCB.
>>>> +	 */
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>>>> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>>>> +			return false;
>>>> +
>>>> +		aqm = matrix_mdev->matrix.aqm;
>>>> +	}
>>>> +
>>>> +	/* Make sure all APQNs are bound to the vfio_ap driver */
>>>> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>> +		apqn = AP_MKQID(apid, apqi);
>>>> +
>>>> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>> +			return false;
>>>> +	}
>>>> +
>>>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +	/*
>>>> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
>>>> +	 * then copy the APQIs of those domains into the guest's APCB
>>>> +	 */
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>>>> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
>>>> +
>>>> +	return true;
>>>> +}
>>> What is the rationale behind the shadow aqm empty special handling?
>> The rationale was to avoid taking the VCPUs
>> out of SIE in order to make an update to the guest's APCB
>> unnecessarily. For example, suppose the guest is started
>> without access to any APQNs (i.e., all matrix and shadow_apcb
>> masks are zeros). Now suppose the administrator proceeds to
>> start assigning AP resources to the mdev. Let's say he starts
>> by assigning adapters 1 through 100. The code below will return
>> true indicating the shadow_apcb was updated. Consequently,
>> the calling code will commit the changes to the guest's
>> APCB. The problem there is that in order to update the guest's
>> VCPUs, they will have to be taken out of SIE, yet the guest will
>> not get access to the adapter since no domains have yet been
>> assigned to the APCB. Doing this 100 times - once for each
>> adapter 1-100 - is probably a bad idea.
>>
> Not yanking the VCPUs out of SIE does make a lot of sense. At least
> I understand your motivation now. I will think some more about this,
> but in the meanwhile, please try to answer one more question (see
> below).
>   
>>>    I.e.
>>> why not simply:
>>>
>>>
>>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>                                           unsigned long apid)
>>> {
>>>           unsigned long apqi, apqn;
>>>           unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>                                                                                   
>>>           /*
>>>            * If the APID is already assigned to the guest's shadow APCB, there is
>>>            * no need to assign it.
>>>            */
>>>           if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>                   return false;
>>>                                                                                   
>>>           /* Make sure all APQNs are bound to the vfio_ap driver */
>>>           for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>                   apqn = AP_MKQID(apid, apqi);
>>>                                                                                   
>>>                   if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>                           return false;
>>>           }
>>>                                                                                   
>>>           set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>                                                                                   
>>>           return true;
> Would
> s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> AP_DOMAINS)/
> do the trick?
>
> I mean if empty, then we would not commit the APCB, so we would
> not take the vCPUs out of SIE -- see below.
At first glance I'd say yes, it does the trick; but, I need to consider
all possible scenarios. For example, that will work fine when someone
either assigns all of the adapters or all of the domains first, then assigns
the other.
>
>>>> +
>>>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>>>> +					  unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>>>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>>> +}
>>>> +
> [..]
>
> Regards,
> Halil
Powered by blists - more mailing lists
 
