Message-ID: <d6ba4248-77da-4963-5653-1548ced10712@linux.ibm.com>
Date: Mon, 5 Oct 2020 12:24:39 -0400
From: Tony Krowiak <akrowiak@...ux.ibm.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
cohuck@...hat.com, mjrosato@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
imbrenda@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP
resources using mdev device
On 9/27/20 9:01 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:11 -0400
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>> unassigned from an AP matrix mdev device while it is in use by a guest per
>> the following:
>>
>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>> guest, the adapter will be hot plugged into the KVM guest as long as each
>> APQN derived from the Cartesian product of the APID being assigned and
>> the APQIs already assigned to the guest's CRYCB references a queue device
>> bound to the vfio_ap device driver.
>>
>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>> KVM guest, the adapter will be hot unplugged from the KVM guest.
>>
>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>> guest, the domain will be hot plugged into the KVM guest as long as each
>> APQN derived from the Cartesian product of the APQI being assigned and
>> the APIDs already assigned to the guest's CRYCB references a queue device
>> bound to the vfio_ap device driver.
>>
>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>> KVM guest, the domain will be hot unplugged from the KVM guest.
> Hm, I suppose this means that what your guest effectively gets may depend
> on whether assign_domain or assign_adapter is done first.
>
> Suppose we have the queues
> 0.0 0.1
> 1.0
> bound to vfio_ap, i.e. 1.1 is missing for a reason different than
> belonging to the default drivers (for what exact reason no idea).
I'm not quite sure what you mean by "we have the queues". I will
assume you mean those queues are bound to the vfio_ap
device driver. The only way this could happen is if somebody
manually unbinds queue 1.1.
> Let's suppose we started with the matrix containing only adapter
> 0 (0.) and domain 0 (.0).
>
> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0 0.1
> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
> with:
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0
> 0.1
>
> That means, the set of bound queues and the set of assigned resources do
> not fully determine the set of resources passed through to the guest.
>
> Is that a deliberate design choice?
Yes, it is a deliberate choice to only allow guest access to queues
represented by queue devices bound to the vfio_ap device driver.
The idea here is to adhere to the Linux device model.
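
A minimal sketch of the order dependence discussed above, under stated
assumptions: it is a toy user-space model, not the driver code. The names
toy_mdev, toy_bit, toy_start, toy_assign_adapter and toy_assign_domain are
hypothetical, IDs are limited to 0..7, bits are numbered from the least
significant end (the driver uses the inverted bit order via the *_inv
helpers), and only the path taken when the shadow masks are already
non-empty is modelled. The checks mirror the loops in
vfio_ap_mdev_assign_guest_apid() and vfio_ap_mdev_assign_guest_apqi() as
quoted in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX 8 /* toy limit: IDs 0..7, one bit per ID */

struct toy_mdev {
	uint8_t shadow_apm;     /* adapters (APIDs) the guest sees */
	uint8_t shadow_aqm;     /* domains (APQIs) the guest sees */
	uint8_t bound[TOY_MAX]; /* bound[apid]: APQIs whose queue is bound to vfio_ap */
};

static uint8_t toy_bit(unsigned int id)
{
	assert(id < TOY_MAX);
	return (uint8_t)(1u << id);
}

/* Queues 0.0, 0.1 and 1.0 are bound; the guest starts with 0.0 only. */
static struct toy_mdev toy_start(void)
{
	struct toy_mdev m = { .shadow_apm = 0x01, .shadow_aqm = 0x01,
			      .bound = { 0x03, 0x01 } };
	return m;
}

/* Plug the adapter only if every APQI already in the guest has a bound queue. */
static bool toy_assign_adapter(struct toy_mdev *m, unsigned int apid)
{
	uint8_t need = m->shadow_aqm;

	assert(apid < TOY_MAX);
	if ((m->bound[apid] & need) != need)
		return false;
	m->shadow_apm |= toy_bit(apid);
	return true;
}

/* Plug the domain only if every APID already in the guest has a bound queue. */
static bool toy_assign_domain(struct toy_mdev *m, unsigned int apqi)
{
	for (unsigned int apid = 0; apid < TOY_MAX; apid++)
		if ((m->shadow_apm & toy_bit(apid)) &&
		    !(m->bound[apid] & toy_bit(apqi)))
			return false;
	m->shadow_aqm |= toy_bit(apqi);
	return true;
}
```

In this model, starting from toy_start(), assigning adapter 1 and then
domain 1 leaves the guest with adapters 0 and 1 on domain 0, because the
later domain assignment is rejected for the missing queue 1.1. Assigning
domain 1 first and then adapter 1 leaves adapter 0 with domains 0 and 1,
because the later adapter assignment is rejected for the same reason.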
>
>> * When the domain number of a control domain is assigned to a matrix mdev
>> in use by a KVM guest, the control domain will be hot plugged into the
>> KVM guest.
>>
>> * When the domain number of a control domain is unassigned from a matrix
>> mdev in use by a KVM guest, the control domain will be hot unplugged
>> from the KVM guest.
>>
>> Signed-off-by: Tony Krowiak<akrowiak@...ux.ibm.com>
>> ---
>> drivers/s390/crypto/vfio_ap_ops.c | 196 ++++++++++++++++++++++++++++++
>> 1 file changed, 196 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index cf3321eb239b..2b01a8eb6ee7 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -731,6 +731,56 @@ static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
>> }
>> }
>>
>> +static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + DECLARE_BITMAP(aqm, AP_DOMAINS);
>> + unsigned long apqi, apqn;
>> +
>> + bitmap_copy(aqm, matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> + if (!test_bit_inv(apqi,
>> + (unsigned long *) matrix_dev->info.aqm))
>> + clear_bit_inv(apqi, aqm);
>> +
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> + clear_bit_inv(apqi, aqm);
>> + }
>> +
>> + if (bitmap_empty(aqm, AP_DOMAINS))
>> + return false;
>> +
>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> + bitmap_copy(matrix_mdev->shadow_apcb.aqm, aqm, AP_DOMAINS);
>> +
>> + return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + unsigned long apqi, apqn;
>> +
>> + if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> + !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>> + return false;
>> +
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> + return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
> Hm. Let's say we have the same situation regarding the bound queues as
> above but we start with the empty matrix, and do all the assignments
> while the guest is running.
>
> Consider the following sequence of actions.
>
> 1) echo 0 > assign_domain
matrix: .0
guest_matrix: no APQNs
> 2) echo 1 > assign_domain
matrix: .0, .1
guest_matrix: no APQNs
> 3) echo 1 > assign_adapter
matrix: 1.0, 1.1
guest_matrix: 1.0
> 4) echo 0 > assign_adapter
matrix: 0.0, 0.1, 1.0, 1.1
guest_matrix: 0.0, 1.0
> 5) echo 1 > unassign_adapter
matrix: 0.0, 0.1
guest_matrix: 0.0
> I understand that at 3), because
> bitmap_empty(matrix_mdev->shadow_apcb.aqm) we would end up with a shadow
> aqm containing just domain 0, as queue 1.1 ain't bound to us.
True
> Thus at the end we would have
> matrix:
> 0.0 0.1
> guest_matrix:
> 0.0
At the end I had:
matrix: 0.0, 0.1
guest_matrix: 0.0
> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
> then after
> 6) echo 2 > assign_adapter
> we would end up with
> matrix:
> 0.0 0.1
> 2.0 2.1
> guest_matrix:
> 0.0
> 2.0
>
> This looks very quirky to me. Did I read the code wrong? Opinions?
You read the code correctly and I agree, this is a bit quirky. I would say
that after adding adapter 2, we should end up with guest matrix:
0.0, 0.1
2.0, 2.1
If you agree, I'll make the adjustment.
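
For reference, the six steps above can be replayed against a toy
user-space model of the logic in this version of the patch. This is only
a sketch under stated assumptions: the names sim_mdev, sim_bit,
sim_assign_domain, sim_assign_adapter, sim_unassign_adapter and
sim_replay are hypothetical, IDs are limited to 0..7 with plain (not
inverted) bit numbering, the guest is assumed to be running, and every
adapter and domain is assumed to be present in the host AP configuration.
It does not model the adjustment proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX 8 /* toy limit: IDs 0..7, one bit per ID */

struct sim_mdev {
	uint8_t apm, aqm;               /* matrix assigned through sysfs */
	uint8_t shadow_apm, shadow_aqm; /* masks the guest would see */
	uint8_t bound[SIM_MAX];         /* bound[apid]: APQIs bound to vfio_ap */
};

static uint8_t sim_bit(unsigned int id)
{
	assert(id < SIM_MAX);
	return (uint8_t)(1u << id);
}

static void sim_assign_domain(struct sim_mdev *m, unsigned int apqi)
{
	uint8_t apm = 0;

	m->aqm |= sim_bit(apqi);
	for (unsigned int apid = 0; apid < SIM_MAX; apid++) {
		bool bound = m->bound[apid] & sim_bit(apqi);

		if (!m->shadow_apm) {
			/* Empty shadow APM: keep assigned adapters with a bound queue. */
			if ((m->apm & sim_bit(apid)) && bound)
				apm |= sim_bit(apid);
		} else if ((m->shadow_apm & sim_bit(apid)) && !bound) {
			return; /* one missing queue rejects the whole domain */
		}
	}
	if (!m->shadow_apm) {
		if (!apm)
			return;
		m->shadow_apm = apm;
	}
	m->shadow_aqm |= sim_bit(apqi);
}

static void sim_assign_adapter(struct sim_mdev *m, unsigned int apid)
{
	m->apm |= sim_bit(apid);
	if (!m->shadow_aqm) {
		/* Empty shadow AQM: keep assigned domains with a bound queue. */
		uint8_t aqm = m->aqm & m->bound[apid];

		if (!aqm)
			return;
		m->shadow_aqm = aqm;
	} else if ((m->bound[apid] & m->shadow_aqm) != m->shadow_aqm) {
		return; /* one missing queue rejects the whole adapter */
	}
	m->shadow_apm |= sim_bit(apid);
}

static void sim_unassign_adapter(struct sim_mdev *m, unsigned int apid)
{
	m->apm &= (uint8_t)~sim_bit(apid);
	if (!(m->shadow_apm & sim_bit(apid)))
		return;
	m->shadow_apm &= (uint8_t)~sim_bit(apid);
	if (!m->shadow_apm)
		m->shadow_aqm = 0; /* no adapters left: drop the domains too */
}

/* Replays steps 1) to 6) with queues 0.0, 0.1, 1.0, 2.0 and 2.1 bound. */
static struct sim_mdev sim_replay(void)
{
	struct sim_mdev m = { .bound = { 0x03, 0x01, 0x03 } };

	sim_assign_domain(&m, 0);
	sim_assign_domain(&m, 1);
	sim_assign_adapter(&m, 1);
	sim_assign_adapter(&m, 0);
	sim_unassign_adapter(&m, 1);
	sim_assign_adapter(&m, 2);
	return m;
}
```

In this model sim_replay() ends with adapters 0 and 2 and domains 0 and 1
in the matrix, but only domain 0 in the shadow AQM, so the guest sees 0.0
and 2.0 even though 0.1 and 2.1 are bound.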
>
>> +
>> + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> + return false;
>> + }
>> +
>> + set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> + return true;
>> +}
>> +
>> /**
>> * assign_adapter_store
>> *
>> @@ -792,12 +842,42 @@ static ssize_t assign_adapter_store(struct device *dev,
>> }
>> set_bit_inv(apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>> + if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_adapter);
>>
>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apid)
>> +{
>> + if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> + if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>> + clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> + /*
>> + * If there are no APIDs assigned to the guest, then
>> + * the guest will not have access to any queues, so
>> + * let's also go ahead and unassign the APQIs. Keeping
>> + * them around may yield unpredictable results during
>> + * a probe that is not related to a host AP
>> + * configuration change (i.e., an AP adapter is
>> + * configured online).
>> + */
> I don't quite understand this comment. Clearing out the other mask when
> one becomes empty does allow us to recover the full possible guest
> matrix in the scenario described above. I don't see any shadow
> manipulation in the probe handler at this stage. Are we maybe
> talking about the same effect as I described for assign?
Patch 15/16 is for the probe.
>
> Regards,
> Halil
>
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
>> + AP_DEVICES))
>> + bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
>> + AP_DOMAINS);
>> +
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> /**
>> * unassign_adapter_store
>> *
>> @@ -834,12 +914,64 @@ static ssize_t unassign_adapter_store(struct device *dev,
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>> vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
>> + if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(unassign_adapter);
>>
>> +static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + DECLARE_BITMAP(apm, AP_DEVICES);
>> + unsigned long apid, apqn;
>> +
>> + bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>> +
>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> + if (!test_bit_inv(apid,
>> + (unsigned long *) matrix_dev->info.apm))
>> + clear_bit_inv(apqi, apm);
>> +
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> + clear_bit_inv(apid, apm);
>> + }
>> +
>> + if (bitmap_empty(apm, AP_DEVICES))
>> + return false;
>> +
>> + set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> + bitmap_copy(matrix_mdev->shadow_apcb.apm, apm, AP_DEVICES);
>> +
>> + return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + unsigned long apid, apqn;
>> +
>> + if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> + !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
>> + return false;
>> +
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> + return vfio_ap_mdev_assign_apids_4_apqi(matrix_mdev, apqi);
>> +
>> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
>> + apqn = AP_MKQID(apid, apqi);
>> + if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> + return false;
>> + }
>> +
>> + set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> + return true;
>> +}
>> +
>> /**
>> * assign_domain_store
>> *
>> @@ -901,12 +1033,41 @@ static ssize_t assign_domain_store(struct device *dev,
>> }
>> set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>> + if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_domain);
>>
>> +static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long apqi)
>> +{
>> + if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> + if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
>> + clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> + /*
>> + * If there are no APQIs assigned to the guest, then
>> + * the guest will not have access to any queues, so
>> + * let's also go ahead and unassign the APIDs. Keeping
>> + * them around may yield unpredictable results during
>> + * a probe that is not related to a host AP
>> + * configuration change (i.e., an AP adapter is
>> + * configured online).
>> + */
>> + if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
>> + AP_DOMAINS))
>> + bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
>> + AP_DEVICES);
>> +
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> +}
>>
>> /**
>> * unassign_domain_store
>> @@ -944,12 +1105,28 @@ static ssize_t unassign_domain_store(struct device *dev,
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>> vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
>> + if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(unassign_domain);
>>
>> +static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long domid)
>> +{
>> + if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> + if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> + set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> /**
>> * assign_control_domain_store
>> *
>> @@ -984,12 +1161,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>
>> mutex_lock(&matrix_dev->lock);
>> set_bit_inv(id, matrix_mdev->matrix.adm);
>> + if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;
>> }
>> static DEVICE_ATTR_WO(assign_control_domain);
>>
>> +static bool
>> +vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> + unsigned long domid)
>> +{
>> + if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> + if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> + clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> /**
>> * unassign_control_domain_store
>> *
>> @@ -1024,6 +1218,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>
>> mutex_lock(&matrix_dev->lock);
>> clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> + if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
>> + vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> mutex_unlock(&matrix_dev->lock);
>>
>> return count;