Date:   Mon, 5 Oct 2020 12:24:39 -0400
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
        cohuck@...hat.com, mjrosato@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
        imbrenda@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP
 resources using mdev device



On 9/27/20 9:01 PM, Halil Pasic wrote:
> On Fri, 21 Aug 2020 15:56:11 -0400
> Tony Krowiak<akrowiak@...ux.ibm.com>  wrote:
>
>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>> unassigned from an AP matrix mdev device while it is in use by a guest per
>> the following:
>>
>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>    guest, the adapter will be hot plugged into the KVM guest as long as each
>>    APQN derived from the Cartesian product of the APID being assigned and
>>    the APQIs already assigned to the guest's CRYCB references a queue device
>>    bound to the vfio_ap device driver.
>>
>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>    KVM guest, the adapter will be hot unplugged from the KVM guest.
>>
>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>    guest, the domain will be hot plugged into the KVM guest as long as each
>>    APQN derived from the Cartesian product of the APQI being assigned and
>>    the APIDs already assigned to the guest's CRYCB references a queue device
>>    bound to the vfio_ap device driver.
>>
>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>    KVM guest, the domain will be hot unplugged from the KVM guest
> Hm, I suppose this means that what your guest effectively gets may depend
> on whether assign_domain or assign_adapter is done first.
>
> Suppose we have the queues
> 0.0 0.1
> 1.0
> bound to vfio_ap, i.e. 1.1 is missing for a reason different than
> belonging to the default drivers (for what exact reason no idea).

I'm not quite sure what you mean by "we have the queues". I will
assume you mean those queues are bound to the vfio_ap
device driver. The only way this could happen is if somebody
manually unbinds queue 1.1.

> Let's suppose we started with the matrix containing only adapter
> 0 (0.) and domain 0 (.0).
>
> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0 0.1
> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
> with:
> matrix:
> 0.0 0.1
> 1.0 1.1
> guest_matrix:
> 0.0
> 0.1
>
> That means, the set of bound queues and the set of assigned resources do
> not fully determine the set of resources passed through to the guest.
>
> Is that a deliberate design choice?

Yes, it is a deliberate choice to only allow guest access to queues
represented by queue devices bound to the vfio_ap device driver.
The idea here is to adhere to the Linux device model.
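
The order dependence can be reproduced outside the kernel. Below is a
minimal userspace sketch of the assignment rule as it reads in the quoted
patch, assuming the guest is running and that every adapter and domain is
present in the host configuration (so the vfio_ap_mdev_has_crycb() and
matrix_dev->info checks are dropped). The toy_* names, the 4x4 size and the
plain (non-inverted) bit numbering are inventions for illustration and are
not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX 4	/* toy bound; the driver uses AP_DEVICES/AP_DOMAINS */

/* Hypothetical stand-in for the mdev state, not the driver's struct. */
struct toy_mdev {
	bool bound[TOY_MAX][TOY_MAX];	/* bound[apid][apqi]: queue bound to vfio_ap */
	unsigned int apm, aqm;		/* adapters/domains assigned to the mdev */
	unsigned int guest_apm, guest_aqm;	/* what the guest sees (shadow APCB) */
};

static unsigned int bit(unsigned int n) { return 1u << n; }

/* Follows the quoted vfio_ap_mdev_assign_guest_apid(), host checks omitted. */
static bool toy_assign_adapter(struct toy_mdev *m, unsigned int apid)
{
	unsigned int apqi, aqm;

	assert(apid < TOY_MAX);
	m->apm |= bit(apid);

	if (m->guest_aqm == 0) {
		/* Guest has no domains yet: keep only domains whose queue is bound. */
		aqm = m->aqm;
		for (apqi = 0; apqi < TOY_MAX; apqi++)
			if ((aqm & bit(apqi)) && !m->bound[apid][apqi])
				aqm &= ~bit(apqi);
		if (aqm == 0)
			return false;
		m->guest_apm |= bit(apid);
		m->guest_aqm = aqm;
		return true;
	}

	/* Otherwise every APQN (apid, guest apqi) must reference a bound queue. */
	for (apqi = 0; apqi < TOY_MAX; apqi++)
		if ((m->guest_aqm & bit(apqi)) && !m->bound[apid][apqi])
			return false;
	m->guest_apm |= bit(apid);
	return true;
}

/* Follows the quoted vfio_ap_mdev_assign_guest_apqi(), host checks omitted. */
static bool toy_assign_domain(struct toy_mdev *m, unsigned int apqi)
{
	unsigned int apid, apm;

	assert(apqi < TOY_MAX);
	m->aqm |= bit(apqi);

	if (m->guest_apm == 0) {
		/* Guest has no adapters yet: keep only adapters whose queue is bound. */
		apm = m->apm;
		for (apid = 0; apid < TOY_MAX; apid++)
			if ((apm & bit(apid)) && !m->bound[apid][apqi])
				apm &= ~bit(apid);
		if (apm == 0)
			return false;
		m->guest_aqm |= bit(apqi);
		m->guest_apm = apm;
		return true;
	}

	/* Otherwise every APQN (guest apid, apqi) must reference a bound queue. */
	for (apid = 0; apid < TOY_MAX; apid++)
		if ((m->guest_apm & bit(apid)) && !m->bound[apid][apqi])
			return false;
	m->guest_aqm |= bit(apqi);
	return true;
}

/* Scenario from the thread: queues 0.0, 0.1, 1.0 bound; guest starts with 0.0. */
static struct toy_mdev toy_start(void)
{
	struct toy_mdev m = { .apm = 1, .aqm = 1, .guest_apm = 1, .guest_aqm = 1 };

	m.bound[0][0] = m.bound[0][1] = m.bound[1][0] = true;
	return m;
}
```

In this model, starting from toy_start() and calling toy_assign_adapter(&m, 1)
followed by toy_assign_domain(&m, 1) leaves guest_apm = 0x3 and
guest_aqm = 0x1, while the reverse order leaves guest_apm = 0x1 and
guest_aqm = 0x3, although apm and aqm end up identical in both cases.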

>
>> * When the domain number of a control domain is assigned to a matrix mdev
>>    in use by a KVM guest, the control domain will be hot plugged into the
>>    KVM guest.
>>
>> * When the domain number of a control domain is unassigned from a matrix
>>    mdev in use by a KVM guest, the control domain will be hot unplugged
>>    from the KVM guest.
>>
>> Signed-off-by: Tony Krowiak<akrowiak@...ux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 196 ++++++++++++++++++++++++++++++
>>   1 file changed, 196 insertions(+)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index cf3321eb239b..2b01a8eb6ee7 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -731,6 +731,56 @@ static void vfio_ap_mdev_link_queues(struct ap_matrix_mdev *matrix_mdev,
>>   	}
>>   }
>>   
>> +static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apid)
>> +{
>> +	DECLARE_BITMAP(aqm, AP_DOMAINS);
>> +	unsigned long apqi, apqn;
>> +
>> +	bitmap_copy(aqm, matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) {
>> +		if (!test_bit_inv(apqi,
>> +				  (unsigned long *) matrix_dev->info.aqm))
>> +			clear_bit_inv(apqi, aqm);
>> +
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			clear_bit_inv(apqi, aqm);
>> +	}
>> +
>> +	if (bitmap_empty(aqm, AP_DOMAINS))
>> +		return false;
>> +
>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +	bitmap_copy(matrix_mdev->shadow_apcb.aqm, aqm, AP_DOMAINS);
>> +
>> +	return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long apid)
>> +{
>> +	unsigned long apqi, apqn;
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>> +		return false;
>> +
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
> Hm. Let's say we have the same situation regarding the bound queues as
> above but we start with the empty matrix, and do all the assignments
> while the guest is running.
>
> Consider the following sequence of actions.
>
> 1) echo 0 > assign_domain

matrix:            .0
guest_matrix: no APQNs

> 2) echo 1 > assign_domain

matrix:            .0, .1
guest_matrix: no APQNs

> 3) echo 1 > assign_adapter

matrix:           1.0, 1.1
guest_matrix: 1.0

> 4) echo 0 > assign_adapter

matrix:           0.0, 0.1, 1.0, 1.1
guest_matrix: 0.0, 1.0

> 5) echo 1 > unassign_adapter

matrix:           0.0, 0.1
guest_matrix: 0.0

> I understand that at 3), because
> bitmap_empty(matrix_mdev->shadow_apcb.aqm) we would end up with a shadow
> aqm containing just domain 0, as queue 1.1 ain't bound to us.

True

> Thus at the end we would have
> matrix:
> 0.0 0.1
> guest_matrix:
> 0.0

At the end I had:
matrix:            0.0, 0.1
guest_matrix: 0.0

> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
> then after
> 6) echo 2 > assign_adapter
> we get
> matrix:
> 0.0 0.1
> 2.0 2.1
> guest_matrix:
> 0.0
> 2.0
>
> This looks very quirky to me. Did I read the code wrong? Opinions?

You read the code correctly and I agree, this is a bit quirky. I would say
that after adding adapter 2, we should end up with guest matrix:
0.0, 0.1
2.0, 2.1

If you agree, I'll make the adjustment.

>
>> +
>> +	for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> +	return true;
>> +}
>> +
>>   /**
>>    * assign_adapter_store
>>    *
>> @@ -792,12 +842,42 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APID, apid);
>> +	if (vfio_ap_mdev_assign_guest_apid(matrix_mdev, apid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_adapter);
>>   
>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> +			/*
>> +			 * If there are no APIDs assigned to the guest, then
>> +			 * the guest will not have access to any queues, so
>> +			 * let's also go ahead and unassign the APQIs. Keeping
>> +			 * them around may yield unpredictable results during
>> +			 * a probe that is not related to a host AP
>> +			 * configuration change (i.e., an AP adapter is
>> +			 * configured online).
>> +			 */
> I don't quite understand this comment. Clearing out the other mask when
> the one becomes empty, does allow us to recover the full possible guest
> matrix in the scenario described above. I don't see any shadow
> manipulation in the probe handler at this stage. Are we maybe
> talking about the same effect as I described for assign?

Patch 15/16 is for the probe.
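
For reference, the unassign rule under discussion reduces to a few lines.
This is a userspace sketch only, assuming a running guest (the
vfio_ap_mdev_has_crycb() check is dropped); struct toy_shadow and
toy_unassign_guest_apid() are hypothetical names, and plain bit numbering
stands in for the driver's inverted bit order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shadow APCB masks, not the driver's struct. */
struct toy_shadow {
	unsigned int apm;	/* adapters passed through to the guest */
	unsigned int aqm;	/* domains passed through to the guest */
};

/* Returns true when the shadow changed and would need to be committed. */
static bool toy_unassign_guest_apid(struct toy_shadow *s, unsigned int apid)
{
	assert(apid < 32);

	if (!(s->apm & (1u << apid)))
		return false;	/* adapter was not passed through */

	s->apm &= ~(1u << apid);

	/* No adapters left means no queues left, so drop the domains too. */
	if (s->apm == 0)
		s->aqm = 0;

	return true;
}
```

With both masks empty afterwards, a later assign_adapter takes the
"shadow aqm is empty" path and filters the mdev's full aqm again, which is
the recovery effect Halil describes.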

>
> Regards,
> Halil
>
>> +			if (bitmap_empty(matrix_mdev->shadow_apcb.apm,
>> +					 AP_DEVICES))
>> +				bitmap_clear(matrix_mdev->shadow_apcb.aqm, 0,
>> +					     AP_DOMAINS);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * unassign_adapter_store
>>    *
>> @@ -834,12 +914,64 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APID, apid);
>> +	if (vfio_ap_mdev_unassign_guest_apid(matrix_mdev, apid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_adapter);
>>   
>> +static bool vfio_ap_mdev_assign_apids_4_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apqi)
>> +{
>> +	DECLARE_BITMAP(apm, AP_DEVICES);
>> +	unsigned long apid, apqn;
>> +
>> +	bitmap_copy(apm, matrix_mdev->matrix.apm, AP_DEVICES);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
>> +		if (!test_bit_inv(apid,
>> +				  (unsigned long *) matrix_dev->info.apm))
>> +			clear_bit_inv(apid, apm);
>> +
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			clear_bit_inv(apid, apm);
>> +	}
>> +
>> +	if (bitmap_empty(apm, AP_DEVICES))
>> +		return false;
>> +
>> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +	bitmap_copy(matrix_mdev->shadow_apcb.apm, apm, AP_DEVICES);
>> +
>> +	return true;
>> +}
>> +
>> +static bool vfio_ap_mdev_assign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long apqi)
>> +{
>> +	unsigned long apid, apqn;
>> +
>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>> +	    !test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm))
>> +		return false;
>> +
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> +		return vfio_ap_mdev_assign_apids_4_apqi(matrix_mdev, apqi);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +		if (!vfio_ap_get_mdev_queue(matrix_mdev, apqn))
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> +	return true;
>> +}
>> +
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -901,12 +1033,41 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, LINK_APQI, apqi);
>> +	if (vfio_ap_mdev_assign_guest_apqi(matrix_mdev, apqi))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_domain);
>>   
>> +static bool vfio_ap_mdev_unassign_guest_apqi(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long apqi)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
>> +			clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> +			/*
>> +			 * If there are no APQIs assigned to the guest, then
>> +			 * the guest will not have access to any queues, so
>> +			 * let's also go ahead and unassign the APIDs. Keeping
>> +			 * them around may yield unpredictable results during
>> +			 * a probe that is not related to a host AP
>> +			 * configuration change (i.e., an AP adapter is
>> +			 * configured online).
>> +			 */
>> +			if (bitmap_empty(matrix_mdev->shadow_apcb.aqm,
>> +					 AP_DOMAINS))
>> +				bitmap_clear(matrix_mdev->shadow_apcb.apm, 0,
>> +					     AP_DEVICES);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>>   
>>   /**
>>    * unassign_domain_store
>> @@ -944,12 +1105,28 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_link_queues(matrix_mdev, UNLINK_APQI, apqi);
>> +	if (vfio_ap_mdev_unassign_guest_apqi(matrix_mdev, apqi))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_domain);
>>   
>> +static bool vfio_ap_mdev_assign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long domid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +			set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * assign_control_domain_store
>>    *
>> @@ -984,12 +1161,29 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	set_bit_inv(id, matrix_mdev->matrix.adm);
>> +	if (vfio_ap_mdev_assign_guest_cdom(matrix_mdev, id))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_control_domain);
>>   
>> +static bool
>> +vfio_ap_mdev_unassign_guest_cdom(struct ap_matrix_mdev *matrix_mdev,
>> +				 unsigned long domid)
>> +{
>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>> +		if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +			clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +
>> +			return true;
>> +		}
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /**
>>    * unassign_control_domain_store
>>    *
>> @@ -1024,6 +1218,8 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> +	if (vfio_ap_mdev_unassign_guest_cdom(matrix_mdev, domid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
