Message-ID: <bbbd941f-16fa-5276-093e-22d1dad42593@linux.ibm.com>
Date: Mon, 5 Oct 2020 19:05:08 -0400
From: Tony Krowiak <akrowiak@...ux.ibm.com>
To: Halil Pasic <pasic@...ux.ibm.com>
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
cohuck@...hat.com, mjrosato@...ux.ibm.com,
alex.williamson@...hat.com, kwankhede@...dia.com,
fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
imbrenda@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP
resources using mdev device
I proposed two algorithms in my last response. The following summarizes the
results from executing your scenario:
bound queues:
0.0
0.1
1.0
2.0
2.1
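
Both algorithms rely on the same basic building block: checking whether
every APQN derived from the Cartesian product of an APM and an AQM is
backed by a queue device bound to the vfio_ap device driver. A rough
sketch of that check follows; apqn_is_bound() is a stand-in for whatever
bound-queue lookup we settle on, so the names here are illustrative only:

static bool vfio_ap_mdev_product_is_bound(struct ap_matrix_mdev *matrix_mdev,
                                          unsigned long *apm,
                                          unsigned long *aqm)
{
        unsigned long apid, apqi;

        /* Walk the Cartesian product of the two masks. */
        for (apid = 0; apid < AP_DEVICES; apid++) {
                if (!test_bit_inv(apid, apm))
                        continue;

                for (apqi = 0; apqi < AP_DOMAINS; apqi++) {
                        if (!test_bit_inv(apqi, aqm))
                                continue;

                        /*
                         * apqn_is_bound() stands in for the lookup of a
                         * queue device bound to the vfio_ap driver.
                         */
                        if (!apqn_is_bound(matrix_mdev, apid, apqi))
                                return false;
                }
        }

        return true;
}
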
algorithm: use filtering on assign/unassign
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter
matrix:
1.0
1.1
guest_matrix:
1.0
echo 0 > assign_adapter
matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1
echo 1 > unassign_adapter
matrix:
0.0
0.1
guest_matrix:
0.0
0.1
echo 2 > assign_adapter
matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
echo 1 > assign_adapter
matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
algorithm: do not plug adapter unless all assigned APQNs are bound
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter
matrix:
1.0
1.1
guest_matrix:
no bits set
echo 0 > assign_adapter
matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1
echo 1 > unassign_adapter
matrix:
0.0
0.1
guest_matrix:
0.0
0.1
echo 2 > assign_adapter
matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
echo 1 > assign_adapter
matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
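
With the second algorithm, assigning an adapter becomes an all-or-nothing
decision: the adapter is hot plugged only if every APQN it forms with the
domains assigned to the mdev is bound. A rough sketch of that decision,
building on the product check above (again, the names are illustrative,
the shadow APCB would still have to be committed to the guest's CRYCB,
and the domain side would be handled symmetrically):

static bool vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
                                          unsigned long apid)
{
        DECLARE_BITMAP(apm, AP_DEVICES);

        /* Build a one-adapter APM for the product check. */
        bitmap_zero(apm, AP_DEVICES);
        set_bit_inv(apid, apm);

        /* Do not plug the adapter unless all of its assigned APQNs are bound. */
        if (!vfio_ap_mdev_product_is_bound(matrix_mdev, apm,
                                           matrix_mdev->matrix.aqm))
                return false;

        set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
        bitmap_copy(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm,
                    AP_DOMAINS);

        return true;
}
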
On 10/5/20 2:30 PM, Halil Pasic wrote:
> On Mon, 5 Oct 2020 12:24:39 -0400
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>>
>> On 9/27/20 9:01 PM, Halil Pasic wrote:
>>> On Fri, 21 Aug 2020 15:56:11 -0400
>>> Tony Krowiak<akrowiak@...ux.ibm.com> wrote:
>>>
>>>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>>>> unassigned from an AP matrix mdev device while it is in use by a guest per
>>>> the following:
>>>>
>>>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>>> guest, the adapter will be hot plugged into the KVM guest as long as each
>>>> APQN derived from the Cartesian product of the APID being assigned and
>>>> the APQIs already assigned to the guest's CRYCB references a queue device
>>>> bound to the vfio_ap device driver.
>>>>
>>>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>>> KVM guest, the adapter will be hot unplugged from the KVM guest.
>>>>
>>>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>>> guest, the domain will be hot plugged into the KVM guest as long as each
>>>> APQN derived from the Cartesian product of the APQI being assigned and
>>>> the APIDs already assigned to the guest's CRYCB references a queue device
>>>> bound to the vfio_ap device driver.
>>>>
>>>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>>> KVM guest, the domain will be hot unplugged from the KVM guest
>>> Hm, I suppose this means that what your guest effectively gets may depend
>>> on whether assign_domain or assign_adapter is done first.
>>>
>>> Suppose we have the queues
>>> 0.0 0.1
>>> 1.0
>>> bound to vfio_ap, i.e. 1.1 is missing for a reason other than
>>> belonging to the default drivers (for what exact reason, no idea).
>> I'm not quite sure what you mean by "we have the queues". I will
>> assume you mean those queues are bound to the vfio_ap
>> device driver.
> Yes, this is exactly what I've meant.
>
>
>> The only way this could happen is if somebody
>> manually unbinds queue 1.1.
>>
> Assuming that:
> 1) every time we observe ap_perm the ap subsystem is in a settled state
> (i.e. not in the middle of pushing things left and right
> because of an ap_perm change),
> 2) the only non-default driver is vfio_ap, and that
> 3) queues handle non-operational states by other means than disappearing
> (should be the case with the latest reworks)
> I agree what is left is manual unbind, which I lean towards considering
> an edge case.
>
> If this is indeed just about that edge case, maybe we can live with a
> simpler algorithm than this one.
>
>
>>> Let's suppose we started with the matrix containing only adapter
>>> 0 (0.) and domain 0 (.0).
>>>
>>> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0 0.1
>>> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
>>> with:
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0
>>> 0.1
>>>
>>> That means, the set of bound queues and the set of assigned resources do
>>> not fully determine the set of resources passed through to the guest.
>>>
>>> Is that a deliberate design choice?
>> Yes, it is a deliberate choice to only allow guest access to queues
>> represented by queue devices bound to the vfio_ap device driver.
>> The idea here is to adhere to the linux device model.
>>
> This is not what I asked. My question was about the fact that
> reordering assignments gives different results. Well, this was kind
> of the case before as well, with the notable difference that in the
> past we always got an error. So if a full sequence of assignments could
> be performed without an error, then any permutation would yield the
> exact same result.
>
> I'm all for only allowing guest access to queues represented by queue
> devices bound to the vfio_ap device driver. I'm concerned with the
> permutation (and calculus).
>
>>>> * When the domain number of a control domain is assigned to a matrix mdev
>>>> in use by a KVM guest, the control domain will be hot plugged into the
>>>> KVM guest.
>>>>
>>>> * When the domain number of a control domain is unassigned from a matrix
>>>> mdev in use by a KVM guest, the control domain will be hot unplugged
>>>> from the KVM guest.
>>>>
>>>> Signed-off-by: Tony Krowiak<akrowiak@...ux.ibm.com>
>>>> ---
> [..]
>
>>>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +                                           unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>>>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>>>> +		return false;
>>>> +
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
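
[vfio_ap_mdev_assign_apqis_4_apid() is not part of this hunk; going by
the discussion below, its effect is roughly the following, sketched here
with the illustrative apqn_is_bound() helper rather than the actual code:

static bool vfio_ap_mdev_assign_apqis_4_apid(struct ap_matrix_mdev *matrix_mdev,
                                             unsigned long apid)
{
        unsigned long apqi;
        bool plugged = false;

        /*
         * The guest's AQM is still empty, so seed it with the assigned
         * domains whose queue for this adapter is bound to vfio_ap, and
         * plug the adapter if at least one such queue exists.
         */
        for (apqi = 0; apqi < AP_DOMAINS; apqi++) {
                if (!test_bit_inv(apqi, matrix_mdev->matrix.aqm))
                        continue;

                if (apqn_is_bound(matrix_mdev, apid, apqi)) {
                        set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
                        plugged = true;
                }
        }

        if (plugged)
                set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);

        return plugged;
}
]
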
>>> Hm. Let's say we have the same situation regarding the bound queues as
>>> above but we start with the empty matrix, and do all the assignments
>>> while the guest is running.
>>>
>>> Consider the following sequence of actions.
>>>
>>> 1) echo 0 > assign_domain
>> matrix: .0
>> guest_matrix: no APQNs
>>
>>> 2) echo 1 > assign_domain
>> matrix: .0, .1
>> guest_matrix: no APQNs
>>
>>> 3) echo 1 > assign_adapter
>> matrix: 1.0, 1.1
>> guest_matrix: 1.0
>>
>>> 4) echo 0 > assign_adapter
>> matrix: 0.0, 0.1, 1.0, 1.1
>> guest_matrix: 0.0, 1.0
>>> 5) echo 1 > unassign_adapter
>> matrix: 0.0, 0.1
>> guest_matrix: 0.0
>>
>>> I understand that at 3), because
>>> bitmap_empty(matrix_mdev->shadow_apcb.aqm) is true, we would end up with
>>> a shadow aqm containing just domain 0, as queue 1.1 ain't bound to us.
>> True
>>
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> guest_matrix:
>>> 0.0
>> At the end I had:
>> matrix: 0.0, 0.1
>> guest_matrix: 0.0
>>
>>> And if we add in an adapter 2. into the mix with the queues 2.0 and 2.1
>>> then after
>>> 6) echo 2 > assign_adapter
>>> we get
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> 2.0 2.1
>>> guest_matrix:
>>> 0.0
>>> 2.0
>>>
>>> This looks very quirky to me. Did I read the code wrong? Opinions?
>> You read the code correctly and I agree, this is a bit quirky. I would say
>> that after adding adapter 2, we should end up with guest matrix:
>> 0.0, 0.1
>> 2.0, 2.1
>>
>> If you agree, I'll make the adjustment.
>>
> I do agree, but maybe we should discuss what adjustments you have in
> mind.
>
> [..]
>
>>>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +                                             unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>>>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>>>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +			/*
>>>> +			 * If there are no APIDs assigned to the guest, then
>>>> +			 * the guest will not have access to any queues, so
>>>> +			 * let's also go ahead and unassign the APQIs. Keeping
>>>> +			 * them around may yield unpredictable results during
>>>> +			 * a probe that is not related to a host AP
>>>> +			 * configuration change (i.e., an AP adapter is
>>>> +			 * configured online).
>>>> +			 */
>>> I don't quite understand this comment. Clearing out the other mask (the
>>> AQM) when the APM becomes empty does allow us to recover the full possible
>>> guest matrix in the scenario described above. I don't see any shadow
>>> manipulation in the probe handler at this stage. Are we maybe
>>> talking about the same effect as I described for assign?
>> Patch 15/16 is for the probe.
>>
> I still don't understand the logic, but I guess we want to make
> adjustments anyway, so maybe I don't have to.
>
> Regards,
> Halil