Message-ID: <bbbd941f-16fa-5276-093e-22d1dad42593@linux.ibm.com>
Date:   Mon, 5 Oct 2020 19:05:08 -0400
From:   Tony Krowiak <akrowiak@...ux.ibm.com>
To:     Halil Pasic <pasic@...ux.ibm.com>
Cc:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        kvm@...r.kernel.org, freude@...ux.ibm.com, borntraeger@...ibm.com,
        cohuck@...hat.com, mjrosato@...ux.ibm.com,
        alex.williamson@...hat.com, kwankhede@...dia.com,
        fiuczy@...ux.ibm.com, frankja@...ux.ibm.com, david@...hat.com,
        imbrenda@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v10 11/16] s390/vfio-ap: allow hot plug/unplug of AP
 resources using mdev device

I proposed two algorithms in my last response. The following summarizes the
results from executing your scenario:

bound queues:
0.0
0.1

1.0

2.0
2.1

algorithm: use filtering on assign/unassign
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter

matrix:
1.0
1.1
guest_matrix:
1.0

echo 0 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1

echo 1 > unassign_adapter

matrix:
0.0
0.1
guest_matrix:
0.0
0.1

echo 2 > assign_adapter

matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

echo 1 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

algorithm: do not plug the adapter unless all assigned APQNs are bound
scenario:
echo 0 > assign_domain
echo 1 > assign_domain
echo 1 > assign_adapter

matrix:
1.0
1.1
guest_matrix:
no bits set

echo 0 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
guest_matrix:
0.0
0.1

echo 1 > unassign_adapter

matrix:
0.0
0.1
guest_matrix:
0.0
0.1

echo 2 > assign_adapter

matrix:
0.0
0.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1

echo 1 > assign_adapter

matrix:
0.0
0.1
1.0
1.1
2.0
2.1
guest_matrix:
0.0
0.1
2.0
2.1
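
For reference, both algorithms hinge on the same kind of test described in
the patch's commit message: an adapter is only plugged into the guest's
CRYCB if every APQN in the Cartesian product of its APID with a set of
APQIs references a queue bound to vfio_ap. Below is a rough, self-contained
sketch of that test, not the code from the patch; apqn_bound() is a
hypothetical stand-in for however the driver looks up a bound queue.

/* Sketch only; assumes the usual ap_bus/vfio_ap definitions. */
static bool all_apqns_bound(struct ap_matrix_mdev *matrix_mdev,
			    unsigned long apid, unsigned long *aqm)
{
	unsigned long apqi;

	/* Every (apid, apqi) pair must map to a queue bound to vfio_ap. */
	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS)
		if (!apqn_bound(matrix_mdev, AP_MKQID(apid, apqi)))
			return false;

	return true;
}

/* Sketch of the assignment gate: plug the APID only if the check passes. */
static void sketch_assign_adapter(struct ap_matrix_mdev *matrix_mdev,
				  unsigned long apid)
{
	set_bit_inv(apid, matrix_mdev->matrix.apm);

	if (all_apqns_bound(matrix_mdev, apid, matrix_mdev->shadow_apcb.aqm))
		set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
}

The interesting question, as the traces show, is which AQM the check should
run against and what to do when the guest's AQM is still empty; that is
where the two algorithms above diverge at the first step.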

On 10/5/20 2:30 PM, Halil Pasic wrote:
> On Mon, 5 Oct 2020 12:24:39 -0400
> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>
>>
>> On 9/27/20 9:01 PM, Halil Pasic wrote:
>>> On Fri, 21 Aug 2020 15:56:11 -0400
>>> Tony Krowiak <akrowiak@...ux.ibm.com> wrote:
>>>
>>>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>>>> unassigned from an AP matrix mdev device while it is in use by a guest per
>>>> the following:
>>>>
>>>> * When the APID of an adapter is assigned to a matrix mdev in use by a KVM
>>>>     guest, the adapter will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APID being assigned and
>>>>     the APQIs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APID of an adapter is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the adapter will be hot unplugged from the KVM guest.
>>>>
>>>> * When the APQI of a domain is assigned to a matrix mdev in use by a KVM
>>>>     guest, the domain will be hot plugged into the KVM guest as long as each
>>>>     APQN derived from the Cartesian product of the APQI being assigned and
>>>>     the APIDs already assigned to the guest's CRYCB references a queue device
>>>>     bound to the vfio_ap device driver.
>>>>
>>>> * When the APQI of a domain is unassigned from a matrix mdev in use by a
>>>>     KVM guest, the domain will be hot unplugged from the KVM guest
>>> Hm, I suppose this means that what your guest effectively gets may depend
>>> on whether assign_domain or assign_adapter is done first.
>>>
>>> Suppose we have the queues
>>> 0.0 0.1
>>> 1.0
>>> bound to vfio_ap, i.e. 1.1 is missing for a reason other than
>>> belonging to the default drivers (no idea what the exact reason might be).
>> I'm not quite sure what you mean by "we have the queues". I will
>> assume you mean those queues are bound to the vfio_ap
>> device driver.
> Yes, this is exactly what I've meant.
>
>
>> The only way this could happen is if somebody
>> manually unbinds queue 1.1.
>>
> Assuming that:
> 1) every time we observe ap_perm the ap subsystem is in a settled state
> (i.e. not in the middle of pushing things left and right
> because of an ap_perm change),
> 2) the only non-default driver is vfio_ap, and that
> 3) queues handle non-operational states by means other than disappearing
> (should be the case with the latest reworks)
> I agree what is left is manual unbind, which I lean towards considering
> an edge case.
>
> If this is indeed just about that edge case, maybe we can live with a
> simpler algorithm than this one.
>
>
>>> Let's suppose we started with the matrix containing only adapter
>>> 0 (0.) and domain 0 (.0).
>>>
>>> After echo 1 > assign_adapter && echo 1 > assign_domain we end up with
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0 0.1
>>> while after echo 1 > assign_domain && echo 1 > assign_adapter we end up
>>> with:
>>> matrix:
>>> 0.0 0.1
>>> 1.0 1.1
>>> guest_matrix:
>>> 0.0
>>> 0.1
>>>
>>> That means, the set of bound queues and the set of assigned resources do
>>> not fully determine the set of resources passed through to the guest.
>>>
>>> Is that a deliberate design choice?
>> Yes, it is a deliberate choice to only allow guest access to queues
>> represented by queue devices bound to the vfio_ap device driver.
>> The idea here is to adhere to the linux device model.
>>
> This is not what I asked. My question was about the fact that
> reordering assignments gives different results. Well, this was kind
> of the case before as well, with the notable difference that in the
> past we always had an error. So if a full sequence of assignments could
> be performed without an error, then any permutation of it would produce
> the exact same result.
>
> I'm all for only allowing guest access to queues represented by queue
> devices bound to the vfio_ap device driver. I'm concerned with the
> permutation (and calculus).
>
>>>> * When the domain number of a control domain is assigned to a matrix mdev
>>>>     in use by a KVM guest, the control domain will be hot plugged into the
>>>>     KVM guest.
>>>>
>>>> * When the domain number of a control domain is unassigned from a matrix
>>>>     mdev in use by a KVM guest, the control domain will be hot unplugged
>>>>     from the KVM guest.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@...ux.ibm.com>
>>>> ---
> [..]
>
>>>> +static bool vfio_ap_mdev_assign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					   unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +
>>>> +	if (!vfio_ap_mdev_has_crycb(matrix_mdev) ||
>>>> +	    !test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm))
>>>> +		return false;
>>>> +
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		return vfio_ap_mdev_assign_apqis_4_apid(matrix_mdev, apid);
>>> Hm. Let's say we have the same situation regarding the bound queues as
>>> above but we start with the empty matrix, and do all the assignments
>>> while the guest is running.
>>>
>>> Consider the following sequence of actions.
>>>
>>> 1) echo 0 > assign_domain
>> matrix:            .0
>> guest_matrix: no APQNs
>>
>>> 2) echo 1 > assign_domain
>> matrix:            .0, .1
>> guest_matrix: no APQNs
>>
>>> 3) echo 1 > assign_adapter
>> matrix:           1.0, 1.1
>> guest_matrix: 1.0
>>
>>> 4) echo 0 > assign_adapter
>> matrix:           0.0, 0.1, 1.0, 1.1
>> guest_matrix: 0.0, 1.0
>>> 5) echo 1 > unassign_adapter
>> matrix:           0.0, 0.1
>> guest_matrix: 0.0
>>
>>> I understand that at 3), because
>>> bitmap_empty(matrix_mdev->shadow_apcb.aqm) is true, we would end up with a
>>> shadow aqm containing just domain 0, as queue 1.1 isn't bound to us.
>> True
>>
>>> Thus at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> guest_matrix:
>>> 0.0
>> At the end I had:
>> matrix:            0.0, 0.1
>> guest_matrix: 0.0
>>
>>> And if we add adapter 2 into the mix, with the queues 2.0 and 2.1,
>>> then after
>>> 6) echo 2 > assign_adapter
>>> at the end we would have
>>> matrix:
>>> 0.0 0.1
>>> 2.0 2.1
>>> guest_matrix:
>>> 0.0
>>> 2.0
>>>
>>> This looks very quirky to me. Did I read the code wrong? Opinions?
>> You read the code correctly and I agree, this is a bit quirky. I would say
>> that after adding adapter 2, we should end up with guest matrix:
>> 0.0, 0.1
>> 2.0, 2.1
>>
>> If you agree, I'll make the adjustment.
>>
> I do agree, but maybe we should discuss what adjustments you have in
> mind.
>
> [..]
>
>>>> +static bool vfio_ap_mdev_unassign_guest_apid(struct ap_matrix_mdev *matrix_mdev,
>>>> +					     unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_mdev_has_crycb(matrix_mdev)) {
>>>> +		if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>>>> +			clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +			/*
>>>> +			 * If there are no APIDs assigned to the guest, then
>>>> +			 * the guest will not have access to any queues, so
>>>> +			 * let's also go ahead and unassign the APQIs. Keeping
>>>> +			 * them around may yield unpredictable results during
>>>> +			 * a probe that is not related to a host AP
>>>> +			 * configuration change (i.e., an AP adapter is
>>>> +			 * configured online).
>>>> +			 */
>>> I don't quite understand this comment. Clearing out one mask when the
>>> other becomes empty does allow us to recover the full possible guest
>>> matrix in the scenario described above. I don't see any shadow
>>> manipulation in the probe handler at this stage. Are we maybe
>>> talking about the same effect as I described for assign?
>> Patch 15/16 is for the probe.
>>
> I still don't understand the logic, but I guess we want to make
> adjustments anyway, so maybe I don't have to.
>
> Regards,
> Halil
