Message-ID: <2952e19f-2bb4-35d1-b3dd-042fbb08f9eb@de.ibm.com>
Date: Wed, 8 Sep 2021 15:16:50 +0200
From: Christian Borntraeger <borntraeger@...ibm.com>
To: Pierre Morel <pmorel@...ux.ibm.com>,
David Hildenbrand <david@...hat.com>, kvm@...r.kernel.org
Cc: linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
frankja@...ux.ibm.com, cohuck@...hat.com, thuth@...hat.com,
imbrenda@...ux.ibm.com, hca@...ux.ibm.com, gor@...ux.ibm.com
Subject: Re: [PATCH v3 2/3] s390x: KVM: Implementation of Multiprocessor Topology-Change-Report

On 08.09.21 15:09, Pierre Morel wrote:
>
>
> On 9/8/21 9:07 AM, Christian Borntraeger wrote:
>>
>>
>> On 07.09.21 14:28, Pierre Morel wrote:
>>>
>>>
>>> On 9/6/21 8:37 PM, David Hildenbrand wrote:
>>>> On 03.08.21 10:26, Pierre Morel wrote:
>>>>> We let the userland hypervisor know if the machine supports the CPU
>>>>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>>>>
>>>>> The PTF instruction will report a topology change if there is any change
>>>>> with a previous STSI_15_2 SYSIB.
>>>>> Changes inside a STSI_15_2 SYSIB occur if CPU bits are set or cleared
>>>>> inside the CPU Topology List Entry CPU mask field, which happens with
>>>>> changes in CPU polarization, dedication, CPU types and adding or
>>>>> removing CPUs in a socket.
>>>>>
>>>>> The reporting to the guest is done using the Multiprocessor
>>>>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>>>>> SCA which will be cleared during the interpretation of PTF.
>>>>>
>>>>> To check if the topology has been modified we use a new field of the
>>>>> arch vCPU to save the previous real CPU ID at the end of a schedule
>>>>> and verify on next schedule that the CPU used is in the same socket.
>>>>>
>>>>> We deliberately ignore:
>>>>> - polarization: only horizontal polarization is currently used in Linux.
>>>>> - CPU Type: only the IFL type is supported in Linux
>>>>> - Dedication: we consider that only a complete dedicated CPU stack can
>>>>> take benefit of the CPU Topology.
>>>>>
>>>>> Signed-off-by: Pierre Morel <pmorel@...ux.ibm.com>
>>>>
>>>>
>>>>> @@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
>>>>> __u8 icptcode; /* 0x0050 */
>>>>> __u8 icptstatus; /* 0x0051 */
>>>>> __u16 ihcpu; /* 0x0052 */
>>>>> - __u8 reserved54; /* 0x0054 */
>>>>> + __u8 mtcr; /* 0x0054 */
>>>>> #define IICTL_CODE_NONE 0x00
>>>>> #define IICTL_CODE_MCHK 0x01
>>>>> #define IICTL_CODE_EXT 0x02
>>>>> @@ -246,6 +250,7 @@ struct kvm_s390_sie_block {
>>>>> #define ECB_TE 0x10
>>>>> #define ECB_SRSI 0x04
>>>>> #define ECB_HOSTPROTINT 0x02
>>>>> +#define ECB_PTF 0x01
>>>>
>>>> From below I understand, that ECB_PTF can be used with stfl(11) in the hypervisor.
>>>>
>>>> What is to happen if the hypervisor doesn't support stfl(11) and we consequently cannot use ECB_PTF? Will QEMU be able to emulate PTF fully?
>>>>
>>>>
>>>>> __u8 ecb; /* 0x0061 */
>>>>> #define ECB2_CMMA 0x80
>>>>> #define ECB2_IEP 0x20
>>>>> @@ -747,6 +752,7 @@ struct kvm_vcpu_arch {
>>>>> bool skey_enabled;
>>>>> struct kvm_s390_pv_vcpu pv;
>>>>> union diag318_info diag318_info;
>>>>> + int prev_cpu;
>>>>> };
>>>>> struct kvm_vm_stat {
>>>>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>>>>> index b655a7d82bf0..ff6d8a2b511c 100644
>>>>> --- a/arch/s390/kvm/kvm-s390.c
>>>>> +++ b/arch/s390/kvm/kvm-s390.c
>>>>> @@ -568,6 +568,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>> case KVM_CAP_S390_VCPU_RESETS:
>>>>> case KVM_CAP_SET_GUEST_DEBUG:
>>>>> case KVM_CAP_S390_DIAG318:
>>>>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>>>>
>>>> I would have expected instead
>>>>
>>>> r = test_facility(11);
>>>> break;
>>>>
>>>> ...
>>>>
>>>>> r = 1;
>>>>> break;
>>>>> case KVM_CAP_SET_GUEST_DEBUG2:
>>>>> @@ -819,6 +820,23 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>>>> icpt_operexc_on_all_vcpus(kvm);
>>>>> r = 0;
>>>>> break;
>>>>> + case KVM_CAP_S390_CPU_TOPOLOGY:
>>>>> + mutex_lock(&kvm->lock);
>>>>> + if (kvm->created_vcpus) {
>>>>> + r = -EBUSY;
>>>>> + } else {
>>>>
>>>> ...
>>>> } else if (test_facility(11)) {
>>>> set_kvm_facility(kvm->arch.model.fac_mask, 11);
>>>> set_kvm_facility(kvm->arch.model.fac_list, 11);
>>>> r = 0;
>>>> } else {
>>>> r = -EINVAL;
>>>> }
>>>>
>>>> similar to how we handle KVM_CAP_S390_VECTOR_REGISTERS.
>>>>
>>>> But I assume you want to be able to support hosts without ECB_PTF, correct?
>>>>
>>>>
>>>>> + set_kvm_facility(kvm->arch.model.fac_mask, 11);
>>>>> + set_kvm_facility(kvm->arch.model.fac_list, 11);
>>>>> + r = 0;
>>>>> + }
>>>>> + mutex_unlock(&kvm->lock);
>>>>> + VM_EVENT(kvm, 3, "ENABLE: CPU TOPOLOGY %s",
>>>>> + r ? "(not available)" : "(success)");
>>>>> + break;
>>>>> +
>>>>> + r = -EINVAL;
>>>>> + break;
>>>>
>>>> ^ dead code
>>>>
>>>> [...]
>>>>
>>>>> }
>>>>> void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>>>> {
>>>>> + vcpu->arch.prev_cpu = vcpu->cpu;
>>>>> vcpu->cpu = -1;
>>>>> if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>>>>> __stop_cpu_timer_accounting(vcpu);
>>>>> @@ -3198,6 +3239,11 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
>>>>> vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
>>>>> if (test_kvm_facility(vcpu->kvm, 9))
>>>>> vcpu->arch.sie_block->ecb |= ECB_SRSI;
>>>>> +
>>>>> + /* PTF needs both host and guest facilities to enable interpretation */
>>>>> + if (test_kvm_facility(vcpu->kvm, 11) && test_facility(11))
>>>>> + vcpu->arch.sie_block->ecb |= ECB_PTF;
>>>>
>>>> Here you say we need both ...
>>>>
>>>>> +
>>>>> if (test_kvm_facility(vcpu->kvm, 73))
>>>>> vcpu->arch.sie_block->ecb |= ECB_TE;
>>>>> diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
>>>>> index 4002a24bc43a..50d67190bf65 100644
>>>>> --- a/arch/s390/kvm/vsie.c
>>>>> +++ b/arch/s390/kvm/vsie.c
>>>>> @@ -503,6 +503,9 @@ static int shadow_scb(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
>>>>> /* Host-protection-interruption introduced with ESOP */
>>>>> if (test_kvm_cpu_feat(vcpu->kvm, KVM_S390_VM_CPU_FEAT_ESOP))
>>>>> scb_s->ecb |= scb_o->ecb & ECB_HOSTPROTINT;
>>>>> + /* CPU Topology */
>>>>> + if (test_kvm_facility(vcpu->kvm, 11))
>>>>> + scb_s->ecb |= scb_o->ecb & ECB_PTF;
>>>>
>>>> but here you don't check?
>>>>
>>>>> /* transactional execution */
>>>>> if (test_kvm_facility(vcpu->kvm, 73) && wants_tx) {
>>>>> /* remap the prefix is tx is toggled on */
>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>> index d9e4aabcb31a..081ce0cd44b9 100644
>>>>> --- a/include/uapi/linux/kvm.h
>>>>> +++ b/include/uapi/linux/kvm.h
>>>>> @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
>>>>> #define KVM_CAP_BINARY_STATS_FD 203
>>>>> #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
>>>>> #define KVM_CAP_ARM_MTE 205
>>>>> +#define KVM_CAP_S390_CPU_TOPOLOGY 206
>>>>
>>>> We'll need a Documentation/virt/kvm/api.rst description.
>>>>
>>>> I'm not completely confident that the way we're handling the capability+facility is the right approach. It all feels a bit suboptimal.
>>>>
>>>> Except stfl(74) -- STHYI --, we never enable a facility via set_kvm_facility() that's not available in the host. And STHYI is special such that it is never implemented in hardware.
>>>>
>>>> I'll think about what might be cleaner once I get some more details about the interaction with stfl(11) in the hypervisor.
>>>>
>>>
>>> OK, maybe we do not need to handle the case where stfl(11) is not present in the host; these are pre GA10...
>>
>> What about VSIE? For all existing KVM guests, stfl11 is off.
>
> In VSIE the patch activates stfl(11) only if the host has stfl(11).
>
> I do not see any problem with activating the interpretation in VSIE with ECB_PTF (ECB.7) when the host has stfl(11) and QEMU asks to enable it for the guest using the CAPABILITY, as is done in this patch.
>
> If any intermediate hypervisor decides not to advertise stfl(11) to the guest, like an old QEMU without the CAPABILITY or a QEMU with ctop=off, KVM will not set ECB_PTF and the PTF instruction will trigger a program check as before.
>
> Is it OK or did I miss something?
Yes, sure.
My point was regarding the pre z10 statement. We will see hosts without stfl(e)11 when running nested on z14, z15 and co.
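
To make the nested case concrete, here is a minimal userspace sketch (not kernel code) of the two decisions discussed in this thread: failing the capability when the host lacks facility 11, as David suggests, and setting ECB_PTF only when both host and guest have it. The struct topo_model and the functions enable_cpu_topology() and vcpu_ecb() are hypothetical names used only for illustration; the booleans stand in for what test_facility(11) and test_kvm_facility(kvm, 11) would return.

```c
#include <errno.h>
#include <stdbool.h>

#define ECB_PTF 0x01	/* value taken from the patch quoted above */

/* Hypothetical stand-in for the state discussed in this thread. */
struct topo_model {
	bool host_stfl11;	/* models test_facility(11) on this host */
	bool guest_stfl11;	/* models test_kvm_facility(kvm, 11) */
	int created_vcpus;	/* models kvm->created_vcpus */
};

/*
 * Capability enable following David's suggestion: refuse when vCPUs
 * already exist, and refuse when the host itself lacks facility 11.
 */
static int enable_cpu_topology(struct topo_model *m)
{
	if (m->created_vcpus)
		return -EBUSY;
	if (!m->host_stfl11)
		return -EINVAL;
	m->guest_stfl11 = true;
	return 0;
}

/* ECB bits for a new vCPU: PTF interpretation needs host and guest facility. */
static unsigned char vcpu_ecb(const struct topo_model *m)
{
	unsigned char ecb = 0;

	if (m->guest_stfl11 && m->host_stfl11)
		ecb |= ECB_PTF;
	return ecb;
}
```

With host_stfl11 = false, which is what a KVM host running nested under a hypervisor that does not pass stfl(11) through would see, enable_cpu_topology() returns -EINVAL and vcpu_ecb() leaves ECB_PTF clear. With host_stfl11 = true and no vCPUs created, the capability succeeds and ECB_PTF is set.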