Message-ID: <acf26e48-733b-06ab-e172-0f058c3d8624@linux.ibm.com>
Date:   Mon, 13 Dec 2021 11:16:51 +0100
From:   Pierre Morel <pmorel@...ux.ibm.com>
To:     Janosch Frank <frankja@...ux.ibm.com>, kvm@...r.kernel.org
Cc:     linux-s390@...r.kernel.org, linux-kernel@...r.kernel.org,
        borntraeger@...ibm.com, cohuck@...hat.com, david@...hat.com,
        thuth@...hat.com, imbrenda@...ux.ibm.com, hca@...ux.ibm.com,
        gor@...ux.ibm.com
Subject: Re: [PATCH v5 1/1] s390x: KVM: accept STSI for CPU topology
 information



On 12/9/21 17:08, Janosch Frank wrote:
> On 11/22/21 14:14, Pierre Morel wrote:
>> We let the userland hypervisor know if the machine supports the CPU
>> topology facility using a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
>>
>> The PTF instruction will report a topology change if there is any change
>> with a previous STSI_15_1_2 SYSIB.
>> Changes inside a STSI_15_1_2 SYSIB occur if CPU bits are set or cleared
>> inside the CPU Topology List Entry CPU mask field, which happens with
>> changes in CPU polarization, dedication, CPU types and adding or
>> removing CPUs in a socket.
>>
>> The reporting to the guest is done using the Multiprocessor
>> Topology-Change-Report (MTCR) bit of the utility entry of the guest's
>> SCA which will be cleared during the interpretation of PTF.
>>
>> To check if the topology has been modified we use a new field of the
>> arch vCPU to save the previous real CPU ID at the end of a schedule
>> and verify on next schedule that the CPU used is in the same socket.
>>
>> We assume in this patch:
>> - no polarization change: only horizontal polarization is currently
>>    used in Linux.
>> - no CPU type change: only the IFL type is supported in Linux.
>> - Dedication: with this patch, only a completely dedicated CPU stack can
>>    benefit from the CPU Topology.
>>
>> STSI(15.1.x) gives information on the CPU configuration topology.
>> Let's accept the interception of STSI with the function code 15 and
>> let the userland part of the hypervisor handle it when userland
>> supports the CPU Topology facility.
>>
>> Signed-off-by: Pierre Morel <pmorel@...ux.ibm.com>
>> ---
>>   Documentation/virt/kvm/api.rst   | 16 ++++++++++
>>   arch/s390/include/asm/kvm_host.h | 14 ++++++---
>>   arch/s390/kvm/kvm-s390.c         | 52 +++++++++++++++++++++++++++++++-
>>   arch/s390/kvm/priv.c             |  7 ++++-
>>   arch/s390/kvm/vsie.c             |  3 ++
>>   include/uapi/linux/kvm.h         |  1 +
>>   6 files changed, 87 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index aeeb071c7688..e5c9da0782a6 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7484,3 +7484,19 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
>>   of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
>>   the hypercalls whose corresponding bit is in the argument, and return
>>   ENOSYS for the others.
>> +
>> +8.17 KVM_CAP_S390_CPU_TOPOLOGY
>> +------------------------------
>> +
>> +:Capability: KVM_CAP_S390_CPU_TOPOLOGY
>> +:Architectures: s390
>> +:Type: vm
>> +
>> +This capability indicates that kvm will provide the S390 CPU Topology facility
>> +which consist of the interpretation of the PTF instruction for the Function
>> +Code 2 along with interception and forwarding of both the PTF instruction
>> +with function Codes 0 or 1 and the STSI(15,1,x) instruction to the userland
> 
> The capitalization of "Function code" is inconsistent.

ok

> 
>> +hypervisor.
>> +
>> +The stfle facility 11, CPU Topology facility, should not be provided to the
>> +guest without this capability.
>> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
>> index a604d51acfc8..cccc09a8fdab 100644
>> --- a/arch/s390/include/asm/kvm_host.h
>> +++ b/arch/s390/include/asm/kvm_host.h
>> @@ -95,15 +95,19 @@ struct bsca_block {
>>       union ipte_control ipte_control;
>>       __u64    reserved[5];
>>       __u64    mcn;
>> -    __u64    reserved2;
>> +#define ESCA_UTILITY_MTCR    0x8000
>> +    __u16    utility;
>> +    __u8    reserved2[6];
>>       struct bsca_entry cpu[KVM_S390_BSCA_CPU_SLOTS];
>>   };
>>   struct esca_block {
>>       union ipte_control ipte_control;
>> -    __u64   reserved1[7];
>> +    __u64   reserved1[6];
>> +    __u16    utility;
>> +    __u8    reserved2[6];
>>       __u64   mcn[4];
>> -    __u64   reserved2[20];
>> +    __u64   reserved3[20];
> 
> Note to self: Prime example for a move to reserved member names based on 
> offsets.

yes
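
For illustration, a minimal sketch of what offset-based reserved names
could look like for the start of the BSCA. The struct and member names
(bsca_header_sketch, reserved08, reserved3a) are hypothetical,
ipte_control is flattened to a plain 64-bit word, and the cpu[] array is
left out; only the offsets follow from the field sizes in the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

#define ESCA_UTILITY_MTCR 0x8000

/* Hypothetical, simplified stand-in for the first 0x40 bytes of bsca_block. */
struct bsca_header_sketch {
	uint64_t ipte_control;	/* 0x00, a union in the real header */
	uint64_t reserved08[5];	/* 0x08 */
	uint64_t mcn;		/* 0x30 */
	uint16_t utility;	/* 0x38 */
	uint8_t  reserved3a[6];	/* 0x3a */
};

/* The member names state the offsets, and the compiler checks them. */
_Static_assert(offsetof(struct bsca_header_sketch, reserved08) == 0x08,
	       "reserved08 is at 0x08");
_Static_assert(offsetof(struct bsca_header_sketch, utility) == 0x38,
	       "utility is at 0x38");
_Static_assert(offsetof(struct bsca_header_sketch, reserved3a) == 0x3a,
	       "reserved3a is at 0x3a");
```

With names like these, inserting a new field only renames the reserved
member it splits, instead of renumbering reserved2/reserved3.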

> 
>>       struct esca_entry cpu[KVM_S390_ESCA_CPU_SLOTS];
>>   };
>> @@ -228,7 +232,7 @@ struct kvm_s390_sie_block {
>>       __u8    icptcode;        /* 0x0050 */
>>       __u8    icptstatus;        /* 0x0051 */
>>       __u16    ihcpu;            /* 0x0052 */
>> -    __u8    reserved54;        /* 0x0054 */
>> +    __u8    mtcr;            /* 0x0054 */
>>   #define IICTL_CODE_NONE         0x00
>>   #define IICTL_CODE_MCHK         0x01
>>   #define IICTL_CODE_EXT         0x02
>> @@ -247,6 +251,7 @@ struct kvm_s390_sie_block {
>>   #define ECB_SPECI    0x08
>>   #define ECB_SRSI    0x04
>>   #define ECB_HOSTPROTINT    0x02
>> +#define ECB_PTF        0x01
>>       __u8    ecb;            /* 0x0061 */
>>   #define ECB2_CMMA    0x80
>>   #define ECB2_IEP    0x20
>> @@ -748,6 +753,7 @@ struct kvm_vcpu_arch {
>>       bool skey_enabled;
>>       struct kvm_s390_pv_vcpu pv;
>>       union diag318_info diag318_info;
>> +    int prev_cpu;
>>   };
>>   struct kvm_vm_stat {
> 
> [..]
> 
>>   }
>> -void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +static void kvm_s390_set_mtcr(struct kvm_vcpu *vcpu)
> 
> We change a vcpu related data structure, there should be "vcpu" in the 
> function name to indicate that.

ok

> 
>>   {
>> +    struct esca_block *esca = vcpu->kvm->arch.sca;
>> +    if (vcpu->arch.sie_block->ecb & ECB_PTF) {
> 
> I'm wondering if we should replace these checks with the 
> test_kvm_facility() ones. ECB_PTF is never changed after vcpu setup, right?

Sure, it is left over from the first draft, when the patch supported
both interpretation and interception.
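
As a rough sketch of what a facility-based check amounts to: facility
bits are numbered from the most significant bit of byte 0 of the
facility list. The helper names below are hypothetical, and the sketch
only shows the bit numbering; it is a simplification, not the
test_kvm_facility() implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Test facility bit nr in an STFLE-style facility list.
 * Bit 0 is the most significant bit of byte 0.
 */
static bool facility_bit_set(const uint8_t *fac_list, unsigned int nr)
{
	return (fac_list[nr >> 3] & (0x80u >> (nr & 7))) != 0;
}

/* Hypothetical helper: the topology check depends on facility 11 only. */
static bool vcpu_has_cpu_topology(const uint8_t *fac_list)
{
	return facility_bit_set(fac_list, 11);
}
```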

> 
>> +        ipte_lock(vcpu);
>> +        WRITE_ONCE(esca->utility, ESCA_UTILITY_MTCR);
>> +        ipte_unlock(vcpu);
>> +    }
>> +}
>> +
>> +void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>> +{
>>       gmap_enable(vcpu->arch.enabled_gmap);
>>       kvm_s390_set_cpuflags(vcpu, CPUSTAT_RUNNING);
>>       if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>>           __start_cpu_timer_accounting(vcpu);
>>       vcpu->cpu = cpu;
>> +
>> +    /*
>> +     * With PTF interpretation the guest will be aware of topology
>> +     * change when the Multiprocessor Topology-Change-Report is pending.
>> +     * We check for events modifying the result of STSI_15_2:
>> +     * - A new vCPU has been hotplugged (prev_cpu == -1)
>> +     * - The real CPU backing up the vCPU moved to another socket
>> +     */
>> +    if (vcpu->arch.sie_block->ecb & ECB_PTF) {
>> +        if (vcpu->arch.prev_cpu == -1 ||
>> +            (topology_physical_package_id(cpu) !=
>> +             topology_physical_package_id(vcpu->arch.prev_cpu)))
> 
> This is barely readable, might be good to put this check in a separate 
> function in kvm-s390.h.

ok

> 
>> +            kvm_s390_set_mtcr(vcpu);
>> +    }
>>   }
>>   void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>   {
>> +    /* Remember which CPU was backing the vCPU */
>> +    vcpu->arch.prev_cpu = vcpu->cpu;
>>       vcpu->cpu = -1;
>>       if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
>>           __stop_cpu_timer_accounting(vcpu);
>> @@ -3220,6 +3263,13 @@ static int kvm_s390_vcpu_setup(struct kvm_vcpu *vcpu)
>>           vcpu->arch.sie_block->ecb |= ECB_HOSTPROTINT;
>>       if (test_kvm_facility(vcpu->kvm, 9))
>>           vcpu->arch.sie_block->ecb |= ECB_SRSI;
>> +
>> +    /* PTF needs guest facilities to enable interpretation */
> 
> Please explain.
> How is this different from any other facility a few lines above in this 
> function?

It is not; I will remove the comment. Here again it is left over from
the time the patch supported interception.

> 
>> +    if (test_kvm_facility(vcpu->kvm, 11))
>> +        vcpu->arch.sie_block->ecb |= ECB_PTF;
>> +    /* Set the prev_cpu value to an impossible value to detect a new vcpu */
> 
> We can either change this to:
> "A prev_value of -1 indicates this is a new vcpu"
> 
> Or we define a constant which will also make the check in 
> kvm_arch_vcpu_load() easier to understand.

ok, the constant would be clearer.
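
Something along these lines, combined with the separate check function
suggested for kvm_arch_vcpu_load(). This is only a sketch: the names
PREV_CPU_NONE and vcpu_topology_changed() are hypothetical, and
package_id_stub() stands in for topology_physical_package_id() with an
assumed layout of 4 CPUs per socket.

```c
#include <stdbool.h>

/* Hypothetical name for the "no previous CPU" marker. */
#define PREV_CPU_NONE (-1)

/* Stub for topology_physical_package_id(): assume 4 CPUs per socket. */
static int package_id_stub(int cpu)
{
	return cpu / 4;
}

/* True when loading the vCPU should set the topology-change report. */
static bool vcpu_topology_changed(int prev_cpu, int cpu)
{
	/* A new vCPU has never been scheduled before. */
	if (prev_cpu == PREV_CPU_NONE)
		return true;

	/* The backing real CPU moved to another socket. */
	return package_id_stub(prev_cpu) != package_id_stub(cpu);
}
```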

> 
>> +    vcpu->arch.prev_cpu = -1;
>> +
>>       if (test_kvm_facility(vcpu->kvm, 73))
>>           vcpu->arch.sie_block->ecb |= ECB_TE;
>>       if (!kvm_is_ucontrol(vcpu->kvm))
>> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
>> index 417154b314a6..26d165733496 100644
>> --- a/arch/s390/kvm/priv.c
>> +++ b/arch/s390/kvm/priv.c
>> @@ -861,7 +861,8 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
>>       if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
>>           return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
>> -    if (fc > 3) {
>> +    if ((fc > 3 && fc != 15) ||
>> +        (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))) {
>>           kvm_s390_set_psw_cc(vcpu, 3);
>>           return 0;
>>       }
> 
> How about:
> 
> if (fc > 3 && fc != 15)
>      goto out_no_data;
> 
> /* fc 15 is provided with PTF/CPU topology support */
> if (fc == 15 && !test_kvm_facility(vcpu->kvm, 11))
>      goto out_no_data;

ok, clearer
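
To double check the two forms are equivalent, here is the suggested
control flow reduced to a standalone function. The name stsi_fc_cc() and
the has_topology parameter are hypothetical; unlike handle_stsi(), the
sketch returns the condition code directly (3 for "no data") so that it
can be tested in isolation.

```c
#include <stdbool.h>

/* Hypothetical reduction of the function-code check in handle_stsi(). */
static int stsi_fc_cc(unsigned int fc, bool has_topology)
{
	if (fc > 3 && fc != 15)
		goto out_no_data;

	/* fc 15 is provided with PTF/CPU topology support */
	if (fc == 15 && !has_topology)
		goto out_no_data;

	return 0;	/* function code accepted, handling continues */

out_no_data:
	return 3;	/* condition code 3 */
}
```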


Thanks for the review,
Pierre

-- 
Pierre Morel
IBM Lab Boeblingen