[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANRm+Cwh__btdC4e4t+jYqHsafL6xff6t4eukxT=EmwVLYvrMA@mail.gmail.com>
Date: Fri, 22 Apr 2016 09:40:43 +0800
From: Wanpeng Li <kernellwp@...il.com>
To: Radim Krčmář <rkrcmar@...hat.com>
Cc: Greg Kurz <gkurz@...ux.vnet.ibm.com>,
Paolo Bonzini <pbonzini@...hat.com>, james.hogan@...tec.com,
Ingo Molnar <mingo@...hat.com>, linux-mips@...ux-mips.org,
kvm <kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
qemu-ppc@...gnu.org, Cornelia Huck <cornelia.huck@...ibm.com>,
Paul Mackerras <paulus@...ba.org>,
David Gibson <david@...son.dropbear.id.au>
Subject: Re: [PATCH v3] KVM: remove buggy vcpu id check on vcpu creation
2016-04-21 23:29 GMT+08:00 Radim Krčmář <rkrcmar@...hat.com>:
> 2016-04-21 13:29+0200, Greg Kurz:
>> On Wed, 20 Apr 2016 20:29:09 +0200
>> Radim Krčmář <rkrcmar@...hat.com> wrote:
>>> 2016-04-20 17:44+0200, Greg Kurz:
>>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>>> > introduced a check to prevent potential kernel memory corruption in case
>>> > the vcpu id is too great.
>>> >
>>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>>> >
>>> > This means the check does not belong here and should be moved to some arch
>>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>>> >
>>> > ARM and s390 already have such a check.
>>> >
>>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>>> > id is used as described in the above commit: I believe PowerPC can live
>>> > without this check.
>>>
>>> The only problematic path I see is kvm_get_vcpu_by_id(), which returns
>>> NULL for any id above KVM_MAX_VCPUS.
>>
>> Oops my bad, I started to work on a 4.4 tree and I missed this check brought
>> by commit c896939f7cff (KVM: use heuristic for fast VCPU lookup by id).
>>
>> But again, I believe the check is wrong there also: the changelog just mentions
>> this is a fastpath for the usual case where "VCPU ids match the array index"...
>> why does the patch add a NULL return path if id >= KVM_MAX_VCPUS ?
>
> (The patch had to check id >= KVM_MAX_VCPUS for sanity and there could
> not be a VCPU with that index according to the spec, so it made a
> shortcut to the correct NULL result ...)
>
>>> Second issue is that Documentation/virtual/kvm/api.txt says
>>> 4.7 KVM_CREATE_VCPU
>>> [...]
>>> This API adds a vcpu to a virtual machine. The vcpu id is a small
>>> integer in the range [0, max_vcpus).
>>>
>>
>> Yeah and I find the meaning of max_vcpus is unclear.
>>
>> Here it is considered as a limit for vcpu id, but if you look at the code,
>> KVM_MAX_VCPUS is also used as a limit for the number of vcpus:
>>
>> virt/kvm/kvm_main.c: if (atomic_read(&kvm->online_vcpus) == KVM_MAX_VCPUS) {
>
> I agree. Naming of KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS would make
> you think that online_vcpus limit interpretation is the correct one, but
> the code is conflicted.
>
>>> so we'd remove those two lines and change the API too. The change would
>>> be somewhat backward compatible, but doesn't PowerPC use high vcpu_id
>>> just because KVM is lacking an API to set DT ID?
>>
>> This is related to a limitation when running in book3s_hv mode with cpus
>> that support SMT (multiple HW threads): all HW threads within a core
>> cannot be running in different guests at the same time.
>>
>> We solve this by using a vcpu numbering scheme as follows:
>>
>> vcpu_id[N] = (N * thread_per_core_guest) / threads_per_core_host + N % threads_per_core_guest
>>
>> where N means "the Nth vcpu presented to the guest". This allows to have groups of vcpus
>> that can be scheduled to run on the same real core.
>>
>> So, in the "worst" case where we want to run a guest with 1 thread/core and the host
>> has 8 threads/core, we will need the vcpu_id limit to be 8*KVM_MAX_VCPUS.
>
> I see, thanks. Accommodating existing users seems like an acceptable
> excuse to change the API.
>
>>> x86 (APIC ID) is affected by this and ARM (MP ID) probably too.
>>>
>>
>> x86 is limited to KVM_MAX_VCPUS (== 255) vcpus: it won't be affected if we also
>> patch kvm_get_vcpu_by_id() like suggested above.
>
> x86 vcpu_id encodes APIC ID and APIC ID encodes CPU topology by
> reserving blocks of bits for socket/core/thread, so if core or thread
> count isn't a power of two, then the set of valid APIC IDs is sparse,
^^^^^^^^^^^^^^^^^^^
^^^^^^^
Is this the root reason why recommand max vCPUs per vm is 160 and the
KVM_MAX_VCPUS is 255 instead of due to perforamnce concern?
Regards,
Wanpeng Li
> but max id is still limited by 255, so the effective maximum VCPU count
> is lower.
Powered by blists - more mailing lists