lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Nov 2019 19:17:48 +0200
From:   Liran Alon <liran.alon@...cle.com>
To:     Vitaly Kuznetsov <vkuznets@...hat.com>
Cc:     kvm list <kvm@...r.kernel.org>, x86@...nel.org,
        Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Jim Mattson <jmattson@...gle.com>,
        linux-kernel@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [PATCH RFC] KVM: x86: tell guests if the exposed SMT topology is
 trustworthy



> On 5 Nov 2019, at 18:17, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> 
> Virtualized guests may pick a different strategy to mitigate hardware
> vulnerabilities when it comes to hyper-threading: disable SMT completely,
> use core scheduling, or, for example, opt in for STIBP. Making the
> decision, however, requires an extra bit of information which is currently
> missing: does the topology the guest see match hardware or if it is 'fake'
> and two vCPUs which look like different cores from guest's perspective can
> actually be scheduled on the same physical core. Disabling SMT or doing
> core scheduling only makes sense when the topology is trustworthy.

This is not only related to vulnerability mitigations.
It’s also important for guest to know if it’s SMT topology is trustworthy for various optimisation algorithms.
E.g. Should it attempt to run tasks that share memory on same NUMA node?

> 
> Add two feature bits to KVM: KVM_FEATURE_TRUSTWORTHY_SMT with the meaning
> that KVM_HINTS_TRUSTWORTHY_SMT bit answers the question if the exposed SMT
> topology is actually trustworthy. It would, of course, be possible to get
> away with a single bit (e.g. 'KVM_FEATURE_FAKE_SMT') and not lose backwards
> compatibility but the current approach looks more straightforward.

Agree.

> 
> There were some offline discussions on whether this new feature bit should
> be complemented with a 're-enlightenment' mechanism for live migration (so
> it can change in guest's lifetime) but it doesn't seem to be very
> practical: what a sane guest is supposed to do if it's told that SMT
> topology is about to become fake other than kill itself? Also, it seems to
> make little sense to do e.g. CPU pinning on the source but not on the
> destination.

Agree.

> 
> There is also one additional piece of the information missing. A VM can be
> sharing physical cores with other VMs (or other userspace tasks on the
> host) so does KVM_FEATURE_TRUSTWORTHY_SMT imply that it's not the case or
> not? It is unclear if this changes anything and can probably be left out
> of scope (just don't do that).

I don’t think KVM_FEATURE_TRUSTWORTHY_SMT should indicate to guest whether it’s vCPU shares a CPU core with another guest.
It should only expose to guest the fact that he can rely on it’s virtual SMT topology. i.e. That there is a relation between virtual SMT topology
to which physical logical processors run which vCPUs.

Guest have nothing to do with the fact that he is now aware host doesn’t guarantee to him that one of it’s vCPU shares a CPU core with another guest vCPU.
I don’t think we should have a CPUID bit that expose this information to guest.

> 
> Similar to the already existent 'NoNonArchitecturalCoreSharing' Hyper-V
> enlightenment, the default value of KVM_HINTS_TRUSTWORTHY_SMT is set to
> !cpu_smt_possible(). KVM userspace is thus supposed to pass it to guest's
> CPUIDs in case it is '1' (meaning no SMT on the host at all) or do some
> extra work (like CPU pinning and exposing the correct topology) before
> passing '1' to the guest.

Hmm… I’m not sure this is correct.
For example, it is possible to expose in virtual SMT topology that guest have 2 vCPUs running on single NUMA node,
while in reality each vCPU task can be scheduled to run on different NUMA nodes. Therefore, making virtual SMT topology not trustworthy.
i.e. Disabling SMT on host doesn’t mean that virtual SMT topology is reliable.

I think this CPUID bit should just be set from userspace when admin have guaranteed to guest that it have set vCPU task affinity properly.
Without KVM attempting to set this bit by itself.

Note that we defined above KVM_HINTS_TRUSTWORTHY_SMT bit differently than “NoNonArchitecturalCoreSharing”.
“NoNonArchitecturalCoreSharing” guarantees to guest that vCPUs of guest won’t share a physical CPU core unless they are defined as virtual SMT siblings.
In contrast, KVM_HINTS_TRUSTWORTHY_SMT bit attempts to state that virtual SMT topology is a subset of how vCPUs are scheduled on physical SMT topology.
i.e. It seems that Hyper-V bit is indeed only attempting to provide guest information related to security mitigations. While newly proposed KVM bit attempts to also
assist guest to determine how to perform it’s internal scheduling decisions.

-Liran

> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> ---
> Documentation/virt/kvm/cpuid.rst     | 27 +++++++++++++++++++--------
> arch/x86/include/uapi/asm/kvm_para.h |  2 ++
> arch/x86/kvm/cpuid.c                 |  7 ++++++-
> 3 files changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..64b94103fc90 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
>                                               before using paravirtualized
>                                               sched yield.
> 
> +KVM_FEATURE_TRUSTWORTHY_SMT       14          set when host supports 'SMT
> +                                              topology is trustworthy' hint
> +                                              (KVM_HINTS_TRUSTWORTHY_SMT).
> +
> KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
>                                               per-cpu warps are expeced in
>                                               kvmclock
> @@ -97,11 +101,18 @@ KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
> 
> Where ``flag`` here is defined as below:
> 
> -================== ============ =================================
> -flag               value        meaning
> -================== ============ =================================
> -KVM_HINTS_REALTIME 0            guest checks this feature bit to
> -                                determine that vCPUs are never
> -                                preempted for an unlimited time
> -                                allowing optimizations
> -================== ============ =================================
> +================================= =========== =================================
> +flag                              value       meaning
> +================================= =========== =================================
> +KVM_HINTS_REALTIME                0           guest checks this feature bit to
> +                                              determine that vCPUs are never
> +                                              preempted for an unlimited time
> +                                              allowing optimizations
> +
> +KVM_HINTS_TRUSTWORTHY_SMT         1           the bit is set when the exposed
> +                                              SMT topology is trustworthy, this
> +                                              means that two guest vCPUs will
> +                                              never share a physical core
> +                                              unless they are exposed as SMT
> +                                              threads.
> +================================= =========== =================================
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..183239d5dfad 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -31,8 +31,10 @@
> #define KVM_FEATURE_PV_SEND_IPI	11
> #define KVM_FEATURE_POLL_CONTROL	12
> #define KVM_FEATURE_PV_SCHED_YIELD	13
> +#define KVM_FEATURE_TRUSTWORTHY_SMT	14
> 
> #define KVM_HINTS_REALTIME      0
> +#define KVM_HINTS_TRUSTWORTHY_SMT	1
> 
> /* The last 8 bits are used to indicate how to interpret the flags field
>  * in pvclock structure. If no bits are set, all flags are ignored.
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f68c0c753c38..dab527a7081f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
> 			     (1 << KVM_FEATURE_POLL_CONTROL) |
> -			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
> +			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
> +			     (1 << KVM_FEATURE_TRUSTWORTHY_SMT);
> 
> 		if (sched_info_on())
> 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
> @@ -720,6 +721,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
> 		entry->ebx = 0;
> 		entry->ecx = 0;
> 		entry->edx = 0;
> +
> +		if (!cpu_smt_possible())
> +			entry->edx |= (1 << KVM_HINTS_TRUSTWORTHY_SMT);
> +
> 		break;
> 	case 0x80000000:
> 		entry->eax = min(entry->eax, 0x8000001f);
> -- 
> 2.20.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ