Date:   Tue, 14 Dec 2021 11:12:32 +0200
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        syzbot+f1d2136db9c80d4733e8@...kaller.appspotmail.com
Subject: Re: [PATCH 2/4] KVM: nVMX: Synthesize TRIPLE_FAULT for L2 if
 emulation is required

On Tue, 2021-12-07 at 19:30 +0000, Sean Christopherson wrote:
> Synthesize a triple fault if L2 guest state is invalid at the time of
> VM-Enter, which can happen if L1 modifies SMRAM or if userspace stuffs
> guest state via ioctls(), e.g. KVM_SET_SREGS.  KVM should never emulate
> invalid guest state, since from L1's perspective, it's architecturally
> impossible for L2 to have invalid state while L2 is running in hardware.
> E.g. attempts to set CR0 or CR4 to unsupported values will either VM-Exit
> or #GP.
> 
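[Not part of the patch, just to illustrate the userspace path mentioned above: a minimal sketch of stuffing guest state via KVM_SET_SREGS. vcpu_fd is a hypothetical, already-created vCPU file descriptor; clearing CR0.PE while leaving CR0.PG set is one combination that is invalid guest state for a VMX guest without "unrestricted guest".]

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch only: vcpu_fd is assumed to come from KVM_CREATE_VCPU. */
static int stuff_invalid_sregs(int vcpu_fd)
{
	struct kvm_sregs sregs;

	if (ioctl(vcpu_fd, KVM_GET_SREGS, &sregs))
		return -1;

	/*
	 * CR0.PG=1 with CR0.PE=0 is not a state hardware can reach and is
	 * invalid guest state for VMX.  KVM_SET_SREGS typically does not
	 * reject it; the invalidity only matters once the vCPU runs.
	 */
	sregs.cr0 |= 1ULL << 31;	/* set CR0.PG */
	sregs.cr0 &= ~1ULL;		/* clear CR0.PE */

	return ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);
}
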
> Modifying vCPU state via RSM+SMRAM and ioctl() are the only paths that
> can trigger this scenario, as nested VM-Enter correctly rejects any
> attempt to enter L2 with invalid state.
> 
> RSM is a straightforward case as (a) KVM follows AMD's SMRAM layout and
> behavior, and (b) Intel's SDM states that loading reserved CR0/CR4 bits
> via RSM results in shutdown, i.e. there is precedent for KVM's behavior.
> Following AMD's SMRAM layout is important as AMD's layout saves/restores
> the descriptor cache information, including CS.RPL and SS.RPL, and also
> defines all the fields relevant to invalid guest state as read-only, i.e.
> so long as the vCPU had valid state before the SMI, which is guaranteed
> for L2, RSM will generate valid state unless SMRAM was modified.  Intel's
> layout saves/restores only the selector, which means that scenarios where
> the selector and cached RPL don't match, e.g. conforming code segments,
> would yield invalid guest state.  Intel CPUs fudge around this issue by
> stuffing SS.RPL and CS.RPL on RSM.  Per Intel's SDM on the "Default
> Treatment of RSM", paraphrasing for brevity:
> 
>   IF internal storage indicates that the [CPU was post-VMXON]
>   THEN
>      enter VMX operation (root or non-root);
>      restore VMX-critical state as defined in Section 34.14.1;
>      set to their fixed values any bits in CR0 and CR4 whose values must
>      be fixed in VMX operation [unless coming from an unrestricted guest];
>      IF RFLAGS.VM = 0 AND (in VMX root operation OR the
>         “unrestricted guest” VM-execution control is 0)
>      THEN
>        CS.RPL := SS.DPL;
>        SS.RPL := SS.DPL;
>      FI;
>      restore current VMCS pointer;
>   FI;
> 
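[Side note, not from the patch: the paraphrased fix-up above translates to roughly the C below, with made-up field names, just to make the RPL/DPL relationship explicit.]

#include <stdbool.h>
#include <stdint.h>

/* Illustrative rendering of the SDM pseudocode quoted above. */
struct seg_state {
	uint16_t selector;
	uint8_t  rpl;	/* cached RPL, not saved in Intel's SMRAM layout */
	uint8_t  dpl;
};

static void rsm_stuff_rpl(struct seg_state *cs, struct seg_state *ss,
			  bool rflags_vm, bool vmx_root, bool unrestricted_guest)
{
	if (!rflags_vm && (vmx_root || !unrestricted_guest)) {
		cs->rpl = ss->dpl;
		ss->rpl = ss->dpl;
	}
}
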
> Note that Intel CPUs also overwrite the fixed CR0/CR4 bits, whereas KVM
> will synthesize TRIPLE_FAULT in this scenario.  KVM's behavior is allowed
> as both Intel and AMD define CR0/CR4 SMRAM fields as read-only, i.e. the
> only way for CR0 and/or CR4 to have illegal values is if they were
> modified by the L1 SMM handler, and Intel's SDM "SMRAM State Save Map"
> section states "modifying these registers will result in unpredictable
> behavior".
> 
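[For reference, not from the patch: the "fixed values" come from the IA32_VMX_CR0_FIXED0/FIXED1 and IA32_VMX_CR4_FIXED0/FIXED1 MSRs. A minimal sketch of the validity rule, i.e. what the CPU forces on RSM and what KVM instead treats as a shutdown condition:]

#include <stdbool.h>
#include <stdint.h>

/*
 * VMX fixed-bit rule: bits set in FIXED0 must be 1 in the register,
 * bits clear in FIXED1 must be 0.  fixed0/fixed1 would be read from
 * IA32_VMX_CR0_FIXED0/FIXED1 (or the CR4 variants); unrestricted
 * guests get PE/PG relaxed in the effective CR0 fixed0.
 */
static bool cr_valid_in_vmx(uint64_t val, uint64_t fixed0, uint64_t fixed1)
{
	return ((val & fixed0) == fixed0) && ((val & ~fixed1) == 0);
}
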
> KVM's ioctl() behavior is less straightforward.  Because KVM allows
> ioctls() to be executed in any order, rejecting an ioctl() if it would
> result in invalid L2 guest state is not an option as KVM cannot know if
> a future ioctl() would resolve the invalid state, e.g. KVM_SET_SREGS, or
> drop the vCPU out of L2, e.g. KVM_SET_NESTED_STATE.  Ideally, KVM would
> reject KVM_RUN if L2 contained invalid guest state, but that carries the
> risk of a false positive, e.g. if RSM loaded invalid guest state and KVM
> exited to userspace.  Setting a flag/request to detect such a scenario is
> undesirable because (a) it's extremely unlikely to add value to KVM as a
> whole, and (b) KVM would need to consider ioctl() interactions with such
> a flag, e.g. if userspace migrated the vCPU while the flag were set.
> 
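[Again just to illustrate, not from the patch: a hypothetical restore sequence where the combined state is transiently invalid. 'nested' and 'sregs' are assumed to be prepared by the caller, e.g. from migration data; the point is only that validity can't be judged before KVM_RUN.]

#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical ordering: after step 1 the vCPU may be in L2 with
 * stale/invalid segment state; step 2 resolves it before KVM_RUN.
 * Rejecting step 1 (or step 2, in the reverse order) would therefore
 * be a false positive, which is why the check lives at run time.
 */
static int restore_vcpu(int vcpu_fd, struct kvm_nested_state *nested,
			struct kvm_sregs *sregs)
{
	if (ioctl(vcpu_fd, KVM_SET_NESTED_STATE, nested))	/* 1 */
		return -1;
	if (ioctl(vcpu_fd, KVM_SET_SREGS, sregs))		/* 2 */
		return -1;
	return ioctl(vcpu_fd, KVM_RUN, 0);	/* validity enforced here */
}
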
> Cc: stable@...r.kernel.org
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 9e415e5a91ab..5307fcee3b3b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5900,18 +5900,14 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>  		vmx_flush_pml_buffer(vcpu);
>  
>  	/*
> -	 * We should never reach this point with a pending nested VM-Enter, and
> -	 * more specifically emulation of L2 due to invalid guest state (see
> -	 * below) should never happen as that means we incorrectly allowed a
> -	 * nested VM-Enter with an invalid vmcs12.
> +	 * KVM should never reach this point with a pending nested VM-Enter.
> +	 * More specifically, short-circuiting VM-Entry to emulate L2 due to
> +	 * invalid guest state should never happen as that means KVM knowingly
> +	 * allowed a nested VM-Enter with an invalid vmcs12.  More below.
>  	 */
>  	if (KVM_BUG_ON(vmx->nested.nested_run_pending, vcpu->kvm))
>  		return -EIO;
>  
> -	/* If guest state is invalid, start emulating */
> -	if (vmx->emulation_required)
> -		return handle_invalid_guest_state(vcpu);
> -
>  	if (is_guest_mode(vcpu)) {
>  		/*
>  		 * PML is never enabled when running L2, bail immediately if a
> @@ -5933,10 +5929,30 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>  		 */
>  		nested_mark_vmcs12_pages_dirty(vcpu);
>  
> +		/*
> +		 * Synthesize a triple fault if L2 state is invalid.  In normal
> +		 * operation, nested VM-Enter rejects any attempt to enter L2
> +		 * with invalid state.  However, those checks are skipped if
> +		 * state is being stuffed via RSM or KVM_SET_NESTED_STATE.  If
> +		 * L2 state is invalid, it means either L1 modified SMRAM state
> +		 * or userspace provided bad state.  Synthesize TRIPLE_FAULT as
> +		 * doing so is architecturally allowed in the RSM case, and is
> +		 * the least awful solution for the userspace case without
> +		 * risking false positives.
> +		 */
> +		if (vmx->emulation_required) {
> +			nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
> +			return 1;
> +		}
> +
>  		if (nested_vmx_reflect_vmexit(vcpu))
>  			return 1;
>  	}
>  
> +	/* If guest state is invalid, start emulating.  L2 is handled above. */
> +	if (vmx->emulation_required)
> +		return handle_invalid_guest_state(vcpu);
> +
>  	if (exit_reason.failed_vmentry) {
>  		dump_vmcs(vcpu);
>  		vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;



Reviewed-by: Maxim Levitsky <mlevitsk@...hat.com>
Best regards,
	Maxim Levitsky
