lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c28620cdf5d1ce37b30e39355631fda0f3e6324a.camel@redhat.com>
Date:   Mon, 04 Apr 2022 12:50:43 +0300
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        Brijesh Singh <brijesh.singh@....com>,
        Jon Grimm <Jon.Grimm@....com>,
        David Kaplan <David.Kaplan@....com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Liam Merwick <liam.merwick@...cle.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/5] KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02

On Fri, 2022-04-01 at 21:51 +0000, Sean Christopherson wrote:
> On Fri, Apr 01, 2022, Maciej S. Szmigiero wrote:
> > On 1.04.2022 20:32, Sean Christopherson wrote:
> > > On Thu, Mar 10, 2022, Maciej S. Szmigiero wrote:
> > > > +	/* The return address pushed on stack by the CPU for some injected events */
> > > > +	svm->vmcb->control.next_rip            = svm->nested.ctl.next_rip;
> > > 
> > > This needs to be gated by nrips being enabled _and_ exposed to L1, i.e.
> > > 
> > > 	if (svm->nrips_enabled)
> > > 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
> > 
> > It can be done, however what if we run on a nrips-capable CPU,
> > but don't expose this capability to the L1?
> 
> Oh, right, because the field will be populated by the CPU on VM-Exit.  Ah, the
> correct behavior is to grab RIP from vmcb12 to emulate nrips=0 hardware simply
> not updating RIP.  E.g. zeroing it out would send L2 into the weeds on IRET due
> the CPU pushing '0' on the stack when vectoring the injected event.
> 
> 	if (svm->nrips_enabled)
> 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
> 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
> 		vmcb02->control.next_rip    = vmcb12_rip;
> 
> > The CPU will then push whatever value was left in this field as
> > the return address for some L1 injected events.

This makes sense.

Note that even AMD's PRM has a note about this:

"
15.20 Event Injection
...
Software interrupts cannot be properly injected if the processor does not support the NextRIP field.
Support is indicated by CPUID Fn8000_000A_EDX[NRIPS] = 1. Hypervisor software should
emulate the event injection of software interrupts if NextRIP is not supported
"



> > 
> > Although without nrips feature the L1 shouldn't even attempt event
> > injection, copying this field anyway will make it work if L1 just
> > expects this capability based on the current CPU model rather than
> > by checking specific CPUID feature bits.

The guest really ought to check CPUID bits. Plus the CPU model is also
usually virtualized (for named machine types in Qemu for example).

> 
> L1 may still inject the exception, it just advances the RIP manually.  As above,
> the really messy thing is that, because there's no flag to say "don't use NextRIP!",
> the CPU will still consume NextRIP and push '0' on the stack for the return RIP
> from the INTn/INT3/INTO.  Yay.
> 
> I found that out the hard way (patch in-progress).  The way to handle event
> injection if KVM is loaded with nrips=0 but nrips is supported in hardware is to
> stuff NextRIP on event injection even if nrips=0, otherwise the guest is hosed.
> 
> > > > +	u64 next_rip;
> > > >   	u64 nested_cr3;
> > > >   	u64 virt_ext;
> > > >   	u32 clean;
> > > 
> > > I don't know why this struct has
> > > 
> > > 	u8 reserved_sw[32];
> > > 
> > > but presumably it's for padding, i.e. probably should be reduced to 24 bytes.
> > 
> > Apparently the "reserved_sw" field stores Hyper-V enlightenments state -
> > see commit 66c03a926f18 ("KVM: nSVM: Implement Enlightened MSR-Bitmap feature")
> > and nested_svm_vmrun_msrpm() in nested.c.
> 
> Argh, that's a terrible name.  Thanks for doing the homework, I was being lazy.


That was added around the commit 

1183646a67d01 ("KVM: SVM: hyper-v: Direct Virtual Flush support")

Seems to be used by HV to store 'struct hv_enlightenments',
but I don't know 100% if that is the only thing that can be stored
in this area.



Best regards,
	Maxim Levitsky


> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ