Message-ID: <20200409181823.00bcd14a@gandalf.local.home>
Date:   Thu, 9 Apr 2020 18:18:23 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Nadav Amit <nadav.amit@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Christoph Hellwig <hch@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        mingo@...hat.com, bp@...en8.de, hpa@...or.com, x86@...nel.org,
        kenny@...ix.com, jeyu@...nel.org, rasmus.villemoes@...vas.dk,
        fenghua.yu@...el.com, xiaoyao.li@...el.com, thellstrom@...are.com,
        tony.luck@...el.com, gregkh@...uxfoundation.org, jannh@...gle.com,
        keescook@...omium.org, David.Laight@...lab.com,
        dcovelli@...are.com, mhiramat@...nel.org
Subject: Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation

On Thu, 09 Apr 2020 23:13:22 +0200
Thomas Gleixner <tglx@...utronix.de> wrote:

> Nadav Amit <nadav.amit@...il.com> writes:
> >> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> >> Speaking with my virt ignorance hat on, how impossible is it to provide
> >> generic/useful VMLAUNCH/VMRESUME wrappers?
> >> 
> >> Because a lot of what happens around VMEXIT/VMENTER is very much like
> >> the userspace entry crud, as per that series from Thomas that fixes all
> >> that. And surely we don't need various broken copies of that in all the
> >> out-of-tree hypervisors.
> >> 
> >> Also, I suppose if you have this, we no longer need to exempt CR2.  
> >
> > It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> > instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> > there is no problem. Even if you consider saving the general purpose
> > registers as done in __vmx_vcpu_run() - that’s relatively easy.  
> 
> __vmx_vcpu_run() is roughly the scope, but that won't work.
> 
> Looking at the vmmon source:
> 
> Task_Switch()
> 
>     1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF
> 
>     2) Disable interrupts
> 
>     3) Disable PEBS
> 
>     4) Disable PT
> 
>     5) Load a magic IDT
> 
>        According to comments these are stubs to catch any exception which
>        happens while switching over.
> 
>     6) Write CR0 and CR4 directly which is "safe" as the IDT is
>        redirected to the monitor stubs.
> 
>     7) VMXON()
> 
>     8) Invoke monitor on some magic page which switches CR3 and GDT and
>        clears CR4.PCIDE (at least that's what the comments claim)
> 
>        The monitor code is loaded from a binary only blob and that does
>        the actual vmlaunch/vmresume ...

From what I understand (I've never looked at the code), this binary blob
is the same for Windows and Apple. It's basically its own operating system
that does all the work, and vmmon is the way to switch to and from it. When
this blob gets an interrupt that it doesn't know about, it assumes the
interrupt belongs to the operating system it's sharing the machine with and
exits back to it, whether that's Linux, Windows or OSX.

It's not too unlike what jailhouse does with its hypervisor, which takes
over the machine and places the running Linux into its own "cell", except
that this blob will switch full control of the machine back to Linux.

-- Steve


> 
>        And as this runs with a completely different CR3, sharing that
>        code is impossible.
> 
>     When returning, the above is undone in reverse order and any caught
>     exceptions / interrupts are replayed via "int $NR".
> 
> So it's pretty much the same mess as with vbox just different and
> binary. Oh well...
> 
> The "good" news is that it's not involved in any of the context tracking
> stuff so RCU won't ever be affected when a vmware vCPU runs. It's not
> pretty, but TBH I don't care.
> 
> Thanks,
> 
>         tglx
> 
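
For anyone trying to keep the ordering of that list straight, here is a toy
model of the switch as described in the quoted steps. It is only a sketch and
touches no real hardware: the step strings are labels taken from the list,
world_switch() and run_monitor are hypothetical names, and replaying the
caught vectors only after every step has been undone is an assumption (the
description does not say exactly where the replay happens).

```python
# Toy model of the Task_Switch() sequence described in the quoted list.
# Every step is just a label; nothing here programs an APIC, IDT or CR.

SWITCH_STEPS = [
    "mask NMI-mode APIC LVTs",
    "disable interrupts",
    "disable PEBS",
    "disable PT",
    "load monitor IDT",
    "write CR0/CR4",
    "VMXON",
]


def world_switch(run_monitor):
    """Apply the steps in order, run the monitor, then undo in reverse.

    run_monitor is a hypothetical callable standing in for the binary
    blob (step 8); it returns the vectors caught by the stub IDT.
    """
    log = []
    done = []
    for step in SWITCH_STEPS:
        log.append(("do", step))
        done.append(step)

    caught = list(run_monitor())
    log.append(("monitor", tuple(caught)))

    # Undo in reverse order of application.
    while done:
        log.append(("undo", done.pop()))

    # Assumption: replay happens once the host state is fully restored.
    for vector in caught:
        log.append(("replay", f"int ${vector:#x}"))
    return log


if __name__ == "__main__":
    for kind, what in world_switch(lambda: [0x2, 0xEC]):
        print(kind, what)
```

The only point of the model is the shape: a stack of steps pushed on the way
in, popped on the way out, with whatever the stub IDT caught handed back to
the host afterwards.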
