Message-ID: <87imi8pdl9.fsf@nanos.tec.linutronix.de>
Date: Thu, 09 Apr 2020 23:13:22 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Nadav Amit <nadav.amit@...il.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Christoph Hellwig <hch@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
LKML <linux-kernel@...r.kernel.org>,
Sean Christopherson <sean.j.christopherson@...el.com>,
mingo@...hat.com, bp@...en8.de, hpa@...or.com, x86@...nel.org,
kenny@...ix.com, jeyu@...nel.org, rasmus.villemoes@...vas.dk,
fenghua.yu@...el.com, xiaoyao.li@...el.com, thellstrom@...are.com,
tony.luck@...el.com, gregkh@...uxfoundation.org, jannh@...gle.com,
keescook@...omium.org, David.Laight@...lab.com,
dcovelli@...are.com, mhiramat@...nel.org
Subject: Re: [PATCH 4/4] x86,module: Detect CRn and DRn manipulation
Nadav Amit <nadav.amit@...il.com> writes:
>> On Apr 9, 2020, at 1:56 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>> Speaking with my virt ignorance hat on, how impossible is it to provide
>> generic/useful VMLAUNCH/VMRESUME wrappers?
>>
>> Because a lot of what happens around VMEXIT/VMENTER is very much like
>> the userspace entry crud, as per that series from Thomas that fixes all
>> that. And surely we don't need various broken copies of that in all the
>> out-of-tree hypervisors.
>>
>> Also, I suppose if you have this, we no longer need to excempt CR2.
>
> It depends on what you mean by “VMLAUNCH/VMRESUME”. If you only consider the
> instructions themselves, as Sean did in vmx_vmenter() and vmx_vmexit(),
> there is no problem. Even if you consider saving the general purpose
> registers as done in __vmx_vcpu_run() - that’s relatively easy.
__vmx_vcpu_run() is roughly the scope, but that won't work.

Looking at the vmmon source:
Task_Switch()
1) Mask all APIC LVTs which have NMI delivery mode enabled, e.g. PERF
2) Disable interrupts
3) Disable PEBS
4) Disable PT
5) Load a magic IDT
According to comments these are stubs to catch any exception which
happens while switching over.
6) Write CR0 and CR4 directly which is "safe" as the IDT is
redirected to the monitor stubs.
7) VMXON()
8) Invoke monitor on some magic page which switches CR3 and GDT and
clears CR4.PCIDE (at least that's what the comments claim)
The monitor code is loaded from a binary only blob and that does
the actual vmlaunch/vmresume ...
And as this runs with a completely different CR3, sharing that
code is impossible.
When returning, the above is undone in reverse order and any caught
exceptions / interrupts are replayed via "int $NR".
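
To make the ordering concrete, here is a minimal user-space model of that sequence. It is only a sketch and not the vmmon code: every name in it (switch_model, run_switch, stub_catch) is hypothetical, the steps are represented by their numbers from the list above rather than by real register accesses, and replaying the caught vectors after the last undo step is an assumption, since the description does not say exactly where the replay happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: steps 1..8 refer to the numbered list above. */
enum { N_STEPS = 8, MAX_TRACE = 2 * N_STEPS, MAX_PENDING = 4 };

struct switch_model {
    int trace[MAX_TRACE];      /* +n = step n applied, -n = step n undone */
    size_t n_trace;
    int pending[MAX_PENDING];  /* vectors caught by the stub IDT */
    size_t n_pending;
    int replayed[MAX_PENDING]; /* vectors re-raised on the host side */
    size_t n_replayed;
};

static void record(struct switch_model *m, int entry)
{
    assert(m->n_trace < MAX_TRACE);
    m->trace[m->n_trace++] = entry;
}

/* Stand-in for a stub handler: remember the vector, handle nothing. */
static void stub_catch(struct switch_model *m, int vector)
{
    if (m->n_pending < MAX_PENDING)
        m->pending[m->n_pending++] = vector;
}

/* Apply steps 1..8, let the stubs catch 'vectors', undo 8..1, replay. */
static void run_switch(struct switch_model *m, const int *vectors,
                       size_t n_vectors)
{
    m->n_trace = m->n_pending = m->n_replayed = 0;

    for (int s = 1; s <= N_STEPS; s++)
        record(m, s);

    /* The monitor would run here; anything arriving hits the stubs. */
    for (size_t i = 0; i < n_vectors; i++)
        stub_catch(m, vectors[i]);

    for (int s = N_STEPS; s >= 1; s--)
        record(m, -s);

    /* Assumed placement: replay only once the host state is back. */
    for (size_t i = 0; i < m->n_pending; i++)
        m->replayed[m->n_replayed++] = m->pending[i];
}
```

Calling run_switch() with, say, vectors 2 and 14 yields a trace of 1..8 followed by -8..-1, and both vectors end up in replayed[] in the order they were caught.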
So it's pretty much the same mess as with vbox just different and
binary. Oh well...
The "good" news is that it's not involved in any of the context tracking
stuff so RCU won't ever be affected when a vmware vCPU runs. It's not
pretty, but TBH I don't care.
Thanks,
tglx