[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57313f38-5b2b-e352-7502-1a3a70fa4ef1@redhat.com>
Date: Fri, 10 Dec 2021 23:41:14 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
Hou Wenlong <houwenlong93@...ux.alibaba.com>
Subject: Re: [PATCH] KVM: x86: Inject #UD on "unsupported" hypercall if
patching fails
On 12/10/21 23:29, Sean Christopherson wrote:
> Inject a #UD if patching in the correct hypercall fails, e.g. due to
> emulator_write_emulated() failing because RIP is mapped not-writable by
> the guest. The guest is likely doomed in any case, but observing a #UD
> in the guest is far friendlier to debug/triage than a !WRITABLE #PF with
> CR2 pointing at the RIP of the faulting instruction.
>
> Ideally, KVM wouldn't patch at all; it's the guest's responsibility to
> identify and use the correct hypercall instruction (VMCALL vs. VMMCALL).
> Sadly, older Linux kernels prior to commit c1118b3602c2 ("x86: kvm: use
> alternatives for VMCALL vs. VMMCALL if kernel text is read-only") do the
> wrong thing and blindly use VMCALL, i.e. removing the patching would
> break running VMs with older kernels.
>
> One could argue that KVM should be "fixed" to ignore guest paging
> protections instead of injecting #UD, but patching in the first place was
> a mistake as it was a hack-a-fix for a guest bug.
Sort of. I agree that patching is awful, but I'm not sure about
injecting #UD vs. just doing the hypercall; the original reason for the
patching was to allow Intel<->AMD cross-vendor migration to work somewhat.
That in turn promoted Linux's ill-conceived sloppiness of just using
vmcall, which lasted until commit c1118b3602c2.
> There are myriad fatal
> issues with KVM's patching:
>
> 1. Patches using an emulated guest write, which will fail if RIP is not
> mapped writable. This is the issue being mitigated.
>
> 2. Doesn't ensure the write is "atomic", e.g. a hypercall that splits a
> page boundary will be handled as two separate writes, which means
> that a partial, corrupted instruction can be observed by a vCPU.
Only the third bytes differs between VMCALL and VMMCALL so that's not
really a problem. (Apparently what happened is that Microsoft asked
Intel to use 0xc1 like AMD, and VMware asked AMD to use 0xd9 like Intel,
or something like that; and they ended up swapping opcodes. But this
may be an urban legend, no matter how plausible).
The big ones are 1 and 4.
Thanks,
Paolo
> 3. Doesn't serialize other CPU cores after updating the code stream.
>
> 4. Completely fails to account for the case where KVM is emulating due
> to invalid guest state with unrestricted_guest=0. Patching and
> retrying the instruction will result in vCPU getting stuck in an
> infinite loop.
>
> But, the "support" _so_ awful, especially #1, that there's practically
> zero chance that a modern guest kernel can rely on KVM to patch the guest.
> So, rather than proliferate KVM's bad behavior any further than the
> absolute minimum needed for backwards compatibility, just try to make it
> suck a little less.
Powered by blists - more mailing lists