[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <094c2395-b1b3-d908-657c-9bd4144e40ac@redhat.com>
Date: Tue, 10 Nov 2020 09:50:43 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Borislav Petkov <bp@...en8.de>, "Luck, Tony" <tony.luck@...el.com>
Cc: Jim Mattson <jmattson@...gle.com>, Qian Cai <cai@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>, x86 <x86@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] x86/mce: Check for hypervisor before enabling additional
error logging
On 10/11/20 07:31, Borislav Petkov wrote:
>>
>> + if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>> + return;
>> +
> Frankly, I'm tired of wagging the dog because the tail can't. If
> qemu/kvm can't emulate a CPU model fully then it should ignore those
> unknown MSR accesses by default, i.e., that "ignore_msrs" functionality
> should be on by default I'd say...
>
> We certainly can't be sprinkling this check everytime the kernel tries
> to do something as basic as read an MSR.
You don't have to, also because it's wrong. Fortunately it's much
simpler than that:
1) ignore_msrs _cannot_ be on by default. You cannot know in advance
that for all non-architectural MSRs it's okay for them to read as zero
and eat writes. For some non-architectural MSR which never reads as
zero on real hardware, who knows that there isn't some code using the
contents of the MSR as a divisor, and causing a division by zero
exception with ignore_msrs=1?
2) it's not just KVM. _Any_ hypervisor is bound to have this issue for
some non-architectural MSRs. KVM just gets the flak because Linux CI
environments (for obvious reasons) use it more than they use Hyper-V or
ESXi or VirtualBox.
3) because of (1) and (2), the solution is very simple. If the MSR is
architectural, its absence is a KVM bug and we'll fix it in all stable
versions. If the MSR is not architectural (and 17Fh isn't; not only
it's not mentioned in the SDM, even Google is failing me), never ever
assume that the CPUID family/model/stepping implies a given MSR is
there, and just use rdmsr_safe/wrmsr_safe.
So, for this patch,
Nacked-by: Paolo Bonzini <pbonzini@...hat.com>
Paolo
Powered by blists - more mailing lists