lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201110095615.GB9450@nazgul.tnic>
Date:   Tue, 10 Nov 2020 10:56:15 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     "Luck, Tony" <tony.luck@...el.com>,
        Jim Mattson <jmattson@...gle.com>, Qian Cai <cai@...hat.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-tip-commits@...r.kernel.org" 
        <linux-tip-commits@...r.kernel.org>, x86 <x86@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] x86/mce: Check for hypervisor before enabling additional
 error logging

On Tue, Nov 10, 2020 at 09:50:43AM +0100, Paolo Bonzini wrote:
> 1) ignore_msrs _cannot_ be on by default.  You cannot know in advance that
> for all non-architectural MSRs it's okay for them to read as zero and eat
> writes.  For some non-architectural MSR which never reads as zero on real
> hardware, who knows that there isn't some code using the contents of the MSR
> as a divisor, and causing a division by zero exception with ignore_msrs=1?

So if you're emulating a certain type of hardware - say a certain CPU
model - then what are you saying? That you're emulating it but not
really all of it, just some bits?

Because this is what happens - the kernel checks that it runs on a
certain CPU type and this tells it that those MSRs are there. But then
comes virt and throws all assumptions out.

So if it emulates a CPU model and the kernel tries to access those MSRs,
then the HV should ignore those MSR accesses if it doesn't know about
them. Why should the kernel change everytime some tool or virtualization
has shortcomings?

> 2) it's not just KVM.  _Any_ hypervisor is bound to have this issue for some
> non-architectural MSRs.  KVM just gets the flak because Linux CI
> environments (for obvious reasons) use it more than they use Hyper-V or ESXi
> or VirtualBox.

It's not flak - I'm trying to find a solution which is workable for
both. The kernel is not at fault here.

> 3) because of (1) and (2), the solution is very simple.  If the MSR is
> architectural, its absence is a KVM bug and we'll fix it in all stable
> versions.  If the MSR is not architectural (and 17Fh isn't; not only it's
> not mentioned in the SDM,

It is mentioned in the SDM.

> even Google is failing me), never ever assume that the CPUID
> family/model/stepping implies a given MSR is there, and just use
> rdmsr_safe/wrmsr_safe.

Yes, we don't have a choice, as always. ;-\

But maybe we should have a choice and maybe qemu/kvm should have a way
to ignore certain MSRs for certain CPU types, regardless of them being
architectural or not.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ