linux-kernel - Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest


Message-ID: <20200519050435.GA5081@linux.intel.com>
Date:   Mon, 18 May 2020 22:04:35 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     "Luck, Tony" <tony.luck@...el.com>, Jue Wang <juew@...gle.com>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable
 in a guest

On Mon, May 18, 2020 at 06:55:00PM +0200, Borislav Petkov wrote:
> On Mon, May 18, 2020 at 08:36:25AM -0700, Luck, Tony wrote:
> > The VMM gets the page fault (because the unmapping of the guest
> > physical address is at the VMM EPT level).  The VMM can't map a new
> > page into that guest physical address because it has no way to
> > replace the contents of the old page.  The VMM could pass the #PF
> > to the guest, but that would just confuse the guest (its page tables
> > all say that the page is still valid). In this particular case the
> > page is part of the 1:1 kernel map. So the kernel will OOPS (I think).
> 
> ...
> 
> > Please explain how a guest (that doesn't even know that it is a guest)
> > is going to figure out that the EPT tables (that it has no way to access)
> > have marked this page invalid in guest physical address space.
> 
> So somewhere BUS_MCEERR_AR was mentioned. So I'm assuming the error
> severity was "action required". What does happen in the kernel, on
> baremetal, with an AR error in kernel space, i.e., kernel memory?
> 
> If we can't fixup the exception, we die.
> 
> So why should the guest behave any differently?
> 
> Now, if you want for the guest to be more "robust" and handle that
> thing, fine. But then you'd need an explicit way to tell the guest
> kernel: "you've just had an MCE and I unmapped the page" so that the
> guest kernel can figure out what to do. Even if it means to panic.
> 
> I.e., signal in an explicit way that EPT violation Jue is talking about
> in the other mail.

Well, technically the CLFLUSH thing is a KVM emulation bug, but it sounds
like that's a moot point since the pmem-enabled guest will make real
accesses to the poisoned page shortly thereafter.  E.g. teaching KVM to
eat the -EHWPOISON on CLFLUSH would only postpone the guest's death.

As for how the second #MC occurs, on the EPT violation, KVM does a gup() to
translate the virtual address to a pfn (KVM maintains a simple GPA->HVA
lookup).  gup() returns -EHWPOISON for the poisoned page, which KVM
redirects into a BUS_MCEERR_AR.  The userspace VMM, e.g. Qemu, sees the
BUS_MCEERR_AR and sends it back into the guest as a virtual #MC.
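
To make the sequence above concrete, here is a rough userspace model of it.
This is not KVM code: every toy_* name, the single-memslot layout, and the
identity HVA->pfn mapping are assumptions made up for illustration.  It only
shows the order of events: GPA->HVA lookup, gup() failing with -EHWPOISON,
the host turning that into BUS_MCEERR_AR, and the VMM reflecting it into the
guest as a virtual #MC.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* EHWPOISON is Linux-specific; fall back to the asm-generic value elsewhere. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

#define TOY_PAGE_SHIFT 12

/* Hypothetical result of handling one EPT violation in this model. */
enum toy_outcome {
	TOY_MAPPED,	/* pfn resolved, guest resumes */
	TOY_SIGBUS_AR,	/* host reports BUS_MCEERR_AR to the userspace VMM */
	TOY_BAD_GPA,	/* GPA is not backed by the memslot */
};

/* Hypothetical stand-in for the "simple GPA->HVA lookup". */
struct toy_memslot {
	uint64_t gpa_base;
	uint64_t hva_base;
	uint64_t npages;
};

static bool toy_gpa_to_hva(const struct toy_memslot *slot, uint64_t gpa,
			   uint64_t *hva)
{
	if (gpa < slot->gpa_base)
		return false;
	if (((gpa - slot->gpa_base) >> TOY_PAGE_SHIFT) >= slot->npages)
		return false;
	*hva = slot->hva_base + (gpa - slot->gpa_base);
	return true;
}

/* Toy gup(): fails with -EHWPOISON if the HVA lies in the poisoned page. */
static int toy_gup(uint64_t hva, uint64_t poisoned_hva, uint64_t *pfn)
{
	if ((hva >> TOY_PAGE_SHIFT) == (poisoned_hva >> TOY_PAGE_SHIFT))
		return -EHWPOISON;
	*pfn = hva >> TOY_PAGE_SHIFT;	/* identity mapping, model only */
	return 0;
}

/* Host side: what happens on the EPT violation for @gpa. */
static enum toy_outcome toy_handle_ept_violation(const struct toy_memslot *slot,
						 uint64_t gpa,
						 uint64_t poisoned_hva)
{
	uint64_t hva, pfn;

	if (!toy_gpa_to_hva(slot, gpa, &hva))
		return TOY_BAD_GPA;
	if (toy_gup(hva, poisoned_hva, &pfn) == -EHWPOISON)
		return TOY_SIGBUS_AR;	/* redirected into BUS_MCEERR_AR */
	return TOY_MAPPED;
}

/* VMM side: a BUS_MCEERR_AR is sent back into the guest as a virtual #MC. */
static bool toy_vmm_injects_mc(enum toy_outcome outcome)
{
	return outcome == TOY_SIGBUS_AR;
}
```

In this model, a guest access anywhere inside the poisoned page ends in
toy_vmm_injects_mc() returning true, which corresponds to the second #MC
the guest sees.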

> You can inject a #PF or better yet the *first* MCE which is being
> injected should say with a bit somewhere "I unmapped the address in
> m->addr". So that the guest kernel can handle that properly and know
> what *exactly* it is getting an MCE for.
> 
> What I don't like is the "am I running as a guest" check. Because
> someone else would come later and say, err, I'm not virtualizing this
> portion of MCA either, lemme add another "am I guest" check.
> 
> Sure, it is a lot easier but when stuff like that starts spreading
> around in the MCE code, then we can just as well disable MCE when
> virtualized altogether. It would be a lot easier for everybody.