linux-kernel - Re: [RFC 00/16] KVM protected memory extension

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <29c62691-0d50-8a02-5f43-761fa56ab551@oracle.com>
Date:   Mon, 25 May 2020 18:56:57 +0300
From:   Liran Alon <liran.alon@...cle.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        David Rientjes <rientjes@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Will Drewry <wad@...omium.org>,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>, x86@...nel.org,
        kvm@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [RFC 00/16] KVM protected memory extension


On 25/05/2020 17:46, Kirill A. Shutemov wrote:
> On Mon, May 25, 2020 at 04:47:18PM +0300, Liran Alon wrote:
>> On 22/05/2020 15:51, Kirill A. Shutemov wrote:
>>> == Background / Problem ==
>>>
>>> There are a number of hardware features (MKTME, SEV) which protect guest
>>> memory from some unauthorized host access. The patchset proposes a purely
>>> software feature that mitigates some of the same host-side read-only
>>> attacks.
>>>
>>>
>>> == What does this set mitigate? ==
>>>
>>>    - Host kernel ”accidental” access to guest data (think speculation)
>> Just to clarify: This is any host kernel memory info-leak vulnerability. Not
>> just speculative execution memory info-leaks. Also architectural ones.
>>
>> In addition, note that removing guest data from host kernel VA space also
>> makes guest<->host memory exploits more difficult.
>> E.g. Guest cannot use already available memory buffer in kernel VA space for
>> ROP or placing valuable guest-controlled code/data in general.
>>
>>>    - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))
>>>
>>>    - Host userspace access to guest data (compromised qemu)
>> I don't quite understand what is the benefit of preventing userspace VMM
>> access to guest data while the host kernel can still access it.
> Let me clarify: the guest memory mapped into host userspace is not
> accessible by both host kernel and userspace. Host still has way to access
> it via a new interface: GUP(FOLL_KVM). The GUP will give you struct page
> that kernel has to map (temporarily) if need to access the data. So only
> blessed codepaths would know how to deal with the memory.
Yes, I understood that. I meant explicit host kernel access.
>
> It can help preventing some host->guest attack on the compromised host.
> Like if an VM has successfully attacked the host it cannot attack other
> VMs as easy.

We have mechanisms to sandbox the userspace VMM process for that.

You need to be more specific on what is the attack scenario you attempt 
to address
here that is not covered by existing mechanisms. i.e. Be crystal clear 
on the extra value
of the feature of not exposing guest data to userspace VMM.

>
> It would also help to protect against guest->host attack by removing one
> more places where the guest's data is mapped on the host.
Because guest have explicit interface to request which guest pages can 
be mapped in userspace VMM, the value of this is very small.

Guest already have ability to map guest controlled code/data in 
userspace VMM either via this interface or via forcing userspace VMM
to create various objects during device emulation handling. The only 
extra property this patch-series provides, is that only a
small portion of guest pages will be mapped to host userspace instead of 
all of it. Resulting in smaller regions for exploits that require
guessing a virtual address. But: (a) Userspace VMM device emulation may 
still allow guest to spray userspace heap with objects containing
guest controlled data. (b) How is userspace VMM suppose to limit which 
guest pages should not be mapped to userspace VMM even though guest have
explicitly requested them to be mapped? (E.g. Because they are valid DMA 
sources/targets for virtual devices or because it's vGPU frame-buffer).
>> QEMU is more easily compromised than the host kernel because it's
>> guest<->host attack surface is larger (E.g. Various device emulation).
>> But this compromise comes from the guest itself. Not other guests. In
>> contrast to host kernel attack surface, which an info-leak there can
>> be exploited from one guest to leak another guest data.
> Consider the case when unprivileged guest user exploits bug in a QEMU
> device emulation to gain access to data it cannot normally have access
> within the guest. With the feature it would able to see only other shared
> regions of guest memory such as DMA and IO buffers, but not the rest.
This is a scenario where an unpriviledged guest userspace have direct 
access to a virtual device
and is able to exploit a bug in device emulation handling such that it 
will allow it to compromise
the security *inside* the guest. i.e. Leak guest kernel data or other 
guest userspace processes data.

That's true. Good point. This is a very important missing argument from 
the cover-letter.

Now it's crystal clear on the trade-off considered here:
Is the extra complication and perf cost provided by the mechanism of 
this patch-series worth
to protect against the scenario of a userspace VMM vulnerability that 
may be accessible to unpriviledged
guest userspace process to leak other *in-guest* data that is not 
otherwise accessible to that process?

-Liran