lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 25 May 2020 08:27:04 +0300
From:   "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        David Rientjes <rientjes@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Will Drewry <wad@...omium.org>,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        "Kleen, Andi" <andi.kleen@...el.com>, x86@...nel.org,
        kvm@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...ux.ibm.com>,
        Alexandre Chartre <alexandre.chartre@...cle.com>,
        Marius Hillenbrand <mhillenb@...zon.de>
Subject: Re: [RFC 00/16] KVM protected memory extension

On Fri, May 22, 2020 at 03:51:58PM +0300, Kirill A. Shutemov wrote:
> == Background / Problem ==
> 
> There are a number of hardware features (MKTME, SEV) which protect guest
> memory from some unauthorized host access. The patchset proposes a purely
> software feature that mitigates some of the same host-side read-only
> attacks.

CC people who worked on the related patchsets.
 
> == What does this set mitigate? ==
> 
>  - Host kernel ”accidental” access to guest data (think speculation)
> 
>  - Host kernel induced access to guest data (write(fd, &guest_data_ptr, len))
> 
>  - Host userspace access to guest data (compromised qemu)
> 
> == What does this set NOT mitigate? ==
> 
>  - Full host kernel compromise.  Kernel will just map the pages again.
> 
>  - Hardware attacks
> 
> 
> The patchset is RFC-quality: it works but has known issues that must be
> addressed before it can be considered for applying.
> 
> We are looking for high-level feedback on the concept.  Some open
> questions:
> 
>  - This protects from some kernel and host userspace read-only attacks,
>    but does not place the host kernel outside the trust boundary. Is it
>    still valuable?
> 
>  - Can this approach be used to avoid cache-coherency problems with
>    hardware encryption schemes that repurpose physical bits?
> 
>  - The guest kernel must be modified for this to work.  Is that a deal
>    breaker, especially for public clouds?
> 
>  - Are the costs of removing pages from the direct map too high to be
>    feasible?
> 
> == Series Overview ==
> 
> The hardware features protect guest data by encrypting it and then
> ensuring that only the right guest can decrypt it.  This has the
> side-effect of making the kernel direct map and userspace mapping
> (QEMU et al) useless.  But, this teaches us something very useful:
> neither the kernel or userspace mappings are really necessary for normal
> guest operations.
> 
> Instead of using encryption, this series simply unmaps the memory. One
> advantage compared to allowing access to ciphertext is that it allows bad
> accesses to be caught instead of simply reading garbage.
> 
> Protection from physical attacks needs to be provided by some other means.
> On Intel platforms, (single-key) Total Memory Encryption (TME) provides
> mitigation against physical attacks, such as DIMM interposers sniffing
> memory bus traffic.
> 
> The patchset modifies both host and guest kernel. The guest OS must enable
> the feature via hypercall and mark any memory range that has to be shared
> with the host: DMA regions, bounce buffers, etc. SEV does this marking via a
> bit in the guest’s page table while this approach uses a hypercall.
> 
> For removing the userspace mapping, use a trick similar to what NUMA
> balancing does: convert memory that belongs to KVM memory slots to
> PROT_NONE: all existing entries converted to PROT_NONE with mprotect() and
> the newly faulted in pages get PROT_NONE from the updated vm_page_prot.
> The new VMA flag -- VM_KVM_PROTECTED -- indicates that the pages in the
> VMA must be treated in a special way in the GUP and fault paths. The flag
> allows GUP to return the page even though it is mapped with PROT_NONE, but
> only if the new GUP flag -- FOLL_KVM -- is specified. Any userspace access
> to the memory would result in SIGBUS. Any GUP access without FOLL_KVM
> would result in -EFAULT.
> 
> Any anonymous page faulted into the VM_KVM_PROTECTED VMA gets removed from
> the direct mapping with kernel_map_pages(). Note that kernel_map_pages() only
> flushes local TLB. I think it's a reasonable compromise between security and
> perfromance.
> 
> Zapping the PTE would bring the page back to the direct mapping after clearing.
> At least for now, we don't remove file-backed pages from the direct mapping.
> File-backed pages could be accessed via read/write syscalls. It adds
> complexity.
> 
> Occasionally, host kernel has to access guest memory that was not made
> shared by the guest. For instance, it happens for instruction emulation.
> Normally, it's done via copy_to/from_user() which would fail with -EFAULT
> now. We introduced a new pair of helpers: copy_to/from_guest(). The new
> helpers acquire the page via GUP, map it into kernel address space with
> kmap_atomic()-style mechanism and only then copy the data.
> 
> For some instruction emulation copying is not good enough: cmpxchg
> emulation has to have direct access to the guest memory. __kvm_map_gfn()
> is modified to accommodate the case.
> 
> The patchset is on top of v5.7-rc6 plus this patch:
> 
> https://lkml.kernel.org/r/20200402172507.2786-1-jimmyassarsson@gmail.com
> 
> == Open Issues ==
> 
> Unmapping the pages from direct mapping bring a few of issues that have
> not rectified yet:
> 
>  - Touching direct mapping leads to fragmentation. We need to be able to
>    recover from it. I have a buggy patch that aims at recovering 2M/1G page.
>    It has to be fixed and tested properly
> 
>  - Page migration and KSM is not supported yet.
> 
>  - Live migration of a guest would require a new flow. Not sure yet how it
>    would look like.
> 
>  - The feature interfere with NUMA balancing. Not sure yet if it's
>    possible to make them work together.
> 
>  - Guests have no mechanism to ensure that even a well-behaving host has
>    unmapped its private data.  With SEV, for instance, the guest only has
>    to trust the hardware to encrypt a page after the C bit is set in a
>    guest PTE.  A mechanism for a guest to query the host mapping state, or
>    to constantly assert the intent for a page to be Private would be
>    valuable.
-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ