linux-kernel - Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CALCETrVKg=xjG5qyHbCY7P1H17v8LBV3FmQmqGKsPz_4qovFZQ@mail.gmail.com>
Date:   Wed, 11 Sep 2019 11:03:19 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Thomas Hellström (VMware) 
        <thomas_os@...pmail.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Dave Hansen <dave.hansen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        pv-drivers@...are.com, Thomas Hellstrom <thellstrom@...are.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        Christian König <christian.koenig@....com>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page
 encryption bit

On Wed, Sep 11, 2019 at 12:49 AM Thomas Hellström (VMware)
<thomas_os@...pmail.org> wrote:
>
> Hi, Andy.
>
> On 9/11/19 6:18 AM, Andy Lutomirski wrote:

> >
> > As a for-real example, take a look at arch/x86/entry/vdso/vma.c.  The
> > "vvar" VMA contains multiple pages that are backed by different types
> > of memory.  There's a page of ordinary kernel memory.  Historically
> > there was a page of genuine MMIO memory, but that's gone now.  If the
> > system is a SEV guest with pvclock enabled, then there's a page of
> > decrypted memory.   They all share a VMA, they're instantiated in
> > .fault, and there is no hackery involved.  Look at vvar_fault().
>
> So this is conceptually identical to TTM. The difference is that it uses
> vmf_insert_pfn_prot() instead of vmf_insert_mixed() with the vma hack.
> Had there been a vmf_insert_mixed_prot(), the hack in TTM wouldn't be
> needed. I guess providing a vmf_insert_mixed_prot() is a to-do for me to
> pick up.

:)

>
> Having said that, the code you point out is as fragile and suffers from
> the same shortcomings as TTM since
> a) Running things like split_huge_pmd() that takes the vm_page_prot and
> applies it to new PTEs will make things break, (although probably never
> applicable in this case).

Hmm.  There are no vvar huge pages, so this is okay.

I wonder how hard it would be to change the huge page splitting code
to copy the encryption and cacheability bits from the source entry
instead of getting them from vm_page_prot, at least in the cases
relevant to VM_MIXEDMAP and VM_PFNMAP.

> b) Running mprotect() on that VMA will only work if sme_me_mask is part
> of _PAGE_CHG_MASK (which is addressed in a reliable way in my recent
> patchset),  otherwise, the encryption setting will be overwritten.
>

Indeed.  Thanks for the fix!

>
> >> We could probably get away with a WRITE_ONCE() update of the
> >> vm_page_prot before taking the page table lock since
> >>
> >> a) We're locking out all other writers.
> >> b) We can't race with another fault to the same vma since we hold an
> >> address space lock ("buffer object reservation")
> >> c) When we need to update there are no valid page table entries in the
> >> vma, since it only happens directly after mmap(), or after an
> >> unmap_mapping_range() with the same address space lock. When another
> >> reader (for example split_huge_pmd()) sees a valid page table entry, it
> >> also sees the new page protection and things are fine.
> >>
> >> But that would really be a special case. To solve this properly we'd
> >> probably need an additional lock to protect the vm_flags and
> >> vm_page_prot, taken after mmap_sem and i_mmap_lock.
> >>
> > This is all horrible IMO.
>
> I'd prefer to call it fragile and potentially troublesome to maintain.

Fair enough.

>
> That distinction is important because if it ever comes to a choice
> between adding a new lock to protect vm_page_prot (and consequently slow
> down the whole vm system) and using the WRITE_ONCE solution in TTM, we
> should know what the implications are. As it turns out previous choices
> in this area actually seem to have opted for the lockless WRITE_ONCE /
> READ_ONCE / ptl solution. See __split_huge_pmd_locked() and
> vma_set_page_prot().

I think it would be even better if the whole thing could work without
ever writing to vm_page_prot.  This would be a requirement for vvar in
the unlikely event that the vvar vma ever supported splittable huge
pages.  Fortunately, that seems unlikely :)