Message-ID: <cace2653-447f-bcdc-2714-142d9dc85787@shipmail.org>
Date: Wed, 11 Sep 2019 12:10:21 +0200
From: Thomas Hellström (VMware)
<thomas_os@...pmail.org>
To: "Koenig, Christian" <Christian.Koenig@....com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pv-drivers@...are.com" <pv-drivers@...are.com>,
Thomas Hellstrom <thellstrom@...are.com>,
"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>
Subject: TTM huge page-faults WAS: Re: [RFC PATCH 1/2] x86: Don't let
pgprot_modify() change the page encryption bit
Removing people who are probably not interested from CC; adding dri-devel.
On 9/11/19 11:08 AM, Koenig, Christian wrote:
> Am 10.09.19 um 21:26 schrieb Thomas Hellström (VMware):
>> On 9/10/19 6:11 PM, Andy Lutomirski wrote:
>>>> On Sep 5, 2019, at 8:24 AM, Christoph Hellwig <hch@...radead.org>
>>>> wrote:
>>>>
>>>>> On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas Hellström (VMware)
>>>>> wrote:
>>>>>> On 9/5/19 4:15 PM, Dave Hansen wrote:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> Thanks for the second batch of patches! These look much improved
>>>>>> on all fronts.
>>>>> Yes, although the TTM functionality isn't in yet. Hopefully we
>>>>> won't have to bother you with those though, since this assumes TTM
>>>>> will be using the dma API.
>>>> Please take a look at dma_mmap_prepare and dma_mmap_fault in this
>>>> branch:
>>>>
>>>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-mmap-improvements
>>>>
>>>> they should allow faulting dma api pages in the page fault handler.
>>>> But this is totally hot off the press and not actually tested for the
>>>> last few patches. Note that I've also included your two patches from
>>>> this series to handle SEV.
>>> I read that patch, and it seems like you’ve built in the assumption
>>> that all pages in the mapping use identical protection or, if not,
>>> that the same fake vma hack that TTM already has is used to fudge
>>> around it. Could it be reworked slightly to avoid this?
>>>
>>> I wonder if it’s a mistake to put the encryption bits in vm_page_prot
>>> at all.
>> From my POV, the encryption bits behave quite similarly to the caching
>> mode bits, and they're also in vm_page_prot. They're the reason TTM
>> needs to modify the page protection in the fault handler in the first
>> place.
>>
>> The problem in TTM is that we want to be able to change the
>> vm_page_prot from the fault handler, but that's problematic since we
>> typically hold the mmap_sem only in read mode. Hence the fake vma
>> hack. From what I can tell it's reasonably well-behaved, since
>> pte_modify() skips the bits TTM updates, so mprotect() and mremap()
>> work OK. I think split_huge_pmd() may run into trouble, but we don't
>> support it (yet) with TTM.
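For reference, the hack in question looks roughly like the below. This is
a simplified, untested sketch of what ttm_bo_vm_fault() does; exact helper
names and signatures differ between kernel versions, and the wrapper
function here is made up for illustration only.

/*
 * Simplified sketch of the "fake vma" approach: copy the vma to the
 * stack and modify only the copy's vm_page_prot, since we hold
 * mmap_sem in read mode and must not touch the real vma.
 */
static vm_fault_t ttm_fault_with_fake_vma(struct vm_fault *vmf,
					  struct ttm_buffer_object *bo,
					  pfn_t pfn)
{
	struct vm_area_struct cvma = *vmf->vma;

	/* Apply caching (and, with SEV, encryption) bits to the copy only. */
	cvma.vm_page_prot = ttm_io_prot(bo->mem.placement,
					vmf->vma->vm_page_prot);

	/* The inserted PTE picks up the protection of the local copy. */
	return vmf_insert_mixed(&cvma, vmf->address, pfn);
}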
> Ah! I actually ran into this while implementing huge page support for
> TTM and never figured out why that doesn't work. Dropped CPU huge page
> support because of this.
Incidentally, I got slightly sidetracked the other day and started
looking at this as well. I got to the point where I had figured out all
the hairy alignment issues and actually got huge_fault() calls, but
never implemented the handler. I'm not sure it will work for IO memory,
though (split_huge_pmd() will just skip non-page-backed memory), but if
we only support VM_SHARED (non-COW) vmas there's no reason to split the
huge pmds anyway. Definitely something we should have, IMO.
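Roughly what I had in mind for the handler is something like the
untested sketch below, for the PMD case only. It assumes the fault
address maps to a PMD-aligned, physically contiguous chunk of the BO;
ttm_bo_pmd_pfn() is a made-up helper for looking up that pfn.

static vm_fault_t ttm_bo_vm_huge_fault(struct vm_fault *vmf,
				       enum page_entry_size pe_size)
{
	struct vm_area_struct *vma = vmf->vma;
	struct ttm_buffer_object *bo = vma->vm_private_data;
	unsigned long pfn;

	if (pe_size != PE_SIZE_PMD)
		return VM_FAULT_FALLBACK;

	/*
	 * Made-up helper: return the PMD-aligned pfn backing the huge
	 * page around vmf->address, or an error to fall back to
	 * ordinary PTE faults (for example non-contiguous backing).
	 */
	if (ttm_bo_pmd_pfn(bo, vmf->address, &pfn))
		return VM_FAULT_FALLBACK;

	return vmf_insert_pfn_pmd(vmf, __pfn_to_pfn_t(pfn, PFN_DEV),
				  vmf->flags & FAULT_FLAG_WRITE);
}

Wiring it up would then just be a matter of pointing .huge_fault in the
TTM vm_operations_struct at something like this.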
>> We could probably get away with a WRITE_ONCE() update of the
>> vm_page_prot before taking the page table lock since
>>
>> a) We're locking out all other writers.
>> b) We can't race with another fault to the same vma since we hold an
>> address space lock ("buffer object reservation")
>> c) When we need to update, there are no valid page table entries in
>> the vma, since the update only happens directly after mmap() or after
>> an unmap_mapping_range() with the same address space lock held. When
>> another reader (for example split_huge_pmd()) sees a valid page table
>> entry, it also sees the new page protection and things are fine.
> Yeah, that's exactly why I always wondered why we need this hack with
> the vma copy on the stack.
>
>> But that would really be a special case. To solve this properly we'd
>> probably need an additional lock to protect the vm_flags and
>> vm_page_prot, taken after mmap_sem and i_mmap_lock.
> Well, we already have a special lock for this: the reservation object.
> So memory barriers etc. should be in place, and I also think we can
> just update the vm_page_prot on the fly.
I agree. This is needed for huge pages. We should make this change, and
perhaps add the justification above as a comment.
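Something along these lines, with the reasoning above as the comment
(sketch only, and the helper name is made up):

/*
 * Sketch: update the vma protection on the fly from the fault handler,
 * without mmap_sem held for write. This is OK because
 * a) we lock out all other writers,
 * b) we can't race with another fault to the same vma since we hold
 *    the buffer object reservation (our "address space lock"), and
 * c) at the time of the update there are no valid page table entries
 *    in the vma - it happens directly after mmap() or after an
 *    unmap_mapping_range() under the same reservation - so any reader
 *    (split_huge_pmd() for example) that sees a valid entry also sees
 *    the new protection.
 */
static void ttm_bo_vm_set_page_prot(struct vm_area_struct *vma,
				    pgprot_t prot)
{
	WRITE_ONCE(vma->vm_page_prot, prot);
}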
/Thomas
> Christian.
>
>> /Thomas