linux-kernel - Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <06df816f-b8b7-f6c0-3710-baad99fb3213@linux.alibaba.com>
Date:   Mon, 2 Jul 2018 17:01:12 -0700
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     mhocko@...nel.org, willy@...radead.org, ldufour@...ux.vnet.ibm.com,
        peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, jolsa@...hat.com,
        namhyung@...nel.org, tglx@...utronix.de, hpa@...or.com,
        linux-mm@...ck.org, x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC v3 PATCH 4/5] mm: mmap: zap pages with read mmap_sem for
 large mapping



On 6/29/18 9:26 PM, Yang Shi wrote:
>
>
> On 6/29/18 8:15 PM, Andrew Morton wrote:
>> On Fri, 29 Jun 2018 19:28:15 -0700 Yang Shi 
>> <yang.shi@...ux.alibaba.com> wrote:
>>
>>>
>>>> we're adding a bunch of code to 32-bit kernels which will never be
>>>> executed.
>>>>
>>>> I'm thinking it would be better to be much more explicit with "#ifdef
>>>> CONFIG_64BIT" in this code, rather than relying upon the above magic.
>>>>
>>>> But I tend to think that the fact that we haven't solved anything on
>>>> locked vmas or on uprobed mappings is a shostopper for the whole
>>>> approach :(
>>> I agree it is not that perfect. But, it still could improve the most 
>>> use
>>> cases.
>> Well, those unaddressed usecases will need to be fixed at some point.
>
> Yes, definitely.
>
>> What's our plan for that?
>
> As I mentioned in the earlier email, locked and hugetlb cases might be 
> able to be solved by separating vm_flags update and actual unmap. I 
> will look into it further later.

By looking into this further， I think both mlocked and hugetlb vmas can 
be handled.

For mlocked vmas, it is easy since we acquires write mmap_sem before 
unmapping, so VM_LOCK flags can be cleared here then unmap, just like 
what the regular path does.

For hugetlb vmas, the VM_MAYSHARE flag is just checked by 
huge_pmd_share() in hugetlb_fault()->huge_pte_alloc(), another call site 
is dup_mm()->copy_page_range()->copy_hugetlb_page_range(), we don't care 
this call chain in this case.

So we may expand VM_DEAD to hugetlb_fault().  Michal suggested to check 
VM_DEAD in check_stable_address_space(), so it would be called in 
hugetlb_fault() too (not in current code), then the page fault handler 
would bail out before huge_pte_alloc() is called.

With this trick, we don't have to care about when the vm_flags is 
updated, we can unmap hugetlb vmas in read mmap_sem critical section, 
then update the vm_flags with write mmap_sem held or before the unmap.

Yang

>
> From my point of view, uprobe mapping sounds not that vital.
>
>>
>> Would one of your earlier designs have addressed all usecases? I
>> expect the dumb unmap-a-little-bit-at-a-time approach would have?
>
> Yes. The v1 design does unmap with holding write map_sem. So, the 
> vm_flags update is not a problem.
>
> Thanks,
> Yang
>
>>
>>> For the locked vmas and hugetlb vmas, unmapping operations need modify
>>> vm_flags. But, I'm wondering we might be able to separate unmap and
>>> vm_flags update. Because we know they will be unmapped right away, the
>>> vm_flags might be able to be updated in write mmap_sem critical section
>>> before the actual unmap is called or after it. This is just off the top
>>> of my head.
>>>
>>> For uprobed mappings, I'm not sure how vital it is to this case.
>>>
>>> Thanks,
>>> Yang
>>>
>