Message-Id: <13a115b6-2f75-e399-265e-2e6c73c09e9a@linux.vnet.ibm.com>
Date:   Mon, 24 Apr 2017 17:47:43 +0200
From:   Laurent Dufour <ldufour@...ux.vnet.ibm.com>
To:     Andi Kleen <andi@...stfloor.org>
Cc:     linux-mm@...ck.org, Davidlohr Bueso <dave@...olabs.net>,
        akpm@...ux-foundation.org, Jan Kara <jack@...e.cz>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Michal Hocko <mhocko@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        haren@...ux.vnet.ibm.com, aneesh.kumar@...ux.vnet.ibm.com,
        khandual@...ux.vnet.ibm.com, Paul.McKenney@...ibm.com,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 4/4] Change mmap_sem to range lock

On 21/04/2017 01:36, Andi Kleen wrote:
> Laurent Dufour <ldufour@...ux.vnet.ibm.com> writes:
> 
>> [resent this patch, which seems not to have reached the mailing lists]
>>
>> Change the mmap_sem to a range lock to allow finer-grained locking on
>> the memory layout of a task.
>>
>> This patch renames mmap_sem to mmap_rw_tree to avoid confusion and
>> replaces all locking (read or write) with full-range locking, so
>> there is no functional change except in the way the underlying locking
>> is achieved.
>>
>> Currently, this patch only supports the x86 and PowerPC architectures;
>> it will likely break the build on all others.
> 
> Thanks for working on this.
> 
> However as commented before I think the first step to make progress here
> is a description of everything mmap_sem protects.

Hi Andi,

I looked at where mmap_sem is taken in write mode on the x86 and ppc64
architectures; here is what I found:

mmap_sem protects
 vdso mapping
 VMA layout changes
 VMA cache
 Page protection/layout
 Changes to mmu notifier chain
 Serialization of khugepaged's access
 Serialization of KSM's access
 protection keys (pkey_alloc()...)

Calls to
 get_unmapped_area()
 do_mmap()
 do_mmap_pgoff()
 do_munmap()
 get_user_pages()
 put_page()
 set_page_dirty_lock()
 find_vma()
 find_vma_intersection()
 alloc_empty_pages()
 insert_vm_struct()
 get_mm_rss()
 uprobe_consumer->filter() (currently only uprobe_perf_filter())
 _install_special_mapping()
 pmdp_collapse_flush()
 do_swap_page()
 do_brk()
 __split_vma()
 mremap_to()
 vma_to_resize()
 vma_adjust()

MM fields
   pinned_vm
   stack_vm
   total_vm
   locked_vm
   start_stack
   start_code
   end_code
   start_data
   start_brk
   bd_addr
   mm_users
   core_state
   context.vdso_*
   def_flags
   mmu_notifier_mm

VMA fields
    vm_private_data
    vm_flags
    vm_page_prot
    vm_file
    vm_pgoff
    vm_policy


Userfaultfd has not been looked at in detail yet.
dup_mmap() locks oldmm in write mode while copying it; is that necessary?

> Surely the init full case could be done shorter with some wrapper
> that combines the init_full and lock operation?

Yes, that's doable. I wrote it this way because the range should be
initialized based on the ongoing operation, so an explicit init
operation makes this clearer.
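To illustrate the explicit initialization step, here is a minimal userspace sketch of a read/write range lock in C, assuming POSIX threads. Every name in it (`struct range`, `range_init()`, `range_init_full()`, `range_lock()`, ...) is hypothetical and is not the API from this RFC; granted ranges sit in a plain linked list instead of an interval tree, for brevity. The caller first initializes a range for the operation at hand, then locks it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace range lock sketch (not the RFC's kernel API).
 * Each granted range is [start, end); readers may share overlapping
 * ranges, while a writer excludes every overlapping range.
 */
struct range {
    unsigned long start, end;
    bool writer;
    struct range *next;          /* link in the list of granted ranges */
};

struct range_lock_tree {
    pthread_mutex_t mutex;       /* protects 'held' */
    pthread_cond_t cond;         /* broadcast whenever a range is released */
    struct range *held;          /* currently granted ranges */
};

void range_lock_tree_init(struct range_lock_tree *t)
{
    pthread_mutex_init(&t->mutex, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->held = NULL;
}

/* Explicit init: the caller describes the range its operation touches. */
void range_init(struct range *r, unsigned long start, unsigned long end, bool writer)
{
    r->start = start;
    r->end = end;
    r->writer = writer;
    r->next = NULL;
}

/* Full range [0, ~0UL): behaves like taking the old mmap_sem. */
void range_init_full(struct range *r, bool writer)
{
    range_init(r, 0, ~0UL, writer);
}

/* Conflict: the ranges overlap and at least one side is a writer. */
static bool conflicts(const struct range_lock_tree *t, const struct range *r)
{
    for (const struct range *h = t->held; h; h = h->next)
        if (h->start < r->end && r->start < h->end && (h->writer || r->writer))
            return true;
    return false;
}

static void grant(struct range_lock_tree *t, struct range *r)
{
    r->next = t->held;
    t->held = r;
}

void range_lock(struct range_lock_tree *t, struct range *r)
{
    pthread_mutex_lock(&t->mutex);
    while (conflicts(t, r))
        pthread_cond_wait(&t->cond, &t->mutex);
    grant(t, r);
    pthread_mutex_unlock(&t->mutex);
}

bool range_trylock(struct range_lock_tree *t, struct range *r)
{
    pthread_mutex_lock(&t->mutex);
    bool ok = !conflicts(t, r);
    if (ok)
        grant(t, r);
    pthread_mutex_unlock(&t->mutex);
    return ok;
}

void range_unlock(struct range_lock_tree *t, struct range *r)
{
    struct range **pp;

    pthread_mutex_lock(&t->mutex);
    for (pp = &t->held; *pp && *pp != r; pp = &(*pp)->next)
        ;
    assert(*pp == r);            /* unlocking a range never granted is a bug */
    *pp = r->next;
    pthread_cond_broadcast(&t->cond);
    pthread_mutex_unlock(&t->mutex);
}
```

When every caller uses `range_init_full()`, writers exclude everyone and readers share, which mirrors the rwsem semantics the RFC keeps unchanged. The sketch makes no fairness guarantee, so a waiting writer can be starved by a steady stream of readers.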

> Then it would be likely a simple search'n'replace to move the
> whole tree in one atomic step to the new wrappers.
> Initially they could be just defined to use rwsems too to
> not change anything at all.
> 
> It would be a good idea to merge such a patch as quickly
> as possible because it will be a nightmare to maintain
> longer term.
> 
> Then you could add a config to use a range lock through
> the wrappers.

I agree. I should find a way to make that patch selectable through a
CONFIG_ option, but the additional range argument makes that more
complex to achieve. I'll try to figure out a way to do it.

> Then after that you could add real ranges step by step,
> after doing the proper analysis.

That's the biggest part of the job.
I'm also wondering whether a dedicated lock/semaphore should be introduced
to protect the VMA cache and the VMA list, since the range lock itself
will not protect against changes while the VMA list is being walked.

Please advise.

Cheers,
Laurent.
