Message-ID: <fcfa8403-3151-41eb-4ac4-bbac55705626@oracle.com>
Date:   Tue, 11 Jul 2017 14:57:38 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Andrea Arcangeli <aarcange@...hat.com>
Cc:     Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        linux-api@...r.kernel.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Aaron Lu <aaron.lu@...el.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing
 mirroring functionality

On 07/11/2017 02:02 PM, Andrea Arcangeli wrote:
> On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote:
>> I was surprised as well when a JVM developer pointed this out.
>>
>> From the old e-mail thread, here is original use case:
>> shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732
>> shmat(11337732, 0, 0)                   = 0x40299000
>> shmctl(11337732, IPC_RMID, 0)           = 0
>> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0
>> mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x100000) = 0x100000
>>
>> The JVM team wants to do something similar.  They are using
>> mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead
>> of shmget/shmat.  As Vlastimil mentioned previously, one would not
>> expect a shared mapping for parts of the JVM heap.  I am working
>> to get clarification from the JVM team.
> 
> Why don't they use memfd_create instead? That's made so that the fd is
> born anon unlinked so when the last reference is dropped all memory
> associated with it is automatically freed. No need of IPC_RMID and then
> they can use mmap instead of mremap(len=0) to get a double map of it.

Wow!  I did not even know about memfd_create until you mentioned it.
That would certainly work for 'normal' pages.
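
For reference, a minimal sketch of the memfd_create approach suggested
above: create an anonymous, already-unlinked fd, size it, and mmap it twice
with MAP_SHARED to get two views of the same pages.  This assumes a glibc
that provides the memfd_create() wrapper (2.27 or later) and a length that
is a multiple of the page size.  The function name memfd_double_map and the
"double-map-demo" label are made up for illustration.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the same memfd twice and check that a write through one mapping
 * is visible through the other.  Returns 0 on success, -1 on any error
 * or if the two views do not alias the same memory.
 */
int memfd_double_map(size_t len)
{
    /* The name is only a debugging label; the fd is born unlinked. */
    int fd = memfd_create("double-map-demo", 0);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mappings hold their own references; the fd can go away now. */
    close(fd);

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, len);
        if (b != MAP_FAILED)
            munmap(b, len);
        return -1;
    }

    /* Write through the first view, read back through the second. */
    strcpy(a, "hello");
    int ok = (a != b) && strcmp(b, "hello") == 0;

    /* Once both mappings are gone, the backing memory is freed. */
    munmap(a, len);
    munmap(b, len);
    return ok ? 0 : -1;
}
```

Calling memfd_double_map(65536) should return 0 on a kernel with
memfd_create support (3.17 or later).  No IPC_RMID-style cleanup is needed.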

> If they use mmap(MAP_ANONYMOUS|MAP_SHARED) it's not hugetlbfs, that
> would have been the only issue.
> 
> Using hugetlbfs for JVM wouldn't be really flexible, better they try
> to leverage THP on SHM or the hugetlbfs reservation gets in the way of
> efficient use of the unused memory for memory allocations that don't
> have a definitive size (i.e. JVM forks or more JVM are run in
> parallel).

Well, the JVM has had a config option for the use of hugetlbfs for quite
some time.  I assume they have already had to deal with these issues.

What prompted this discussion is that they want the mremap mirroring/
duplication functionality extended to support hugetlbfs.  This is pretty
straightforward.  But I wanted to have a discussion about whether the
mremap(old_size == 0) functionality should be formally documented first.
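
To make the behavior under discussion concrete, here is a rough sketch of
the mremap(old_size == 0) mirroring trick applied to a
mmap(MAP_ANONYMOUS|MAP_SHARED) region.  It is an illustration of the
undocumented behavior, not a recommended interface.  The function name
mremap_mirror is hypothetical, and the length is assumed to be a multiple
of the page size.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/*
 * Create a shared anonymous mapping, then ask mremap() for a second
 * mapping of the same pages by passing old_size == 0.  Returns 0 if a
 * write through the original is visible through the mirror, -1 on error
 * or if the two mappings do not alias.
 */
int mremap_mirror(size_t len)
{
    char *orig = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (orig == MAP_FAILED)
        return -1;

    /*
     * old_size == 0: nothing is moved or unmapped.  For a shared
     * mapping the kernel sets up an additional mapping of new_size
     * bytes backed by the same pages, at an address of its choosing.
     */
    char *mirror = mremap(orig, 0, len, MREMAP_MAYMOVE);
    if (mirror == MAP_FAILED) {
        munmap(orig, len);
        return -1;
    }

    /* Write through the original, read back through the mirror. */
    strcpy(orig, "mirrored");
    int ok = (mirror != orig) && strcmp(mirror, "mirrored") == 0;

    munmap(mirror, len);
    munmap(orig, len);
    return ok ? 0 : -1;
}
```

Calling mremap_mirror(65536) is expected to return 0 for a shared mapping
like this one.  The same call on a MAP_PRIVATE mapping is the case
described as a bug elsewhere in this thread.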

Do note that if you actually create/mount a hugetlbfs filesystem and
use a fd in that filesystem you can get the desired functionality.  However,
they want to avoid this extra step if possible and use mmap(anon, hugetlb).

I'm guessing that if memfd_create supported hugetlbfs, that would also
meet their needs.  Any thoughts about extending memfd_create support to
hugetlbfs?  I can't think of any big issues.  In fact, 'under the covers'
there actually is a hugetlbfs file created for anon mappings.  However,
that is not exposed to the user.

>> Yes.  I think this should be a separate patch.  As mentioned earlier,
>> mremap today creates a new/additional private mapping if called in this
>> way with old_size == 0.  To me, this is a bug.
> 
> Kernel by sheer luck should stay stable, but the result is weird and
> it's unlikely intentional.

Yes, that is why I think it is a bug.  Not that the kernel is unstable, but
rather that the result is unintentional/unexpected.

> memfd_create doesn't have such issue, the new mmap MAP_PRIVATE will
> get the file pages correctly after a new mmap (even if there were cows
> in the old MAP_PRIVATE mmap).
> 
>> One reason for the RFC was to determine if people thought we should:
>> 1) Just document the existing old_size == 0 functionality
>> 2) Create a more explicit interface such as a new mremap flag for this
>>    functionality
>>
>> I am waiting to see what direction people prefer before making any
>> man page updates.
> 
> I guess old_size == 0 would better be dropped if possible, if
> memfd_create fits perfectly your needs as I supposed above. If it's
> not dropped then it's not very far from allowing mmap of /proc/self/mm
> again (removed as far back as 2.3.x?).

Yes, in my googling it appears the first users of mremap(old_size == 0)
previously used mmap of /proc/self/mm.

If memfd_create can be extended to support hugetlbfs, then I might suggest
dropping the mremap(old_size == 0) support.  Just a thought.

-- 
Mike Kravetz
