Message-ID: <b34ded1e11154eabbce07618bf0a6676@huawei.com>
Date:   Sat, 22 Jan 2022 01:39:46 +0000
From:   "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@...wei.com>
To:     Khalid Aziz <khalid.aziz@...cle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Barry Song <21cnbao@...il.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Arnd Bergmann <arnd@...db.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Mike Rapoport <rppt@...nel.org>,
        Suren Baghdasaryan <surenb@...gle.com>
Subject: RE: [RFC PATCH 0/6] Add support for shared PTEs across processes



> -----Original Message-----
> From: Khalid Aziz [mailto:khalid.aziz@...cle.com]
> Sent: Saturday, January 22, 2022 12:42 AM
> To: Matthew Wilcox <willy@...radead.org>; Barry Song <21cnbao@...il.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>; Arnd Bergmann <arnd@...db.de>;
> Dave Hansen <dave.hansen@...ux.intel.com>; David Hildenbrand
> <david@...hat.com>; LKML <linux-kernel@...r.kernel.org>; Linux-MM
> <linux-mm@...ck.org>; Longpeng (Mike, Cloud Infrastructure Service Product
> Dept.) <longpeng2@...wei.com>; Mike Rapoport <rppt@...nel.org>; Suren
> Baghdasaryan <surenb@...gle.com>
> Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
> 
> On 1/21/22 07:47, Matthew Wilcox wrote:
> > On Fri, Jan 21, 2022 at 08:35:17PM +1300, Barry Song wrote:
> >> On Fri, Jan 21, 2022 at 3:13 PM Matthew Wilcox <willy@...radead.org> wrote:
> >>> On Fri, Jan 21, 2022 at 09:08:06AM +0800, Barry Song wrote:
> >>>>> A file under /sys/fs/mshare can be opened and read from. A read from
> >>>>> this file returns two long values - (1) starting address, and (2)
> >>>>> size of the mshare'd region.
> >>>>>
> >>>>> --
> >>>>> int mshare_unlink(char *name)
> >>>>>
> >>>>> A shared address range created by mshare() can be destroyed using
> >>>>> mshare_unlink() which removes the shared named object. Once all
> >>>>> processes have unmapped the shared object, the shared address range
> >>>>> references are de-allocated and destroyed.
> >>>>
> >>>>> mshare_unlink() returns 0 on success or -1 on error.
> >>>>
> >>>> I am still struggling with the user scenarios of these new APIs. This
> >>>> patch assumes multiple processes will have the same virtual address for
> >>>> the shared area? How can this be guaranteed when different processes can
> >>>> map different stacks, heaps, libraries, and files?
> >>>
> >>> The two processes choose to share a chunk of their address space.
> >>> They can map anything they like in that shared area, and then also
> >>> anything they like in the areas that aren't shared.  They can choose
> >>> for that shared area to have the same address in both processes
> >>> or different locations in each process.
> >>>
> >>> If two processes want to put a shared library in that shared address
> >>> space, that should work.  They probably would need to agree to use
> >>> the same virtual address for the shared page tables for that to work.
> >>
> >> we are depending on the ELF loader and ld.so to map libraries
> >> dynamically, so we can hardly find a chance in users' code to call
> >> mshare() to map libraries at the application level?
> >
> > If somebody wants to modify ld.so to take advantage of mshare(), they
> > could.  That wasn't our primary motivation here, so if it turns out to
> > not work for that use case, well, that's a shame.
> >
> >>> Think of this like hugetlbfs, only instead of sharing hugetlbfs
> >>> memory, you can share _anything_ that's mmapable.
> >>
> >> yep, we can call mshare() on any kind of memory, for example, when
> >> multiple processes use SysV shmem, POSIX shmem, or mmap the same file.
> >> but it seems more sensible to let the kernel do this automatically
> >> rather than depend on users calling mshare()? It is difficult for users
> >> to decide which areas mshare() should be applied to. users might want
> >> to call mshare() on all shared areas to save the memory consumed by
> >> duplicated PTEs. unlike SysV shmem and POSIX shmem, which are features
> >> for inter-process communication, mshare() looks less like a feature for
> >> applications than a feature for the whole system? why would
> >> applications have to call something which doesn't directly help them?
> >> without mshare(), those applications will still work without any
> >> problem, right? is there anything in mshare() which is a must-have for
> >> applications? or is mshare() only a suggestion from applications, like
> >> madvise()?
> >
> > Our use case is that we have some very large files stored on persistent
> > memory which we want to mmap in thousands of processes.  So the first
> > one shares a chunk of its address space and mmaps all the files into
> > that chunk of address space.  Subsequent processes find that a suitable
> > address space already exists and use it, sharing the page tables and
> > avoiding the calls to mmap.
> >
> > Sharing page tables is akin to running multiple threads in a single
> > address space; except that only part of the address space is the same.
> > There does need to be a certain amount of trust between the processes
> > sharing the address space.  You don't want to do it to an unsuspecting
> > process.
> >
> 
> Hello Barry,
> 
> mshare() is really meant for sharing data across unrelated processes by
> sharing address space explicitly, and hence opt-in is required. As
> Matthew said, the processes sharing this virtual address space need to
> have a level of trust. Permissions on the msharefs files control who
> can access this shared address space. It is possible to adapt this
> mechanism to share stacks, libraries, etc., but that is not the intent.
> This feature will be used by applications that normally share data with
> multiple processes using shared mappings, and it helps them avoid the
> overhead of a large number of duplicated PTEs, which consume memory.
> This extra memory consumed by PTEs reduces the amount of memory
> available to applications and can result in an out-of-memory condition.
> An example from patch 0/6:
> 
> "On a database server with 300GB SGA, a system crash was seen with
> out-of-memory condition when 1500+ clients tried to share this SGA
> even though the system had 512GB of memory. On this server, in the
> worst case scenario of all 1500 processes mapping every page from
> SGA would have required 878GB+ for just the PTEs. If these PTEs
> could be shared, amount of memory saved is very significant."
> 

The memory overhead of the PTEs would be significantly reduced if we
used hugetlbfs in this case, so why not use hugetlbfs here?
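
For illustration, a minimal sketch of what I mean (assuming an
anonymous MAP_HUGETLB mapping and a pre-provisioned huge page pool;
the 1GB size is made up for the example). With 2MB huge pages, each
2MB of the mapping costs one PMD entry per process instead of 512
PTEs:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MAP_SIZE (1UL << 30)  /* 1GB; must be a multiple of the huge page size */

int main(void)
{
	/*
	 * MAP_HUGETLB requests pages from the hugetlb pool, which must
	 * be provisioned first, e.g. via /proc/sys/vm/nr_hugepages.
	 */
	void *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}
	printf("mapped %lu bytes of huge pages at %p\n", MAP_SIZE, p);
	munmap(p, MAP_SIZE);
	return EXIT_SUCCESS;
}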

> --
> Khalid
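
As an aside, here is a hypothetical sketch of the consumer side of the
proposed interface, based only on the semantics quoted earlier in this
thread: a named file appears under /sys/fs/mshare, and a read from it
returns two long values, the starting address and the size of the
mshare'd region. The name "sga" and the exact layout of the values read
back are my assumptions for illustration; the RFC may differ.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	long info[2];  /* per the description above: [0] = start, [1] = size */
	int fd = open("/sys/fs/mshare/sga", O_RDONLY);  /* "sga" is a made-up name */

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (read(fd, info, sizeof(info)) != (ssize_t)sizeof(info)) {
		perror("read");
		close(fd);
		return 1;
	}
	close(fd);

	/*
	 * Anything the participating processes mmap inside
	 * [info[0], info[0] + info[1]) shares one set of page tables;
	 * mshare_unlink("sga") would later destroy the named object.
	 */
	printf("mshare'd region: start=0x%lx size=%ld\n",
	       (unsigned long)info[0], info[1]);
	return 0;
}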
