lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 26 Jan 2022 16:42:47 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Khalid Aziz <khalid.aziz@...cle.com>, akpm@...ux-foundation.org,
        longpeng2@...wei.com, arnd@...db.de, dave.hansen@...ux.intel.com,
        david@...hat.com, rppt@...nel.org, surenb@...gle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes

On Wed, Jan 26, 2022 at 04:04:48AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > > I think zero-API approach (plus madvise() hints to tweak it) is worth
> > > > > considering.
> > > > 
> > > > I think the zero-API approach actually misses out on a lot of
> > > > possibilities that the mshare() approach offers.  For example, mshare()
> > > > allows you to mmap() many small files in the shared region -- you
> > > > can't do that with zeroAPI.
> > > 
> > > Do you consider a use-case for many small files to be common? I would
> > > think that the main consumer of the feature to be mmap of huge files.
> > > And in this case zero enabling burden on userspace side sounds like a
> > > sweet deal.
> > 
> > mmap() of huge files is certainly the Oracle use-case.  With occasional
> > funny business like mprotect() of a single page in the middle of a 1GB
> > hugepage.
> 
> Bill and I were talking about this earlier and realised that this is
> the key point.  There's a requirement that when one process mprotects
> a page that it gets protected in all processes.  You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.

"hurr, durr, we are Oracle" :P

Sounds like a very niche requirement. I doubt there will more than single
digit user count for the feature. Maybe only the DB.

> So how about something like this ...
> 
> int mcreate(const char *name, int flags, mode_t mode);
> 
> creates a new mm_struct with a refcount of 2.  returns an fd (one
> of the two refcounts) and creates a name for it (inside msharefs,
> holds the other refcount).
> 
> You can then mmap() that fd to attach it to a chunk of your address
> space.  Once attached, you can start to populate it by calling
> mmap() and specifying an address inside the attached mm as the first
> argument to mmap().

That is not what mmap() would normally do to an existing mapping. So it
requires special treatment.

In general mmap() of a mm_struct scares me. I can't wrap my head around
implications.

Like how does it work on fork()?

How accounting works? What happens on OOM?

What prevents creating loops, like mapping a mm_struct inside itself?

What mremap()/munmap() do to such mapping? Will it affect mapping of
mm_struct or will it target mapping inside the mm_sturct?

Maybe it just didn't clicked for me, I donno.

> Maybe mcreate() is just a library call, and it's really a thin wrapper
> around open() that happens to know where msharefs is mounted.

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ