Message-ID: <20220126134247.fadtwbvyknh3ejpe@box.shutemov.name>
Date: Wed, 26 Jan 2022 16:42:47 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Matthew Wilcox <willy@...radead.org>
Cc: Khalid Aziz <khalid.aziz@...cle.com>, akpm@...ux-foundation.org,
longpeng2@...wei.com, arnd@...db.de, dave.hansen@...ux.intel.com,
david@...hat.com, rppt@...nel.org, surenb@...gle.com,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
On Wed, Jan 26, 2022 at 04:04:48AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
> > > > > I think zero-API approach (plus madvise() hints to tweak it) is worth
> > > > > considering.
> > > >
> > > > I think the zero-API approach actually misses out on a lot of
> > > > possibilities that the mshare() approach offers. For example, mshare()
> > > > allows you to mmap() many small files in the shared region -- you
> > > > can't do that with zeroAPI.
> > >
> > > Do you consider a use-case for many small files to be common? I would
> > > think that the main consumer of the feature to be mmap of huge files.
> > > And in this case zero enabling burden on userspace side sounds like a
> > > sweet deal.
> >
> > mmap() of huge files is certainly the Oracle use-case. With occasional
> > funny business like mprotect() of a single page in the middle of a 1GB
> > hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.
"hurr, durr, we are Oracle" :P
Sounds like a very niche requirement. I doubt there will be more than a
single-digit user count for the feature. Maybe only the DB.
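
To make the quoted point concrete, here is a minimal sketch, assuming Linux
and a shared anonymous mapping inherited across fork(), of how mprotect()
behaves today: a child that makes the page read-only changes only its own
page tables, and the parent can still write. The function name is made up
for illustration and is not part of the proposal.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent can still write to a shared page while a live
 * child has made its own mapping of that page read-only, -1 on setup error.
 * If protections were shared, the parent's store would raise SIGSEGV.
 */
int parent_write_survives_child_mprotect(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int fds[2];
    if (pipe(fds) < 0) {
        munmap(p, page);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        munmap(p, page);
        return -1;
    }

    if (pid == 0) {
        /* Child: drop write permission, report the result, stay alive. */
        close(fds[0]);
        char c = (mprotect(p, page, PROT_READ) == 0) ? 'k' : 'e';
        if (write(fds[1], &c, 1) != 1)
            _exit(1);
        pause();
        _exit(0);
    }

    /* Parent: wait until the child has finished its mprotect() call. */
    close(fds[1]);
    char c = 0;
    int result = -1;
    if (read(fds[0], &c, 1) == 1 && c == 'k') {
        p[0] = 'b';   /* succeeds: the parent's PTE is still writable */
        result = 1;
    }

    close(fds[0]);
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    munmap(p, page);
    return result;
}
```

Calling parent_write_survives_child_mprotect() from main() should return 1
with current kernels, which is the behaviour the shared-protection
requirement would have to change.
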
> So how about something like this ...
>
> int mcreate(const char *name, int flags, mode_t mode);
>
> creates a new mm_struct with a refcount of 2. returns an fd (one
> of the two refcounts) and creates a name for it (inside msharefs,
> holds the other refcount).
>
> You can then mmap() that fd to attach it to a chunk of your address
> space. Once attached, you can start to populate it by calling
> mmap() and specifying an address inside the attached mm as the first
> argument to mmap().
That is not what mmap() would normally do to an existing mapping. So it
requires special treatment.
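
For reference, a small sketch of what mmap() does today when the address
argument points into an existing mapping, assuming Linux and anonymous
private mappings: without MAP_FIXED the address is only a hint and the new
mapping lands elsewhere; with MAP_FIXED the old mapping is replaced. Neither
case populates anything "inside" the existing mapping. The function name is
illustrative only.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a colliding address hint is placed elsewhere and MAP_FIXED
 * replaces the old mapping in place, 0 otherwise, -1 on setup error.
 */
int mmap_hint_vs_fixed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char *old = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (old == MAP_FAILED)
        return -1;
    old[0] = 'x';

    /* Hint only: the range is occupied, so the kernel picks another spot. */
    char *hinted = mmap(old, page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hinted == MAP_FAILED) {
        munmap(old, page);
        return -1;
    }
    int moved = (hinted != old) && (old[0] == 'x');

    /* MAP_FIXED: the old mapping is discarded and replaced by zero pages. */
    char *fixed = mmap(old, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (fixed == MAP_FAILED) {
        munmap(hinted, page);
        munmap(old, page);
        return -1;
    }
    int replaced = (fixed == old) && (fixed[0] == 0);

    munmap(hinted, page);
    munmap(fixed, page);
    return moved && replaced;
}
```
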

In general, mmap() of an mm_struct scares me. I can't wrap my head around
the implications.

Like, how does it work on fork()?
How does accounting work? What happens on OOM?
What prevents creating loops, like mapping an mm_struct inside itself?
What do mremap()/munmap() do to such a mapping? Will they affect the mapping
of the mm_struct, or will they target a mapping inside the mm_struct?

Maybe it just hasn't clicked for me, I dunno.

> Maybe mcreate() is just a library call, and it's really a thin wrapper
> around open() that happens to know where msharefs is mounted.
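
A rough sketch of what such a thin wrapper could look like. Everything here
is an assumption: the mount point is a placeholder (the thread does not say
where msharefs would live), mcreate_at() is a helper invented for this
example, and a plain open() on an ordinary filesystem obviously does not
create an mm_struct.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical mount point; not specified anywhere in the proposal. */
#define MSHAREFS_MNT "/sys/fs/mshare"

/* Hypothetical helper: open "<mnt>/<name>", creating it if needed. */
int mcreate_at(const char *mnt, const char *name, int flags, mode_t mode)
{
    char path[4096];

    /* Names are single path components in this sketch. */
    if (name[0] == '\0' || strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }

    int n = snprintf(path, sizeof(path), "%s/%s", mnt, name);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    return open(path, flags | O_CREAT, mode);
}

/* Same signature as the proposed call; only knows where msharefs is. */
int mcreate(const char *name, int flags, mode_t mode)
{
    return mcreate_at(MSHAREFS_MNT, name, flags, mode);
}
```
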
--
Kirill A. Shutemov