Message-ID: <e164d7f4-406e-eed8-37d7-753f790b7560@redhat.com>
Date: Wed, 26 Jan 2022 11:16:42 +0100
From: David Hildenbrand <david@...hat.com>
To: Matthew Wilcox <willy@...radead.org>,
"Kirill A. Shutemov" <kirill@...temov.name>
Cc: Khalid Aziz <khalid.aziz@...cle.com>, akpm@...ux-foundation.org,
longpeng2@...wei.com, arnd@...db.de, dave.hansen@...ux.intel.com,
rppt@...nel.org, surenb@...gle.com, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Peter Xu <peterx@...hat.com>
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes
On 26.01.22 05:04, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
>> On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
>>> On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
>>>>> I think zero-API approach (plus madvise() hints to tweak it) is worth
>>>>> considering.
>>>>
>>>> I think the zero-API approach actually misses out on a lot of
>>>> possibilities that the mshare() approach offers. For example, mshare()
>>>> allows you to mmap() many small files in the shared region -- you
>>>> can't do that with zeroAPI.
>>>
>>> Do you consider a use-case for many small files to be common? I would
>>> think that the main consumer of the feature to be mmap of huge files.
>>> And in this case zero enabling burden on userspace side sounds like a
>>> sweet deal.
>>
>> mmap() of huge files is certainly the Oracle use-case. With occasional
>> funny business like mprotect() of a single page in the middle of a 1GB
>> hugepage.
>
> Bill and I were talking about this earlier and realised that this is
> the key point. There's a requirement that when one process mprotects
> a page that it gets protected in all processes. You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.
A while ago I talked with Peter about an extended uffd (here: WP)
mechanism that would work on fds instead of the process address space.
The rough idea would be to register the uffd (or whatever it would be
called) handler on an fd instead of on the virtual address space of a
single process, and to write-protect pages in that fd. Once anybody
tries writing to such a protected range (write, mmap, ...), the uffd
handler would fire and user space could handle the event (-> unprotect). The
page cache would have to remember the uffd information ("wp using
uffd"). When (un)protecting pages using this mechanism, all page tables
mapping the page would have to be updated accordingly using the rmap. At
that point, we wouldn't care if it's a single page table (e.g., shared
similar to hugetlb) or simply multiple page tables.
It's a very rough idea; I just wanted to mention it.
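
To make the idea above a bit more concrete, here is a toy user-space model
in Python. It is only a sketch of the described behaviour, not kernel code
and not an existing API: the names SharedFile, CachedPage, PTE, set_wp and
store are all made up for illustration. It assumes the handler is registered
on the file object, that the page cache page carries the "wp using uffd"
state, and that (un)protecting walks a per-page rmap list to update every
page table entry mapping the page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PTE:
    """One page table entry in some process (hypothetical model)."""
    owner: str            # illustrative process name
    writable: bool = True


@dataclass
class CachedPage:
    """A page cache page that remembers the 'wp using uffd' state."""
    data: bytearray = field(default_factory=lambda: bytearray(8))
    uffd_wp: bool = False
    rmap: List[PTE] = field(default_factory=list)  # all PTEs mapping this page


class SharedFile:
    """Toy stand-in for an fd with a write-protect handler registered on it."""

    def __init__(self, npages: int,
                 handler: Optional[Callable[["SharedFile", int], None]] = None):
        self.pages = [CachedPage() for _ in range(npages)]
        self.handler = handler  # invoked as handler(file, page_index)

    def map_page(self, owner: str, index: int) -> PTE:
        """Map one page into a process; new mappings inherit the protection."""
        page = self.pages[index]
        pte = PTE(owner, writable=not page.uffd_wp)
        page.rmap.append(pte)
        return pte

    def set_wp(self, index: int, protect: bool) -> None:
        """(Un)protect a page and update every mapping found via the rmap."""
        page = self.pages[index]
        page.uffd_wp = protect
        for pte in page.rmap:
            pte.writable = not protect

    def store(self, index: int, offset: int, value: int,
              pte: Optional[PTE] = None) -> None:
        """Write via a mapping (pte given) or via a write()-like path (pte=None)."""
        page = self.pages[index]
        blocked = page.uffd_wp if pte is None else not pte.writable
        if blocked:
            if self.handler is None:
                raise PermissionError("page is write-protected")
            self.handler(self, index)      # user space handles the event
            if page.uffd_wp:
                raise PermissionError("handler left the page protected")
        page.data[offset] = value


if __name__ == "__main__":
    def unprotect(f: SharedFile, idx: int) -> None:
        f.set_wp(idx, False)

    f = SharedFile(1, unprotect)
    a, b = f.map_page("A", 0), f.map_page("B", 0)
    f.set_wp(0, True)                      # both mappings lose write access
    f.store(0, 0, 1, pte=b)                # fault in B, handler unprotects
    print(a.writable, b.writable)
```

In this model it does not matter whether the rmap list holds one shared
page table entry or several per-process entries; set_wp() treats both cases
the same way.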
--
Thanks,
David / dhildenb