Message-ID: <e164d7f4-406e-eed8-37d7-753f790b7560@redhat.com>
Date:   Wed, 26 Jan 2022 11:16:42 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Matthew Wilcox <willy@...radead.org>,
        "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Khalid Aziz <khalid.aziz@...cle.com>, akpm@...ux-foundation.org,
        longpeng2@...wei.com, arnd@...db.de, dave.hansen@...ux.intel.com,
        rppt@...nel.org, surenb@...gle.com, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Peter Xu <peterx@...hat.com>
Subject: Re: [RFC PATCH 0/6] Add support for shared PTEs across processes

On 26.01.22 05:04, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:59:50PM +0000, Matthew Wilcox wrote:
>> On Tue, Jan 25, 2022 at 09:57:05PM +0300, Kirill A. Shutemov wrote:
>>> On Tue, Jan 25, 2022 at 02:09:47PM +0000, Matthew Wilcox wrote:
>>>>> I think zero-API approach (plus madvise() hints to tweak it) is worth
>>>>> considering.
>>>>
>>>> I think the zero-API approach actually misses out on a lot of
>>>> possibilities that the mshare() approach offers.  For example, mshare()
>>>> allows you to mmap() many small files in the shared region -- you
>>>> can't do that with zeroAPI.
>>>
>>> Do you consider a use-case for many small files to be common? I would
>>> think that the main consumer of the feature to be mmap of huge files.
>>> And in this case zero enabling burden on userspace side sounds like a
>>> sweet deal.
>>
>> mmap() of huge files is certainly the Oracle use-case.  With occasional
>> funny business like mprotect() of a single page in the middle of a 1GB
>> hugepage.
> 
> Bill and I were talking about this earlier and realised that this is
> the key point.  There's a requirement that when one process mprotects
> a page that it gets protected in all processes.  You can't do that
> without *some* API because that's different behaviour than any existing
> API would produce.

A while ago I talked with Peter about an extended userfaultfd (uffd)
mechanism, here for write-protection (WP), that would work on fds
instead of on a process address space.

The rough idea would be to register the uffd handler (or whatever it
would end up being called) on an fd instead of on the virtual address
space of a single process, and to write-protect pages in that fd. As
soon as anybody tried to write to such a protected range (write, mmap,
...), the uffd handler would fire and user space could handle the event
(-> unprotect). The page cache would have to remember the uffd
information ("wp using uffd"). When (un)protecting pages using this
mechanism, all page tables mapping the page would have to be updated
accordingly using the rmap. At that point, we wouldn't care whether
it's a single page table (e.g., shared, similar to hugetlb) or simply
multiple page tables.
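
To make the moving parts concrete, here is a toy user-space model in
Python. It is only a sketch of the semantics described above, not kernel
code and not an existing userfaultfd interface: SharedFile, Mapping,
protect(), unprotect() and fault() are hypothetical names invented for
illustration, and the "rmap" is just a Python list of mappings.

```python
class SharedFile:
    """Hypothetical stand-in for an fd: page cache state plus an rmap."""

    def __init__(self, npages):
        self.npages = npages
        self.wp_pages = set()   # page cache remembers "wp using uffd"
        self.mappings = []      # rmap: every mapping of this file
        self.handler = None     # registered on the file, not on a process
        self.events = []        # (who, page) for every WP fault delivered

    def register_handler(self, handler):
        self.handler = handler

    def _sync_ptes(self, page):
        # Walk the rmap and update every page table that maps this page.
        for mapping in self.mappings:
            mapping.writable[page] = page not in self.wp_pages

    def protect(self, page):
        self.wp_pages.add(page)
        self._sync_ptes(page)

    def unprotect(self, page):
        self.wp_pages.discard(page)
        self._sync_ptes(page)

    def fault(self, page, who):
        # Deliver the event; the handler may resolve it by unprotecting.
        self.events.append((who, page))
        if self.handler is not None:
            self.handler(self, page, who)
        if page in self.wp_pages:
            raise PermissionError(f"page {page} is still write-protected")

    def write(self, page, who):
        # write(2)-style access: no page table involved, same WP check.
        if page in self.wp_pages:
            self.fault(page, who)
        return True


class Mapping:
    """Hypothetical per-process mapping with its own 'page table'."""

    def __init__(self, file, owner):
        self.file = file
        self.owner = owner
        self.writable = [p not in file.wp_pages for p in range(file.npages)]
        file.mappings.append(self)

    def store(self, page):
        # mmap-style access: a read-only PTE triggers the WP fault path.
        if not self.writable[page]:
            self.file.fault(page, self.owner)
        return True


if __name__ == "__main__":
    f = SharedFile(npages=4)
    a = Mapping(f, "proc A")
    b = Mapping(f, "proc B")
    f.register_handler(lambda file, page, who: file.unprotect(page))
    f.protect(2)            # protected in both mappings at once
    b.store(2)              # fires the handler, which unprotects page 2
    print(f.events, a.writable, b.writable)
```

Note that the model does not care whether "proc A" and "proc B" share one
page table or have separate ones; _sync_ptes() just visits whatever the
rmap lists.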

It's a very rough idea; I just wanted to mention it.

-- 
Thanks,

David / dhildenb
