Message-ID: <64d0da08-6ffd-4bce-bc66-5097913937b4@kernel.org>
Date:   Sat, 21 May 2022 15:19:24 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>
Cc:     Chih-En Lin <shiyn.lin@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Christian Brauner <brauner@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        William Kucharski <william.kucharski@...cle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Yunsheng Lin <linyunsheng@...wei.com>,
        Arnd Bergmann <arnd@...db.de>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Colin Cross <ccross@...gle.com>,
        Feng Tang <feng.tang@...el.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Mike Rapoport <rppt@...nel.org>,
        Geert Uytterhoeven <geert@...ux-m68k.org>,
        Anshuman Khandual <anshuman.khandual@....com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Daniel Axtens <dja@...ens.net>,
        Jonathan Marek <jonathan@...ek.ca>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Peter Xu <peterx@...hat.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Fenghua Yu <fenghua.yu@...el.com>,
        linux-kernel@...r.kernel.org, Kaiyang Zhao <zhao776@...due.edu>,
        Huichun Feng <foxhoundsk.tw@...il.com>,
        Jim Huang <jserv.tw@...il.com>
Subject: Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table

On 5/21/22 13:12, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
> 
> As I get older (and crankier), I get less convinced that fork() is
> really the right solution for implementing system().  I feel that a
> better model is to create a process with zero threads, but have an fd
> to it.  Then manipulate the child process through its fd (eg mmap
> ld.so, open new fds in that process's fdtable, etc).  Closing the fd
> launches a new thread in the process (ensuring nobody has an fd to a
> running process, particularly one which is setuid).

Heh, I learned serious programming on Windows, and I thought fork() was 
entertaining, cool, and a bad idea when I first learned about it.  (I 
admit I did think the fact that POSIX fork and exec had many fewer 
arguments than CreateProcess was a good thing.)  Don't even get me 
started on setuid -- if I had my way, distros would set NO_NEW_PRIVS on 
boot for the entire system.
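
As a minimal sketch of what that flag looks like from a single process, assuming Linux 3.5 or later: the helper names below are hypothetical, and only the prctl() options are real kernel API. Once set, the flag is inherited across fork() and execve(), so setting it in an early boot process would cover everything started from it.

```c
#include <assert.h>
#include <sys/prctl.h>

/* Hypothetical helper: set no_new_privs on the calling thread.
 * After this, execve() can no longer grant privileges via setuid/setgid
 * bits or file capabilities. Returns 0 on success, -1 on error. */
int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Hypothetical helper: returns 1 if the flag is set, 0 if it is not,
 * and -1 on error. */
int get_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

The flag is one-way: it can be set but not cleared again for the lifetime of the thread and its descendants.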

I can see a rather different use for this type of shared-pagetable 
technology, though: monstrous MAP_SHARED mappings.  For database and 
some VM users, multiple processes will map the same file.  If there was 
a way to ensure appropriate alignment (or at least encourage it) and a 
way to handle mappings that don't cover the whole file, then having 
multiple mappings share the same page tables could be a decent 
efficiency gain.  This doesn't even need COW -- it's "just" pagetable 
sharing.

It's probably a pipe dream, but I like to imagine that the bookkeeping 
that would enable this would also enable a much less ad-hoc concept of 
who owns which pagetable page.  Then things like x86's KPTI LDT mappings 
would be less disgusting under the hood.

Android would probably like a similar feature for MAP_ANONYMOUS, or one 
that could otherwise enable Zygote to share paging structures (ideally 
without fork(), although that's my dream, not necessarily Android's). 
This is more complex, since COW is involved.  Also possibly less 
valuable -- possibly the entire benefit and then some would be achieved 
by using huge pages for Zygote and arranging for CoWing one normal-size 
page out of a hugepage COW mapping to only COW the one page.

--Andy
