Message-ID: <c931c9dc-c0ed-05f3-7364-a06088ca7754@redhat.com>
Date: Sat, 21 May 2022 22:28:59 +0200
From: David Hildenbrand <david@...hat.com>
To: Chih-En Lin <shiyn.lin@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Christian Brauner <brauner@...nel.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>,
Vlastimil Babka <vbabka@...e.cz>,
William Kucharski <william.kucharski@...cle.com>,
John Hubbard <jhubbard@...dia.com>,
Yunsheng Lin <linyunsheng@...wei.com>,
Arnd Bergmann <arnd@...db.de>,
Suren Baghdasaryan <surenb@...gle.com>,
Colin Cross <ccross@...gle.com>,
Feng Tang <feng.tang@...el.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Mike Rapoport <rppt@...nel.org>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Anshuman Khandual <anshuman.khandual@....com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Daniel Axtens <dja@...ens.net>,
Jonathan Marek <jonathan@...ek.ca>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Pasha Tatashin <pasha.tatashin@...een.com>,
Peter Xu <peterx@...hat.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Fenghua Yu <fenghua.yu@...el.com>,
linux-kernel@...r.kernel.org, Kaiyang Zhao <zhao776@...due.edu>,
Huichun Feng <foxhoundsk.tw@...il.com>,
Jim Huang <jserv.tw@...il.com>
Subject: Re: [RFC PATCH 0/6] Introduce Copy-On-Write to Page Table
On 21.05.22 20:50, Chih-En Lin wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
>> On 19.05.22 20:31, Chih-En Lin wrote:
>>> When creating the user process, it usually uses the Copy-On-Write (COW)
>>> mechanism to save the memory usage and the cost of time for copying.
>>> COW defers the work of copying private memory and shares it across the
>>> processes as read-only. If either process wants to write in these
>>> memories, it will page fault and copy the shared memory, so the process
>>> will now get its private memory right here, which is called break COW.
>>
>> Yes. Lately we've been dealing with advanced COW+GUP pinnings (which
>> resulted in PageAnonExclusive, which should hit upstream soon), and
>> hearing about COW of page tables (and wondering how it will interact
>> with the mapcount, refcount, PageAnonExclusive of anonymous pages) makes
>> me feel a bit uneasy :)
>
> I saw that patch series and know how complicated handling COW of
> physical pages is [1][2][3][4]. So the COW page table tends to
> restrict the sharing to the page table only. This means any modification
> to the physical page will trigger a COW break of the page table.
>
> The current implementation only updates the physical page information
> in the RSS of the owner process of the COW PTE. Generally the owner is
> the parent process. The state of the page, such as refcount and
> mapcount, does not change under the COW page table.
>
> But if any situation requires the COW page table to consider
> the state of the physical page, it might be troublesome. ;-)
I haven't looked into the details of how GUP deals with these COW page
tables. But I suspect there might be problems with page pinning:
skipping copy_present_page() even for R/O pages is usually problematic
with R/O pinnings of pages. I might just be wrong.
>
>>>
>>> Presently this technique is only applied to the mapped memory.
>>> The entire page table still needs to be copied from the parent.
>>> Copying each page table can cost a lot of time and memory when the
>>> parent already has many page tables allocated. For example, here is
>>> the memory state when forking with 1 GB of mapped memory.
>>>
>>>             mmap before fork    mmap after fork
>>> MemTotal:       32746776 kB       32746776 kB
>>> MemFree:        31468152 kB       31463244 kB
>>> AnonPages:       1073836 kB        1073628 kB
>>> Mapped:            39520 kB          39992 kB
>>> PageTables:         3356 kB           5432 kB
>>
>>
>> I'm missing the most important point: why do we care and why should we
>> care to make our COW/fork implementation even more complicated?
>>
>> Yes, we might save some page tables and we might reduce the fork() time,
>> however, which specific workload really benefits from this and why do we
>> really care about that workload? Without even hearing about an example
>> user in this cover letter (unless I missed it), I naturally wonder about
>> relevance in practice.
>>
>> I assume it really only matters if we fork() relatively large processes,
>> like databases for snapshotting. However, fork() is already a pretty
>> severe performance hit due to COW, and there are alternatives getting
>> developed as a replacement for such use cases (e.g., uffd-wp).
>>
>> I'm also missing a performance evaluation: I'd expect some simple
>> workloads that use fork() might be even slower after fork() with this
>> change.
>>
>
> The paper lists benchmarks of the time cost of On-Demand fork. For
> example, on Redis, the mean time of fork when taking a snapshot:
> default fork() took 7.40 ms; On-demand Fork (COW PTE table) took
> 0.12 ms. But some other cases, like the response latency
> distribution of Apache HTTP Server, do not benefit significantly
> from On-demand fork.
Thanks. I expected that snapshotting would pop up and be one of the most
prominent users that could benefit. However, for that specific use case
I am convinced that uffd-wp is the better choice and fork() is just the
old way of doing it, having nothing better at hand. QEMU already
implements snapshotting of VMs that way and I remember that redis also
intended to implement support for uffd-wp. Not sure what happened with
that and if there is anything missing to make it work.
>
> For the COW page table from this patch, I also used perf to analyze
> the time cost. But it looks no different from the default fork.
Interesting, thanks for sharing.
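
For anyone who wants a rough feel for fork() latency with a large, populated private mapping, here is a minimal sketch. It is not the mmap-fork test program from this thread (which is not shown here); the function name fork_latency and its parameters are made up for illustration, and Python adds overhead of its own, so the numbers are only indicative.

```python
import mmap
import os
import time


def fork_latency(size_bytes, runs=5):
    """Return fork() latencies in seconds for a populated private mapping.

    Hypothetical helper for illustration only; not code from the patch set.
    """
    # Anonymous MAP_PRIVATE memory: the kind of mapping fork() shares via COW.
    buf = mmap.mmap(-1, size_bytes,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    # Touch every page so that page tables are actually populated
    # before fork() has to deal with them.
    for off in range(0, size_bytes, mmap.PAGESIZE):
        buf[off] = 1

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            # Child: leave immediately without touching the mapping.
            os._exit(0)
        # Parent: fork() has returned, so the setup cost has been paid.
        latencies.append(time.perf_counter() - start)
        os.waitpid(pid, 0)

    buf.close()
    return latencies
```

Calling fork_latency(1 << 30) would approximate the 1 GB case from the cover letter; comparing kernels with and without the series would still need a proper benchmark such as perf stat.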
>
> Here is the report; mmap-sfork is the COW page table version:
>
> Performance counter stats for './mmap-fork' (100 runs):
>
>          373.92 msec task-clock        #   0.992 CPUs utilized       ( +- 0.09% )
>               1      context-switches  #   2.656 /sec                ( +- 6.03% )
>               0      cpu-migrations    #   0.000 /sec
>             881      page-faults       #   2.340 K/sec               ( +- 0.02% )
>   1,860,460,792      cycles            #   4.941 GHz                 ( +- 0.08% )
>   1,451,024,912      instructions      #    0.78 insn per cycle      ( +- 0.00% )
>     310,129,843      branches          # 823.559 M/sec               ( +- 0.01% )
>       1,552,469      branch-misses     #   0.50% of all branches     ( +- 0.38% )
>
>        0.377007 +- 0.000480 seconds time elapsed ( +- 0.13% )
>
> Performance counter stats for './mmap-sfork' (100 runs):
>
>          373.04 msec task-clock        #   0.992 CPUs utilized       ( +- 0.10% )
>               1      context-switches  #   2.660 /sec                ( +- 6.58% )
>               0      cpu-migrations    #   0.000 /sec
>             877      page-faults       #   2.333 K/sec               ( +- 0.08% )
>   1,851,843,683      cycles            #   4.926 GHz                 ( +- 0.08% )
>   1,451,763,414      instructions      #    0.78 insn per cycle      ( +- 0.00% )
>     310,270,268      branches          # 825.352 M/sec               ( +- 0.01% )
>       1,649,486      branch-misses     #   0.53% of all branches     ( +- 0.49% )
>
>        0.376095 +- 0.000478 seconds time elapsed ( +- 0.13% )
>
> So, the COW of the page table may reduce the time of forking. But it
> does so by transferring the copy work to later operations that modify
> the physical page.
Right.
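
To make the "break COW" behaviour discussed in this thread concrete from user space, here is a minimal sketch. It only shows the semantics of a private mapping across fork() (a write in the child does not show up in the parent); it says nothing about how the kernel, or this patch set, implements the copy. The function name cow_demo is made up for illustration.

```python
import mmap
import os


def cow_demo():
    """Return (parent_view, child_view) of a private page after fork().

    Hypothetical helper for illustration only.
    """
    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf[:6] = b"parent"

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the page is shared read-only with the parent until this
        # write, which faults and gives the child its own private copy.
        os.close(r)
        buf[:6] = b"child!"
        os.write(w, buf[:6])
        os._exit(0)

    # Parent: collect what the child saw, then read its own copy.
    os.close(w)
    child_view = os.read(r, 6)
    os.close(r)
    os.waitpid(pid, 0)
    parent_view = buf[:6]
    buf.close()
    return parent_view, child_view
```

The parent still reads its original bytes after the child has written, so the copy happened at the child's first write rather than at fork() time. That is the deferred work being moved around in this discussion.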
>
>> I have tons of questions regarding rmap, accounting, GUP, page table
>> walkers, OOM situations in page walkers, but at this point I am not
>> (yet) convinced that the added complexity is really worth it. So I'd
>> appreciate some additional information.
>
> It seems like I have a lot of work to do. ;-)
Messing with page tables and COW is usually like opening a can of worms :)
--
Thanks,
David / dhildenb