linux-kernel - Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG


Message-ID: <D54E059F-9757-46DB-919C-A31A067276CB@vmware.com>
Date:   Sat, 18 Dec 2021 04:52:13 +0000
From:   Nadav Amit <namit@...are.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
CC:     David Hildenbrand <david@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Linux-MM <linux-mm@...ck.org>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via
 FAULT_FLAG_UNSHARE (!hugetlb)

> On Dec 17, 2021, at 8:02 PM, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, Dec 17, 2021 at 3:53 PM Nadav Amit <namit@...are.com> wrote:
>> 
>> I understand the discussion mainly revolves around correctness, which is
>> obviously the most important property, but I would like to mention
>> that having transient get_page() calls causing unnecessary COWs can
>> cause hard-to-analyze and hard-to-avoid performance degradation.
> 
> Note that the COW itself is pretty cheap. Yes, there's the page
> allocation and copy, but it's mostly a local thing.

I don’t know about the page-lock overhead, but I understand your argument.

Having said that, I do know a bit about TLB flushes, which you did not
mention as an overhead of COW. Such flushes can be quite expensive on
multithreaded workloads (specifically on VMs, but let's put those aside).

Take memcached, for instance, and assume you overcommit memory with a very
fast swap (e.g., pmem, zram, perhaps even something slower). Now, it turns
out memcached often accesses a page first for read and shortly after for
write. In a similar scenario, I found that the page reference that
lru_cache_add() takes during the first fault-in (for read) causes a COW on a
write page fault that happens shortly after [1]. So on memcached I assume
this would also trigger frequent unnecessary COWs.

Besides page allocation and copy, COW would then require a TLB flush, which,
when performed locally, might not be too bad (~200 cycles). But if memcached
has many threads, as it usually does, then you need a TLB shootdown and this
one can be expensive (microseconds). If you start getting a TLB shootdown
storm, you may avoid some IPIs, since you can see that other CPUs have already
queued IPIs for the target CPU. But then the kernel would flush the entire TLB
on the target CPU, as it realizes that multiple TLB flushes were queued and
assumes that a full TLB flush would be cheaper.
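
To make that last point concrete, here is a toy userspace model of the
decision, loosely patterned on a generation-counter scheme. The names
(choose_flush, enum flush_kind, the parameters) are made up for illustration.
This is a sketch under those assumptions, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; a toy model, not kernel code. */
enum flush_kind { FLUSH_NONE, FLUSH_RANGE, FLUSH_FULL };

/*
 * local_gen: last flush generation this CPU has caught up to
 * mm_gen:    latest flush generation requested for the address space
 * req_gen:   generation carried by the request being handled
 * ranged:    true if the request covers only a small address range
 */
static enum flush_kind choose_flush(uint64_t local_gen, uint64_t mm_gen,
                                    uint64_t req_gen, bool ranged)
{
    /* In this model a CPU is never ahead of the address space. */
    assert(local_gen <= mm_gen);

    /* An earlier handler already caught up; nothing left to do. */
    if (local_gen == mm_gen)
        return FLUSH_NONE;

    /* Exactly one ranged request is outstanding: flush only that range. */
    if (ranged && req_gen == local_gen + 1 && req_gen == mm_gen)
        return FLUSH_RANGE;

    /* Several requests piled up, or a full flush was asked for. */
    return FLUSH_FULL;
}
```

In a storm, a CPU at generation 4 that handles the request for generation 5
while the address space is already at 7 ends up with FLUSH_FULL, which is
the case I am worried about.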

[ I can try to run a benchmark during the weekend to measure the impact, as I
  did not really measure the impact on memcached before/after 5.8. ]

So I am in no position to prioritize one overhead over the other, but I do
not think that COW can be characterized as mostly-local and cheap in the
case of multithreaded workloads.

[1] https://lore.kernel.org/linux-mm/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com/