Message-Id: <20220718120212.3180-1-namit@vmware.com>
Date: Mon, 18 Jul 2022 05:01:58 -0700
From: Nadav Amit <nadav.amit@...il.com>
To: linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Mike Rapoport <rppt@...ux.ibm.com>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Nadav Amit <namit@...are.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
David Hildenbrand <david@...hat.com>,
Peter Xu <peterx@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Nick Piggin <npiggin@...il.com>
Subject: [RFC PATCH 00/14] mm: relaxed TLB flushes and other optimi.
From: Nadav Amit <namit@...are.com>
Following the optimizations to avoid unnecessary TLB flushes [1],
mprotect() and userfaultfd() no longer cause unnecessary TLB flushes when
protection was unchanged. This enabled userfaultfd to write-unprotect a
page without triggering a TLB flush (and potentially shootdown).
After these changes, David added another feature to mprotect [2],
allowing pages that can safely be mapped as writable to be mapped as
such directly from mprotect(), instead of going through the page fault
handler. This saves the overhead of a page fault when write-unprotecting
private exclusive pages, for instance.
This change introduced, however, some undesired behaviors, especially if
we adopt this new feature for userfaultfd. First, the newly mapped PTE
is not set as dirty, which on x86 might induce over 500 cycles of
overhead (if the page was not dirty before). Second, once again we can
have an expensive TLB shootdown when we write-unprotect a page: when we
relax the protection (i.e., grant more permissions), we would do a TLB
flush. If the application is multithreaded, or a userfaultfd monitor
uses write-unprotect (which is a common case), a TLB shootdown would be
needed.
This patch-set allows userfaultfd to map pages as writable directly
upon write-(un)protect ioctl, while addressing the undesired behaviors
that occur when one uses userfaultfd write-unprotect or mprotect to add
permissions. It also does some cleanup and micro-optimizations along the
way.
The main change in the patch-set, x86-specific at the moment, is the
introduction of "relaxed" TLB flushes when permissions are added. Upon a
"relaxed" TLB flush, the mm's TLB generation is advanced and the local
TLB is flushed, but no TLB shootdown takes place. If a spurious
page-fault occurs and the local generation of the TLB is found to be
out-of-sync with the mm generation, a full TLB flush is performed on the
faulting core to prevent further spurious page-faults.
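
The generation bookkeeping described above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions,
not the code in the patches: struct mm_model, struct cpu_model, NR_CPUS
and all function names are hypothetical, and a "flush" is modelled as
nothing more than the CPU catching up to the mm generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical: size of the modelled machine */

/* Hypothetical model of an address space; only the TLB generation matters. */
struct mm_model {
	uint64_t tlb_gen;		/* advanced on every flush request */
};

/* Hypothetical per-CPU state: the generation this CPU's TLB caught up to. */
struct cpu_model {
	uint64_t local_tlb_gen;
	unsigned int flushes;		/* number of flushes done on this CPU */
};

static struct cpu_model cpus[NR_CPUS];

/* Model a full flush on one CPU: its TLB is now in sync with the mm. */
static void local_flush(struct mm_model *mm, int cpu)
{
	cpus[cpu].local_tlb_gen = mm->tlb_gen;
	cpus[cpu].flushes++;
}

/* Strict flush: advance the generation and flush every CPU (a shootdown). */
static void strict_flush(struct mm_model *mm)
{
	mm->tlb_gen++;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		local_flush(mm, cpu);
}

/* Relaxed flush: advance the generation, flush only the initiating CPU. */
static void relaxed_flush(struct mm_model *mm, int initiator)
{
	mm->tlb_gen++;
	local_flush(mm, initiator);
}

/*
 * Spurious-fault path: if this CPU lags behind the mm generation, flush
 * locally so that the stale entry cannot fault again. Returns true if a
 * catch-up flush was performed.
 */
static bool fix_spurious_fault(struct mm_model *mm, int cpu)
{
	if (cpus[cpu].local_tlb_gen == mm->tlb_gen)
		return false;
	local_flush(mm, cpu);
	return true;
}
```

In this model, after relaxed_flush(&mm, 0) only CPU 0 is in sync; the
first fix_spurious_fault(&mm, 1) flushes CPU 1 and returns true, and a
second call returns false because the generations already match.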
To a certain extent "relaxed flushes" are similar to the changes that
were proposed some time ago for kernel mappings [3]. However, they do
not have any complicated interactions with NMI handlers.
Experiments on Haswell show the performance improvement. Running, for a
single page, a loop of (1) mprotect(PROT_READ); (2)
mprotect(PROT_READ|PROT_WRITE); and then (3) an access provides the
following results (on bare metal this time):

mprotect(PROT_READ) time in cycles:

			1 Thread	2 Threads
Before (5.19rc4+)	2499		4655
+patch			2495		4363 (-6%)

mprotect(PROT_READ|PROT_WRITE) time in cycles:

			1 Thread	2 Threads
Before (5.19rc4+)	2529		4675
+patch			2496		2615 (-44%)

If MADV_FREE was run or the page is not dirty, this patch-set can also
shorten the PROT_READ time by skipping the TLB shootdown.
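
For reference, the three-step loop used in the measurement above could
look roughly like the following. This is a sketch, not the benchmark
that produced the numbers: it is single-threaded, it reports
nanoseconds from clock_gettime() rather than cycles, the access in step
(3) is assumed to be a write, and struct loop_result, now_ns() and
run_mprotect_loop() are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record for the loop below. */
struct loop_result {
	uint64_t ro_ns;			/* total time in mprotect(PROT_READ) */
	uint64_t rw_ns;			/* total time in mprotect(PROT_READ|PROT_WRITE) */
	unsigned char last_value;	/* last byte written in step (3) */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run the loop on a single anonymous private page; returns 0 on success. */
static int run_mprotect_loop(unsigned int iterations, struct loop_result *res)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *page;
	int ret = -1;

	if (page_size <= 0)
		return -1;

	page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	page[0] = 0;			/* fault the page in before timing */
	res->ro_ns = 0;
	res->rw_ns = 0;

	for (unsigned int i = 0; i < iterations; i++) {
		uint64_t t0, t1, t2;

		t0 = now_ns();
		if (mprotect(page, (size_t)page_size, PROT_READ))	/* (1) */
			goto out;
		t1 = now_ns();
		if (mprotect(page, (size_t)page_size,
			     PROT_READ | PROT_WRITE))			/* (2) */
			goto out;
		t2 = now_ns();

		/* (3) access; assumed here to be a write */
		*(volatile unsigned char *)page = (unsigned char)(i + 1);

		res->ro_ns += t1 - t0;
		res->rw_ns += t2 - t1;
	}

	res->last_value = page[0];
	ret = 0;
out:
	munmap(page, (size_t)page_size);
	return ret;
}
```

Dividing ro_ns and rw_ns by the iteration count gives per-call
averages. Reproducing the two-thread column would additionally require
a second thread running in the same mm, so that write-unprotecting
needs a shootdown without the patches.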
[1] https://lore.kernel.org/all/20220401180821.1986781-1-namit@vmware.com/
[2] https://lore.kernel.org/all/20220614093629.76309-1-david@redhat.com/
[3] https://lore.kernel.org/all/4797D64D.1060105@goop.org/
Cc: Andrea Arcangeli <aarcange@...hat.com>
Cc: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Peter Xu <peterx@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Will Deacon <will@...nel.org>
Cc: Yu Zhao <yuzhao@...gle.com>
Cc: Nick Piggin <npiggin@...il.com>
Cc: Axel Rasmussen <axelrasmussen@...gle.com>
Cc: Mike Rapoport <rppt@...ux.ibm.com>
Nadav Amit (14):
userfaultfd: set dirty and young on writeprotect
userfaultfd: try to map write-unprotected pages
mm/mprotect: allow exclusive anon pages to be writable
mm/mprotect: preserve write with MM_CP_TRY_CHANGE_WRITABLE
x86/mm: check exec permissions on fault
mm/rmap: avoid flushing on page_vma_mkclean_one() when possible
mm: do fix spurious page-faults for instruction faults
x86/mm: introduce flush_tlb_fix_spurious_fault
mm: introduce relaxed TLB flushes
x86/mm: introduce relaxed TLB flushes
x86/mm: use relaxed TLB flushes when protection is removed
x86/tlb: no flush on PTE change from RW->RO when PTE is clean
mm/mprotect: do not check flush type if a strict is needed
mm: conditional check of pfn in pte_flush_type
 arch/x86/include/asm/pgtable.h  |   4 +-
 arch/x86/include/asm/tlb.h      |   3 +-
 arch/x86/include/asm/tlbflush.h |  90 +++++++++++++++++--------
 arch/x86/kernel/alternative.c   |   2 +-
 arch/x86/kernel/ldt.c           |   3 +-
 arch/x86/mm/fault.c             |  22 +++++-
 arch/x86/mm/tlb.c               |  21 +++++-
 include/asm-generic/tlb.h       | 116 +++++++++++++++++++-------------
 include/linux/mm.h              |   2 +
 include/linux/mm_types.h        |   6 ++
 mm/huge_memory.c                |   9 ++-
 mm/hugetlb.c                    |   2 +-
 mm/memory.c                     |   2 +-
 mm/mmu_gather.c                 |   1 +
 mm/mprotect.c                   |  31 ++++++---
 mm/rmap.c                       |  16 +++--
 mm/userfaultfd.c                |  10 ++-
 17 files changed, 237 insertions(+), 103 deletions(-)
--
2.25.1