lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 16 Sep 2019 17:55:27 -0300
From:   Leonardo Bras <leonardo@...ux.ibm.com>
To:     linuxppc-dev@...ts.ozlabs.org, linux-kernel@...r.kernel.org
Cc:     Leonardo Bras <leonardo@...ux.ibm.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        Christophe Leroy <christophe.leroy@....fr>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Vlastimil Babka <vbabka@...e.cz>,
        Dan Williams <dan.j.williams@...el.com>,
        Christoph Lameter <cl@...ux.com>,
        "Tobin C. Harding" <tobin@...nel.org>,
        Jann Horn <jannh@...gle.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Souptick Joarder <jrdr.linux@...il.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Davidlohr Bueso <dave@...olabs.net>
Subject: [PATCH 1/1] powerpc: mm: Check if serialize_against_pte_lookup() really needs to run

If a process (qemu) with a lot of CPUs (128) try to munmap() a large chunk
of memory (496GB) mapped with THP, it takes an average of 275 seconds,
which can cause a lot of problems to the load (in qemu case, the guest
will lock for this time).

Trying to find the source of this bug, I found out most of this time is
spent on serialize_against_pte_lookup(). This function will take a lot of
time in smp_call_function_many() if there is more than a couple CPUs
running the user process. Since it has to happen to all THP mapped, it will
take a very long time for large amounts of memory.

By the docs, serialize_against_pte_lookup() is needed in order to avoid
pmd_t to pte_t casting inside find_current_mm_pte() to happen concurrently
with the next part of the functions it's called in.
It does so by calling a do_nothing() on each CPU in mm->cpu_bitmap[];

So, by what I could understand, if there is no find_current_mm_pte()
running, there is no need to call serialize_against_pte_lookup().

So, to avoid the cost of running serialize_against_pte_lookup(), I propose
a counter that keeps track of how many find_current_mm_pte() are currently
running, and if there is none, just skip smp_call_function_many().

On my workload (qemu), I could see munmap's time reduction from 275 seconds
to 418ms.

Signed-off-by: Leonardo Bras <leonardo@...ux.ibm.com>

---
I need more experienced people's help in order to understand if this is
really a valid improvement, and if mm_struct is the best place to put such
counter.

Thanks!
---
 arch/powerpc/include/asm/pte-walk.h | 3 +++
 arch/powerpc/mm/book3s64/pgtable.c  | 2 ++
 include/linux/mm_types.h            | 1 +
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/pte-walk.h b/arch/powerpc/include/asm/pte-walk.h
index 33fa5dd8ee6a..3b82cb3bd563 100644
--- a/arch/powerpc/include/asm/pte-walk.h
+++ b/arch/powerpc/include/asm/pte-walk.h
@@ -40,6 +40,8 @@ static inline pte_t *find_current_mm_pte(pgd_t *pgdir, unsigned long ea,
 {
 	pte_t *pte;
 
+	atomic64_inc(&current->mm->lockless_ptlookup_count);
+
 	VM_WARN(!arch_irqs_disabled(), "%s called with irq enabled\n", __func__);
 	VM_WARN(pgdir != current->mm->pgd,
 		"%s lock less page table lookup called on wrong mm\n", __func__);
@@ -53,6 +55,7 @@ static inline pte_t *find_current_mm_pte(pgd_t *pgdir, unsigned long ea,
 	if (hshift)
 		WARN_ON(*hshift);
 #endif
+	atomic64_dec(&current->mm->lockless_ptlookup_count);
 	return pte;
 }
 
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 7d0e0d0d22c4..8f6fc2f80071 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -95,6 +95,8 @@ static void do_nothing(void *unused)
 void serialize_against_pte_lookup(struct mm_struct *mm)
 {
 	smp_mb();
+	if (atomic64_read(&mm->lockless_ptlookup_count) == 0)
+		return;
 	smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a1083b6fb..97fb2545e967 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -518,6 +518,7 @@ struct mm_struct {
 #endif
 	} __randomize_layout;
 
+	atomic64_t lockless_ptlookup_count;
 	/*
 	 * The mm_cpumask needs to be at the end of mm_struct, because it
 	 * is dynamically sized based on nr_cpu_ids.
-- 
2.20.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ