lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 24 Aug 2020 10:29:16 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Xu Yu <xuyu@...ux.alibaba.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Yang Shi <shy828301@...il.com>
Subject: [PATCH 5.7 022/124] mm/memory.c: skip spurious TLB flush for retried page fault

From: Yang Shi <shy828301@...il.com>

commit b7333b58f358f38d90d78e00c1ee5dec82df10ad upstream.

Recently we found regression when running will_it_scale/page_fault3 test
on ARM64.  Over 70% down for the multi processes cases and over 20% down
for the multi threads cases.  It turns out the regression is caused by
commit 89b15332af7c ("mm: drop mmap_sem before calling
balance_dirty_pages() in write fault").

The test mmaps a memory size file then write to the mapping, this would
make all memory dirty and trigger dirty pages throttle, that upstream
commit would release mmap_sem then retry the page fault.  The retried
page fault would see correct PTEs installed then just fall through to
spurious TLB flush.  The regression is caused by the excessive spurious
TLB flush.  It is fine on x86 since x86's spurious TLB flush is no-op.

We could just skip the spurious TLB flush to mitigate the regression.

Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
Reported-by: Xu Yu <xuyu@...ux.alibaba.com>
Debugged-by: Xu Yu <xuyu@...ux.alibaba.com>
Tested-by: Xu Yu <xuyu@...ux.alibaba.com>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will.deacon@....com>
Cc: <stable@...r.kernel.org>
Signed-off-by: Yang Shi <shy828301@...il.com>
Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

---
 mm/memory.c |    3 +++
 1 file changed, 3 insertions(+)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4237,6 +4237,9 @@ static vm_fault_t handle_pte_fault(struc
 				vmf->flags & FAULT_FLAG_WRITE)) {
 		update_mmu_cache(vmf->vma, vmf->address, vmf->pte);
 	} else {
+		/* Skip spurious TLB flush for retried page fault */
+		if (vmf->flags & FAULT_FLAG_TRIED)
+			goto unlock;
 		/*
 		 * This is needed only for protection faults but the arch code
 		 * is not yet telling us if this is a protection fault or not.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ