Date: Sat, 25 May 2019 01:21:57 -0700
From: Nadav Amit <namit@...are.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Andy Lutomirski <luto@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>, linux-kernel@...r.kernel.org,
	Nadav Amit <namit@...are.com>
Subject: [RFC PATCH 0/6] x86/mm: Flush remote and local TLBs concurrently

Currently, local and remote TLB flushes are not performed concurrently,
which introduces unnecessary overhead - each INVLPG can take 100s of
cycles. This patch-set allows TLB flushes to be run concurrently: first
request the remote CPUs to initiate the flush, then run it locally, and
finally wait for the remote CPUs to finish their work.

The proposed changes should also improve the performance of other
invocations of on_each_cpu(). Hopefully, no one has relied on
on_each_cpu() executing functions first remotely and only then locally.

On my Haswell machine (bare-metal), a TLB flush microbenchmark
(MADV_DONTNEED/touch for a single page on one thread) takes the
following time (ns):

	n_threads	before		after
	---------	------		-----
	1		661		663
	2		1436		1225 (-14%)
	4		1571		1421 (-10%)

Note that since the benchmark also causes page-faults, the actual
speedup of TLB shootdowns is greater. Also note the larger improvement
with 2 threads (a single remote TLB flush target). This seems to be a
side-effect of holding the synchronization data structures (csd) off
the stack, unlike the way it is currently done (in
smp_call_function_single()).

Patches 1-2 are small cleanups. Patches 3-5 implement the concurrent
execution of TLB flushes. Patch 6 restores local TLB flush performance,
which was hurt by the optimization, to be as good as it was before
these changes by introducing a fast path for this specific case.
Nadav Amit (6):
  smp: Remove smp_call_function() and on_each_cpu() return values
  cpumask: Purify cpumask_next()
  smp: Run functions concurrently in smp_call_function_many()
  x86/mm/tlb: Refactor common code into flush_tlb_on_cpus()
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Optimize local TLB flushes

 arch/alpha/kernel/smp.c               |  19 +---
 arch/alpha/oprofile/common.c          |   6 +-
 arch/ia64/kernel/perfmon.c            |  12 +--
 arch/ia64/kernel/uncached.c           |   8 +-
 arch/x86/hyperv/mmu.c                 |   2 +
 arch/x86/include/asm/paravirt.h       |   8 ++
 arch/x86/include/asm/paravirt_types.h |   6 ++
 arch/x86/include/asm/tlbflush.h       |   6 ++
 arch/x86/kernel/kvm.c                 |   1 +
 arch/x86/kernel/paravirt.c            |   3 +
 arch/x86/lib/cache-smp.c              |   3 +-
 arch/x86/mm/tlb.c                     | 137 +++++++++++++++++--------
 arch/x86/xen/mmu_pv.c                 |   2 +
 drivers/char/agp/generic.c            |   3 +-
 include/linux/cpumask.h               |   2 +-
 include/linux/smp.h                   |  32 ++++--
 kernel/smp.c                          | 139 ++++++++++++--------------
 kernel/up.c                           |   3 +-
 18 files changed, 230 insertions(+), 162 deletions(-)

-- 
2.20.1