lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 3 Jul 2019 16:04:21 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Nadav Amit <namit@...are.com>, Andy Lutomirski <luto@...nel.org>,
        Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Sasha Levin <sashal@...nel.org>, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        virtualization@...ts.linux-foundation.org,
        xen-devel@...ts.xenproject.org,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "K. Y. Srinivasan" <kys@...rosoft.com>,
        Stephen Hemminger <sthemmin@...rosoft.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs
 concurrently

On 03.07.19 01:51, Nadav Amit wrote:
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. Introduce
> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
> and hyper-v are only compile-tested).
> 
> While the updated smp infrastructure is capable of running a function on
> a single local core, it is not optimized for this case. The multiple
> function calls and the indirect branch introduce some overhead, and
> might make local TLB flushes slower than they were before the recent
> changes.
> 
> Before calling the SMP infrastructure, check if only a local TLB flush
> is needed to restore the lost performance in this common case. This
> requires to check mm_cpumask() one more time, but unless this mask is
> updated very frequently, this should impact performance negatively.
> 
> Cc: "K. Y. Srinivasan" <kys@...rosoft.com>
> Cc: Haiyang Zhang <haiyangz@...rosoft.com>
> Cc: Stephen Hemminger <sthemmin@...rosoft.com>
> Cc: Sasha Levin <sashal@...nel.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: x86@...nel.org
> Cc: Juergen Gross <jgross@...e.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Andy Lutomirski <luto@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> Cc: linux-hyperv@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Cc: virtualization@...ts.linux-foundation.org
> Cc: kvm@...r.kernel.org
> Cc: xen-devel@...ts.xenproject.org
> Signed-off-by: Nadav Amit <namit@...are.com>
> ---
>   arch/x86/hyperv/mmu.c                 | 13 +++---
>   arch/x86/include/asm/paravirt.h       |  6 +--
>   arch/x86/include/asm/paravirt_types.h |  4 +-
>   arch/x86/include/asm/tlbflush.h       |  9 ++--
>   arch/x86/include/asm/trace/hyperv.h   |  2 +-
>   arch/x86/kernel/kvm.c                 | 11 +++--
>   arch/x86/kernel/paravirt.c            |  2 +-
>   arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
>   arch/x86/xen/mmu_pv.c                 | 20 ++++++---
>   include/trace/events/xen.h            |  2 +-
>   10 files changed, 91 insertions(+), 43 deletions(-)

...

> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
> index beb44e22afdf..19e481e6e904 100644
> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>   	preempt_enable();
>   }
>   
> -static void xen_flush_tlb_others(const struct cpumask *cpus,
> -				 const struct flush_tlb_info *info)
> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
> +				const struct flush_tlb_info *info)
>   {
>   	struct {
>   		struct mmuext_op op;
> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>   	const size_t mc_entry_size = sizeof(args->op) +
>   		sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>   
> -	trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
> +	trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>   
>   	if (cpumask_empty(cpus))
>   		return;		/* nothing to do */
> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>   	args = mcs.args;
>   	args->op.arg2.vcpumask = to_cpumask(args->mask);
>   
> -	/* Remove us, and any offline CPUS. */
> +	/* Flush locally if needed and remove us */
> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
> +		local_irq_disable();
> +		flush_tlb_func_local(info);

I think this isn't the correct function for PV guests.

In fact it should be much easier: just don't clear the own cpu from the
mask, that's all what's needed. The hypervisor is just fine having the
current cpu in the mask and it will do the right thing.


Juergen

Powered by blists - more mailing lists