lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200710093556.GY4800@hirez.programming.kicks-ass.net>
Date:   Fri, 10 Jul 2020 11:35:56 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Nicholas Piggin <npiggin@...il.com>
Cc:     linux-arch@...r.kernel.org, x86@...nel.org,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Arnd Bergmann <arnd@...db.de>, linux-kernel@...r.kernel.org,
        linuxppc-dev@...ts.ozlabs.org, linux-mm@...ck.org,
        Anton Blanchard <anton@...abs.org>
Subject: Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy
 tlb option

On Fri, Jul 10, 2020 at 11:56:46AM +1000, Nicholas Piggin wrote:
> On big systems, the mm refcount can become highly contented when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
> 
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
> 
> On a 16-socket 192-core POWER8 system, a context switching benchmark
> with as many software threads as CPUs (so each switch will go in and
> out of idle), upstream can achieve a rate of about 1 million context
> switches per second. After this patch it goes up to 118 million.

That's mighty impressive, however:

> +static void shoot_lazy_tlbs(struct mm_struct *mm)
> +{
> +	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> +		smp_call_function_many(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> +		do_shoot_lazy_tlb(mm);
> +	}
> +}

IIRC you (power) never clear a CPU from that mask, so for other
workloads I can see this resulting in massive amounts of IPIs.

For instance, take as many processes as you have CPUs. For each,
manually walk the task across all CPUs and exit. Again.

Clearly, that's an extreme, but still...


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ