Date:	Tue, 16 Dec 2014 13:36:56 -0800
From:	Dave Hansen <dave@...1.net>
To:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Russell King - ARM Linux <linux@....linux.org.uk>,
	Michal Simek <monstr@...str.eu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Will Deacon <will.deacon@....com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>
Subject: post-3.18 performance regression in TLB flushing code

I'm running the 'brk1' test from will-it-scale:

> https://github.com/antonblanchard/will-it-scale/blob/master/tests/brk1.c

on an 8-socket/160-thread system.  It sees about a 6% drop in
performance (263M -> 247M ops/sec at 80 threads) from this commit:

	commit fb7332a9fedfd62b1ba6530c86f39f0fa38afd49
	Author: Will Deacon <will.deacon@....com>
	Date:   Wed Oct 29 10:03:09 2014 +0000

	 mmu_gather: move minimal range calculations into generic code
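
For context, brk1's inner loop just grows and shrinks the heap by one
page over and over; the shrink is what exercises the unmap/TLB-flush
path.  A simplified standalone version (my paraphrase, not the
upstream file verbatim; the real harness also counts iterations to
produce the ops/sec numbers):

	#include <assert.h>
	#include <unistd.h>

	int main(void)
	{
		char *addr = sbrk(0);	/* current program break */

		/* each iteration is two brk() calls */
		for (;;) {
			/* grow the heap by one page... */
			assert(brk(addr + 4096) == 0);
			/* ...and shrink it back, unmapping the page again */
			assert(brk(addr) == 0);
		}
	}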

tlb_finish_mmu() goes up about 9x in the profiles (~0.4%->3.6%) and
tlb_flush_mmu_free() takes about 3.1% of CPU time with the patch
applied, but does not show up at all on the commit before.
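
(For anyone reproducing: a plain system-wide "perf record -a -g" while
the test runs, followed by "perf report", should be enough to see
these symbols move.)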

This isn't a major regression, but it is rather unfortunate for a patch
that is apparently a code cleanup.  The slowdown also _appears_ to show
up even in the single-threaded case, although I haven't looked at that
in detail.

I suspect the tlb->need_flush logic was serving some role that the
modified code doesn't capture, for instance in this hunk:

>  void tlb_flush_mmu(struct mmu_gather *tlb)
>  {
> -       if (!tlb->need_flush)
> -               return;
>         tlb_flush_mmu_tlbonly(tlb);
>         tlb_flush_mmu_free(tlb);
>  }

tlb_flush_mmu_tlbonly() has a tlb->end check (which replaces the
->need_flush logic), but tlb_flush_mmu_free() does not.
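
For reference, the post-commit shape is roughly this (abbreviated from
mm/memory.c around that commit, elisions mine):

	static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
	{
		/* empty range: nothing was unmapped, skip the flush */
		if (!tlb->end)
			return;

		tlb_flush(tlb);
		/* ... mmu notifier and table-free hooks ... */
		__tlb_reset_range(tlb);
	}

so the TLB-invalidate side bails out early on an empty range, while
the page-freeing side is now called unconditionally.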

If we add a !tlb->end check to tlb_flush_mmu() (patch attached), that
gets us back up to ~258M ops/sec, but that's still ~2% down from where
we started.
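
In diff form the idea is simply this (a sketch; the attachment below
is the authoritative version):

	 void tlb_flush_mmu(struct mmu_gather *tlb)
	 {
	+	/* restore the old need_flush short-circuit via tlb->end */
	+	if (!tlb->end)
	+		return;
	+
	 	tlb_flush_mmu_tlbonly(tlb);
	 	tlb_flush_mmu_free(tlb);
	 }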

[Attachment: fix-old-need_flush-logic.patch (text/x-patch, 462 bytes)]
