Message-ID: <55775749.3090004@intel.com>
Date:	Tue, 09 Jun 2015 14:14:49 -0700
From:	Dave Hansen <dave.hansen@...el.com>
To:	Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Hugh Dickins <hughd@...gle.com>,
	Minchan Kim <minchan@...nel.org>,
	Andi Kleen <andi@...stfloor.org>,
	H Peter Anvin <hpa@...or.com>, Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/3] TLB flush multiple pages per IPI v5

I did some of what I talked about earlier in the thread.

I think the mfence (via mb()) is potentially unfair since it removes
some of the CPU's ability to optimize things.  For this kind of test,
any ability the CPU has to smear the overhead around is a bonus in
practice and should be taken into account.

Here's the horribly hacked-together patch so you can see precisely
what's going on:

	https://www.sr71.net/~dave/intel/measure-tlb-stuff.patch
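
If you just want to see the shape of the measurement, here's a minimal
userspace sketch of the same timing structure (my own illustration, not
an excerpt from the patch): bracket the operation with rdtscp reads,
optionally wrap it in mfence, and average over many iterations.  A plain
memory access stands in for the kernel-internal __flush_tlb_*() helpers,
which you obviously can't call from userspace.

	/*
	 * Minimal sketch of the cycle-measurement loop (userspace analogue).
	 * Build with: gcc -O2 measure.c -o measure
	 */
	#include <stdio.h>
	#include <x86intrin.h>

	#define ITERATIONS	1000

	/* Set to 1 to serialize with mfence, mimicking the mb() calls. */
	#define USE_MFENCE	0

	static volatile char buf[4096];

	static inline unsigned long long measure_once(void)
	{
		unsigned long long start, end;
		unsigned int aux;

		if (USE_MFENCE)
			_mm_mfence();
		start = __rdtscp(&aux);

		buf[0]++;		/* the "operation" being measured */

		if (USE_MFENCE)
			_mm_mfence();
		end = __rdtscp(&aux);

		return end - start;
	}

	int main(void)
	{
		unsigned long long total = 0;
		int i;

		for (i = 0; i < ITERATIONS; i++)
			total += measure_once();

		printf("avg: %llu cycles\n", total / ITERATIONS);
		return 0;
	}

With USE_MFENCE set to 0 the measured cost can collapse toward zero
because the CPU overlaps the access with the surrounding code; with it
set to 1 the cost comes back, which is the same effect as the 0-cycle
TLB miss below.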

Here's a Haswell Xeon:

> [    0.222090] x86/fpu:########  MM instructions:            ############################
> [    0.222168] x86/fpu: Cost of: __flush_tlb()               fn            :   124 cycles avg:   125
> [    0.222623] x86/fpu: Cost of: __flush_tlb_global()        fn            :   960 cycles avg:   968
> [    0.222744] x86/fpu: Cost of: __flush_tlb_single()        fn            :   216 cycles avg:   216
> [    0.222864] x86/fpu: Cost of: __flush_tlb_single() vmal   fn            :   216 cycles avg:   219
> [    0.222987] x86/fpu: Cost of: __flush_tlb_one() OLD       fn            :   216 cycles avg:   216
> [    0.223139] x86/fpu: Cost of: __flush_tlb_range()         fn            :   284 cycles avg:   287
> [    0.223272] x86/fpu: Cost of: tlb miss                    fn            :     0 cycles avg:     0

And a Westmere Xeon:

> [    1.057770] x86/fpu:########  MM instructions:            ############################
> [    1.065876] x86/fpu: Cost of: __flush_tlb()               fn            :   108 cycles avg:   109
> [    1.075188] x86/fpu: Cost of: __flush_tlb_global()        fn            :   828 cycles avg:   829
> [    1.084162] x86/fpu: Cost of: __flush_tlb_single()        fn            :   232 cycles avg:   237
> [    1.093175] x86/fpu: Cost of: __flush_tlb_single() vmal   fn            :   240 cycles avg:   240
> [    1.102214] x86/fpu: Cost of: __flush_tlb_one() OLD       fn            :   284 cycles avg:   286
> [    1.111299] x86/fpu: Cost of: __flush_tlb_range()         fn            :   472 cycles avg:   478
> [    1.120281] x86/fpu: Cost of: tlb miss                    fn            :     0 cycles avg:     0

I was rather surprised by how close the three __flush_tlb_single/one()
variants were on Haswell.  I've looked at a few other CPUs, and this was
the only one that behaved like this.

The 0-cycle TLB miss was also interesting.  It goes back up to something
reasonable if I put the mb()/mfences back.

I don't think this kind of thing is a realistic test unless we put
mfences around all of our TLB flushes in practice. :)