linux-kernel - Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB range flush v2

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131216102439.GA21624@gmail.com>
Date:	Mon, 16 Dec 2013 11:24:39 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Alex Shi <alex.shi@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Fengguang Wu <fengguang.wu@...el.com>,
	H Peter Anvin <hpa@...or.com>, Linux-X86 <x86@...nel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB
 range flush v2


* Mel Gorman <mgorman@...e.de> wrote:

> I had hacked ebizzy to report on the performance of each thread, not 
> just the overall result and worked out the difference in performance 
> of each thread. In a complete fair test you would expect the 
> performance of each thread to be identical and so the spread would 
> be 0
> 
> ebizzy thread spread
>                     3.13.0-rc3            3.13.0-rc3                3.4.69
>                        vanilla           nowalk-v2r7               vanilla
> Mean   1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Mean   2        0.34 (  0.00%)        0.30 (-11.76%)        0.07 (-79.41%)
> Mean   3        1.29 (  0.00%)        0.92 (-28.68%)        0.29 (-77.52%)
> Mean   4        7.08 (  0.00%)       42.38 (498.59%)        0.22 (-96.89%)
> Mean   5      193.54 (  0.00%)      483.41 (149.77%)        0.41 (-99.79%)
> Mean   6      151.12 (  0.00%)      198.22 ( 31.17%)        0.42 (-99.72%)
> Mean   7      115.38 (  0.00%)      160.29 ( 38.92%)        0.58 (-99.50%)
> Mean   8      108.65 (  0.00%)      138.96 ( 27.90%)        0.44 (-99.60%)
> Range  1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Range  2        5.00 (  0.00%)        6.00 ( 20.00%)        2.00 (-60.00%)
> Range  3       10.00 (  0.00%)       17.00 ( 70.00%)        9.00 (-10.00%)
> Range  4      256.00 (  0.00%)     1001.00 (291.02%)        5.00 (-98.05%)
> Range  5      456.00 (  0.00%)     1226.00 (168.86%)        6.00 (-98.68%)
> Range  6      298.00 (  0.00%)      294.00 ( -1.34%)        8.00 (-97.32%)
> Range  7      192.00 (  0.00%)      220.00 ( 14.58%)        7.00 (-96.35%)
> Range  8      171.00 (  0.00%)      163.00 ( -4.68%)        8.00 (-95.32%)
> Stddev 1        0.00 (  0.00%)        0.00 (  0.00%)        0.00 (  0.00%)
> Stddev 2        0.72 (  0.00%)        0.85 (-17.99%)        0.29 ( 59.72%)
> Stddev 3        1.42 (  0.00%)        1.90 (-34.22%)        1.12 ( 21.19%)
> Stddev 4       33.83 (  0.00%)      127.26 (-276.15%)        0.79 ( 97.65%)
> Stddev 5       92.08 (  0.00%)      225.01 (-144.35%)        1.06 ( 98.85%)
> Stddev 6       64.82 (  0.00%)       69.43 ( -7.11%)        1.28 ( 98.02%)
> Stddev 7       36.66 (  0.00%)       49.19 (-34.20%)        1.18 ( 96.79%)
> Stddev 8       30.79 (  0.00%)       36.23 (-17.64%)        1.06 ( 96.55%)
> 
> For example, this is saying that with 8 threads on 3.13-rc3 that the 
> difference between the slowest and fastest thread was 171 
> records/second.

We aren't blind fairness fetishists, but the noise difference between 
v3.4 and v3.13 appears to be staggering, it's a serious anomaly in 
itself.

Whatever we did right in v3.4 we want to do in v3.13 as well - or at 
least understand it.

I agree that the absolute numbers would probably only be interesting 
once v3.13 is fixed to not spread thread performance that wildly 
again.

> [...] Because of this bug, I'd be wary about drawing too many 
> conclusions about ebizzy performance when the number of threads 
> exceed the number of CPUs.

Yes.

Could it be that the v3.13 workload context switches a lot more than 
v3.4 workload? That would magnify any TLB range flushing costs and 
would make it essentially a secondary symptom, not a primary cause of 
the regression. (I'm only guessing blindly here though.)

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/