Message-ID: <20131217143253.GB11295@suse.de>
Date: Tue, 17 Dec 2013 14:32:53 +0000
From: Mel Gorman <mgorman@...e.de>
To: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Alex Shi <alex.shi@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Fengguang Wu <fengguang.wu@...el.com>,
H Peter Anvin <hpa@...or.com>, Linux-X86 <x86@...nel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 0/4] Fix ebizzy performance regression due to X86 TLB
range flush v2
On Tue, Dec 17, 2013 at 12:00:51PM +0100, Ingo Molnar wrote:
>
> > sched: Assign correct scheduling domain to sd_llc
> >
> > Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL
> > dereference on sd_busy but the fix also altered what scheduling domain it
> > used for sd_llc. One impact of this is that a task selecting a runqueue may
> > consider idle CPUs that are not cache siblings as candidates for running.
> > Tasks are then running on CPUs that are not cache hot.
> >
> > <PATCH SNIPPED>
>
> Indeed that makes a lot of sense, thanks Mel for tracking down this
> part of the puzzle! Will get your fix to Linus ASAP.
>
> Does this fix also speed up Ebizzy's transaction performance, or is
> its main effect a reduction in workload variation noise?
>
Mixed results, some gains and some losses.
3.13.0-rc3 3.13.0-rc3 3.4.69 3.13.0-rc3
vanilla nowalk-v2r7 vanilla fixsd-v3r3
Mean 1 7295.77 ( 0.00%) 7835.63 ( 7.40%) 6713.32 ( -7.98%) 7757.03 ( 6.32%)
Mean 2 8252.58 ( 0.00%) 9554.63 ( 15.78%) 8334.43 ( 0.99%) 9457.34 ( 14.60%)
Mean 3 8179.74 ( 0.00%) 9032.46 ( 10.42%) 8134.42 ( -0.55%) 8928.25 ( 9.15%)
Mean 4 7862.45 ( 0.00%) 8688.01 ( 10.50%) 7966.27 ( 1.32%) 8560.87 ( 8.88%)
Mean 5 7170.24 ( 0.00%) 8216.15 ( 14.59%) 7820.63 ( 9.07%) 8270.72 ( 15.35%)
Mean 6 6835.10 ( 0.00%) 7866.95 ( 15.10%) 7773.30 ( 13.73%) 7998.50 ( 17.02%)
Mean 7 6740.99 ( 0.00%) 7586.36 ( 12.54%) 7712.45 ( 14.41%) 7519.46 ( 11.55%)
Mean 8 6494.01 ( 0.00%) 6849.82 ( 5.48%) 7705.62 ( 18.66%) 6842.44 ( 5.37%)
Mean 12 6567.37 ( 0.00%) 6973.66 ( 6.19%) 7554.82 ( 15.04%) 6471.83 ( -1.45%)
Mean 16 6630.26 ( 0.00%) 7042.52 ( 6.22%) 7331.04 ( 10.57%) 6380.16 ( -3.77%)
Range 1 767.00 ( 0.00%) 194.00 ( 74.71%) 661.00 ( 13.82%) 217.00 ( 71.71%)
Range 2 178.00 ( 0.00%) 185.00 ( -3.93%) 592.00 (-232.58%) 240.00 (-34.83%)
Range 3 175.00 ( 0.00%) 213.00 (-21.71%) 431.00 (-146.29%) 511.00 (-192.00%)
Range 4 806.00 ( 0.00%) 924.00 (-14.64%) 542.00 ( 32.75%) 723.00 ( 10.30%)
Range 5 544.00 ( 0.00%) 438.00 ( 19.49%) 444.00 ( 18.38%) 663.00 (-21.88%)
Range 6 399.00 ( 0.00%) 1111.00 (-178.45%) 528.00 (-32.33%) 1031.00 (-158.40%)
Range 7 629.00 ( 0.00%) 895.00 (-42.29%) 467.00 ( 25.76%) 877.00 (-39.43%)
Range 8 400.00 ( 0.00%) 255.00 ( 36.25%) 435.00 ( -8.75%) 656.00 (-64.00%)
Range 12 233.00 ( 0.00%) 108.00 ( 53.65%) 330.00 (-41.63%) 343.00 (-47.21%)
Range 16 141.00 ( 0.00%) 134.00 ( 4.96%) 496.00 (-251.77%) 291.00 (-106.38%)
Stddev 1 73.94 ( 0.00%) 52.33 ( 29.23%) 177.17 (-139.59%) 37.34 ( 49.51%)
Stddev 2 23.47 ( 0.00%) 42.08 (-79.24%) 88.91 (-278.74%) 38.16 (-62.58%)
Stddev 3 36.48 ( 0.00%) 29.02 ( 20.45%) 101.07 (-177.05%) 134.62 (-269.01%)
Stddev 4 158.37 ( 0.00%) 133.99 ( 15.40%) 130.52 ( 17.59%) 150.61 ( 4.90%)
Stddev 5 116.74 ( 0.00%) 76.76 ( 34.25%) 78.31 ( 32.92%) 116.67 ( 0.06%)
Stddev 6 66.34 ( 0.00%) 273.87 (-312.83%) 87.79 (-32.33%) 235.11 (-254.40%)
Stddev 7 145.62 ( 0.00%) 174.99 (-20.16%) 90.52 ( 37.84%) 156.08 ( -7.18%)
Stddev 8 68.51 ( 0.00%) 47.58 ( 30.54%) 81.11 (-18.39%) 96.00 (-40.13%)
Stddev 12 32.15 ( 0.00%) 20.18 ( 37.22%) 65.74 (-104.50%) 45.00 (-39.99%)
Stddev 16 21.59 ( 0.00%) 20.29 ( 6.01%) 86.42 (-300.25%) 38.20 (-76.93%)
fixsd-v3r3 has all the patches discussed so far applied. It loses at the
higher thread counts and wins at the lower ones, and at the higher thread
counts the results are still worse than 3.4.69.
To complicate matters further, additional testing indicated that the
tlbflush shift change *may* have made the variation worse. I was preparing
to bisect in search of the patches that increased the "thread performance
spread" in ebizzy and tested a number of potential bisect points:
Tue 17 Dec 11:11:08 GMT 2013 ivy ebizzyrange v3.12 mean-max:36 good
Tue 17 Dec 11:32:28 GMT 2013 ivy ebizzyrange v3.13-rc3 mean-max:80 bad
Tue 17 Dec 12:00:23 GMT 2013 ivy ebizzyrange v3.4 mean-max:0 good
Tue 17 Dec 12:21:58 GMT 2013 ivy ebizzyrange v3.10 mean-max:26 good
Tue 17 Dec 12:42:49 GMT 2013 ivy ebizzyrange v3.11 mean-max:7 good
Tue 17 Dec 13:32:14 GMT 2013 ivy ebizzyrange x86-tlb-range-flush-optimisation-v3r3 mean-max:110 bad
This is part of the log from an automated bisection script. mean-max is
the worst average spread recorded across all the thread counts tested (a
rough sketch of how such a figure can be computed follows the patch list
below). It's telling me that the worst thread spread seen by v3.13-rc3 is
80 and the worst seen by the patch series (tlbflush shift change, fix to
sd_llc etc.) is 110. The bisection is doing very few iterations so it
could just be coincidence, but it makes sense: if the kernel is scheduling
tasks on CPUs that are not cache siblings then the cost of remote TLB
flushes (range or otherwise) changes. It's an important enough problem
that I feel compelled to retest with
x86: mm: Clean up inconsistencies when flushing TLB ranges
x86: mm: Account for TLB flushes only when debugging
x86: mm: Eliminate redundant page table walk during TLB range flushing
sched: Assign correct scheduling domain to sd_llc
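
To make the mean-max metric concrete, here is a rough userspace
illustration of how such a spread figure can be derived from
per-thread-count throughput samples. The numbers, the spread() helper and
the exact definition used (best minus worst run at each thread count, then
the maximum of those) are only for the example; the real reporting script
may differ in detail.

#include <stdio.h>

/*
 * Illustrative only: records/s samples for a handful of ebizzy runs at
 * each thread count (made-up numbers, not the data in this mail).
 */
#define NR_THREAD_COUNTS	3
#define NR_ITERATIONS		4

static const double samples[NR_THREAD_COUNTS][NR_ITERATIONS] = {
	{ 7300.0, 7350.0, 7280.0, 7310.0 },	/* 1 thread  */
	{ 8200.0, 8450.0, 8150.0, 8300.0 },	/* 2 threads */
	{ 8100.0, 8180.0, 8050.0, 8120.0 },	/* 3 threads */
};

/* Spread for one thread count: best sample minus worst sample */
static double spread(const double *runs, int nr)
{
	double min = runs[0], max = runs[0];
	int i;

	for (i = 1; i < nr; i++) {
		if (runs[i] < min)
			min = runs[i];
		if (runs[i] > max)
			max = runs[i];
	}
	return max - min;
}

int main(void)
{
	double mean_max = 0.0;
	int i;

	/* mean-max: the worst spread seen across all thread counts */
	for (i = 0; i < NR_THREAD_COUNTS; i++) {
		double s = spread(samples[i], NR_ITERATIONS);

		if (s > mean_max)
			mean_max = s;
	}

	printf("mean-max: %.0f\n", mean_max);
	return 0;
}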
I'll then re-evaluate the tlbflush shift patch based on what falls out of
that test. It may turn out that the tlbflush shift on its own simply cannot
optimise for both the tlbflush microbenchmark and ebizzy, as the former
deals with average cost and the latter hits the worst case every time.
At that point it'll be time to look at profiles and see where we are
actually spending time, because the possibilities of finding things to fix
through bisection will have been exhausted.
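
For background on why the shift can tune for one workload and hurt the
other: the tlbflush shift is a balance point that decides how large a
range has to be, relative to the TLB, before a full flush is expected to
be cheaper than invalidating page by page. The sketch below is a
userspace illustration of that idea only; the names, TLB size and shift
value are assumed for the example and are not the kernel implementation.

#include <stdio.h>
#include <stdbool.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)

/* Assumed values for illustration only */
#define TLB_ENTRIES		512UL	/* last-level TLB entries */
#define TLB_FLUSHALL_SHIFT	5	/* the tunable balance point */

/*
 * Decide whether a range flush should fall back to a full TLB flush.
 * The idea: if the number of pages in the range exceeds a fraction of
 * the TLB (controlled by the shift), the individual invalidations are
 * expected to cost more than refilling the TLB from scratch.
 */
static bool flush_whole_tlb(unsigned long start, unsigned long end)
{
	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;

	return nr_pages > (TLB_ENTRIES >> TLB_FLUSHALL_SHIFT);
}

int main(void)
{
	unsigned long start = 0;
	unsigned long sizes[] = { 4, 16, 64, 256 };	/* in pages */
	unsigned int i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		unsigned long end = start + sizes[i] * PAGE_SIZE;

		printf("%4lu pages: %s\n", sizes[i],
		       flush_whole_tlb(start, end) ?
		       "flush everything" : "flush page by page");
	}
	return 0;
}

A larger shift keeps range flushes in play for bigger ranges, which helps
the average case the microbenchmark measures but leaves a workload like
ebizzy, which keeps hitting the expensive end of the range, exposed.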
> Also it appears the Ebizzy numbers ought to be stable enough now to
> make the range-TLB-flush measurements more precise?
>
Right now, the tlbflush microbenchmark figures look awful on the 8-core
machine when the tlbflush shift patch and the scheduling domain fix are
both applied.
--
Mel Gorman
SUSE Labs
--