Message-ID: <20150424121559.321677ce@notabene.brown>
Date: Fri, 24 Apr 2015 12:15:59 +1000
From: NeilBrown <neilb@...e.de>
To: Huang Ying <ying.huang@...el.com>
Cc: "shli@...nel.org" <shli@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>
Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5%
perf-stat.LLC-load-misses
On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@...el.com> wrote:
> FYI, we noticed the below changes on
>
> git://neil.brown.name/md for-next
> commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
Hi,
is there any chance that you could explain what some of this means?
There is lots of data and some very pretty graphs, but no explanation.
Which numbers are "good", and which are "bad"? Which is "worst"?
What do the graphs really show? And what would we like to see in them?
I think it is really great that you are doing this testing and reporting the
results. It's just so sad that I completely fail to understand them.
Thanks,
NeilBrown
>
>
> testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
>
> a87d7f782b47e030 878ee6792799e2f88bdcac3298
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 59035 ± 0% +18.4% 69913 ± 1% softirqs.SCHED
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.num_objs
> 1330 ± 10% +17.4% 1561 ± 4% slabinfo.kmalloc-512.active_objs
> 305908 ± 0% -1.8% 300427 ± 0% vmstat.io.bo
> 1 ± 0% +100.0% 2 ± 0% vmstat.procs.r
> 8266 ± 1% -15.7% 6968 ± 0% vmstat.system.cs
> 14819 ± 0% -2.1% 14503 ± 0% vmstat.system.in
> 18.20 ± 6% +10.2% 20.05 ± 4% perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> 1.94 ± 9% +90.6% 3.70 ± 9% perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 0.00 ± 0% +Inf% 25.18 ± 3% perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> 0.00 ± 0% +Inf% 14.14 ± 4% perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 1.79 ± 7% +102.9% 3.64 ± 9% perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> 3.09 ± 4% -10.8% 2.76 ± 4% perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> 0.80 ± 14% +28.1% 1.02 ± 10% perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 14.78 ± 6% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> 25.68 ± 4% -100.0% 0.00 ± 0% perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> 1.23 ± 5% +140.0% 2.96 ± 7% perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> 2.62 ± 6% -95.6% 0.12 ± 33% perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> 0.96 ± 9% +17.5% 1.12 ± 2% perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> 1.461e+10 ± 0% -5.3% 1.384e+10 ± 1% perf-stat.L1-dcache-load-misses
> 3.688e+11 ± 0% -2.7% 3.59e+11 ± 0% perf-stat.L1-dcache-loads
> 1.124e+09 ± 0% -27.7% 8.125e+08 ± 0% perf-stat.L1-dcache-prefetches
> 2.767e+10 ± 0% -1.8% 2.717e+10 ± 0% perf-stat.L1-dcache-store-misses
> 2.352e+11 ± 0% -2.8% 2.287e+11 ± 0% perf-stat.L1-dcache-stores
> 6.774e+09 ± 0% -2.3% 6.62e+09 ± 0% perf-stat.L1-icache-load-misses
> 5.571e+08 ± 0% +40.5% 7.826e+08 ± 1% perf-stat.LLC-load-misses
> 6.263e+09 ± 0% -13.7% 5.407e+09 ± 1% perf-stat.LLC-loads
> 1.914e+11 ± 0% -4.2% 1.833e+11 ± 0% perf-stat.branch-instructions
> 1.145e+09 ± 2% -5.6% 1.081e+09 ± 0% perf-stat.branch-load-misses
> 1.911e+11 ± 0% -4.3% 1.829e+11 ± 0% perf-stat.branch-loads
> 1.142e+09 ± 2% -5.1% 1.083e+09 ± 0% perf-stat.branch-misses
> 1.218e+09 ± 0% +19.8% 1.46e+09 ± 0% perf-stat.cache-misses
> 2.118e+10 ± 0% -5.2% 2.007e+10 ± 0% perf-stat.cache-references
> 2510308 ± 1% -15.7% 2115410 ± 0% perf-stat.context-switches
> 39623 ± 0% +22.1% 48370 ± 1% perf-stat.cpu-migrations
> 4.179e+08 ± 40% +165.7% 1.111e+09 ± 35% perf-stat.dTLB-load-misses
> 3.684e+11 ± 0% -2.5% 3.592e+11 ± 0% perf-stat.dTLB-loads
> 1.232e+08 ± 15% +62.5% 2.002e+08 ± 27% perf-stat.dTLB-store-misses
> 2.348e+11 ± 0% -2.5% 2.288e+11 ± 0% perf-stat.dTLB-stores
> 3577297 ± 2% +8.7% 3888986 ± 1% perf-stat.iTLB-load-misses
> 1.035e+12 ± 0% -3.5% 9.988e+11 ± 0% perf-stat.iTLB-loads
> 1.036e+12 ± 0% -3.7% 9.978e+11 ± 0% perf-stat.instructions
> 594 ± 30% +130.3% 1369 ± 13% sched_debug.cfs_rq[0]:/.blocked_load_avg
> 17 ± 10% -28.2% 12 ± 23% sched_debug.cfs_rq[0]:/.nr_spread_over
> 210 ± 21% +42.1% 298 ± 28% sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> 9676 ± 21% +42.1% 13754 ± 28% sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> 772 ± 25% +116.5% 1672 ± 9% sched_debug.cfs_rq[0]:/.tg_load_contrib
> 8402 ± 9% +83.3% 15405 ± 11% sched_debug.cfs_rq[0]:/.tg_load_avg
> 8356 ± 9% +82.8% 15272 ± 11% sched_debug.cfs_rq[1]:/.tg_load_avg
> 968 ± 25% +100.8% 1943 ± 14% sched_debug.cfs_rq[1]:/.blocked_load_avg
> 16242 ± 9% -22.2% 12643 ± 14% sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> 353 ± 9% -22.1% 275 ± 14% sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> 1183 ± 23% +77.7% 2102 ± 12% sched_debug.cfs_rq[1]:/.tg_load_contrib
> 181 ± 8% -31.4% 124 ± 26% sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> 8364 ± 8% -31.3% 5745 ± 26% sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> 8297 ± 9% +81.7% 15079 ± 12% sched_debug.cfs_rq[2]:/.tg_load_avg
> 30439 ± 13% -45.2% 16681 ± 26% sched_debug.cfs_rq[2]:/.exec_clock
> 39735 ± 14% -48.3% 20545 ± 29% sched_debug.cfs_rq[2]:/.min_vruntime
> 8231 ± 10% +82.2% 15000 ± 12% sched_debug.cfs_rq[3]:/.tg_load_avg
> 1210 ± 14% +110.3% 2546 ± 30% sched_debug.cfs_rq[4]:/.tg_load_contrib
> 8188 ± 10% +82.8% 14964 ± 12% sched_debug.cfs_rq[4]:/.tg_load_avg
> 8132 ± 10% +83.1% 14890 ± 12% sched_debug.cfs_rq[5]:/.tg_load_avg
> 749 ± 29% +205.9% 2292 ± 34% sched_debug.cfs_rq[5]:/.blocked_load_avg
> 963 ± 30% +169.9% 2599 ± 33% sched_debug.cfs_rq[5]:/.tg_load_contrib
> 37791 ± 32% -38.6% 23209 ± 13% sched_debug.cfs_rq[6]:/.min_vruntime
> 693 ± 25% +132.2% 1609 ± 29% sched_debug.cfs_rq[6]:/.blocked_load_avg
> 10838 ± 13% -39.2% 6587 ± 13% sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> 29329 ± 27% -33.2% 19577 ± 10% sched_debug.cfs_rq[6]:/.exec_clock
> 235 ± 14% -39.7% 142 ± 14% sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> 8085 ± 10% +83.6% 14848 ± 12% sched_debug.cfs_rq[6]:/.tg_load_avg
> 839 ± 25% +128.5% 1917 ± 18% sched_debug.cfs_rq[6]:/.tg_load_contrib
> 8051 ± 10% +83.6% 14779 ± 12% sched_debug.cfs_rq[7]:/.tg_load_avg
> 156 ± 34% +97.9% 309 ± 19% sched_debug.cpu#0.cpu_load[4]
> 160 ± 25% +64.0% 263 ± 16% sched_debug.cpu#0.cpu_load[2]
> 156 ± 32% +83.7% 286 ± 17% sched_debug.cpu#0.cpu_load[3]
> 164 ± 20% -35.1% 106 ± 31% sched_debug.cpu#2.cpu_load[0]
> 249 ± 15% +80.2% 449 ± 10% sched_debug.cpu#4.cpu_load[3]
> 231 ± 11% +101.2% 466 ± 13% sched_debug.cpu#4.cpu_load[2]
> 217 ± 14% +189.9% 630 ± 38% sched_debug.cpu#4.cpu_load[0]
> 71951 ± 5% +21.6% 87526 ± 7% sched_debug.cpu#4.nr_load_updates
> 214 ± 8% +146.1% 527 ± 27% sched_debug.cpu#4.cpu_load[1]
> 256 ± 17% +75.7% 449 ± 13% sched_debug.cpu#4.cpu_load[4]
> 209 ± 23% +98.3% 416 ± 48% sched_debug.cpu#5.cpu_load[2]
> 68024 ± 2% +18.8% 80825 ± 1% sched_debug.cpu#5.nr_load_updates
> 217 ± 26% +74.9% 380 ± 45% sched_debug.cpu#5.cpu_load[3]
> 852 ± 21% -38.3% 526 ± 22% sched_debug.cpu#6.curr->pid
>
> lkp-st02: Core2
> Memory: 8G
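> 
> Read together, the two LLC rows above amount to a higher miss rate: loads
> go down while load-misses go up. A quick check, using only the counts
> already listed in the table (this is arithmetic on those values, not an
> additional measurement):
> 
>   awk 'BEGIN { printf "LLC load-miss rate before: %.1f%%, after: %.1f%%\n", 100*5.571e8/6.263e9, 100*7.826e8/5.407e9 }'
>   # prints: LLC load-miss rate before: 8.9%, after: 14.5%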
>
>
>
>
> perf-stat.cache-misses
>
> 1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> | O O O O O O O O O O |
> 1.4e+09 ++ |
> 1.2e+09 *+.*...* *..* * *...*..*...*..*...*..*...*..*...*..*
> | : : : : : |
> 1e+09 ++ : : : : : : |
> | : : : : : : |
> 8e+08 ++ : : : : : : |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.L1-dcache-prefetches
>
> 1.2e+09 ++----------------------------------------------------------------+
> *..*...* *..* * ..*.. ..*..*...*..*...*..*...*..*
> 1e+09 ++ : : : : *. *. |
> | : : : :: : |
> | : : : : : : O |
> 8e+08 O+ O: O :O O: O :O: O :O O O O O O O |
> | : : : : : : |
> 6e+08 ++ : : : : : : |
> | : : : : : : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 2e+08 ++ :: :: : : |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> perf-stat.LLC-load-misses
>
> 1e+09 ++------------------------------------------------------------------+
> 9e+08 O+ O O O O O |
> | O O O O |
> 8e+08 ++ O O O O O O |
> 7e+08 ++ |
> | |
> 6e+08 *+..*..* *...* * *...*..*...*...*..*...*..*...*..*...*
> 5e+08 ++ : : : :: : |
> 4e+08 ++ : : : : : : |
> | : : : : : : |
> 3e+08 ++ : : : : : : |
> 2e+08 ++ : : : : : : |
> | : : : : : : |
> 1e+08 ++ : :: : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> perf-stat.context-switches
>
> 3e+06 ++----------------------------------------------------------------+
> | *...*..*... |
> 2.5e+06 *+.*...* *..* * : *..*... .*...*..*... .*
> | : : : : : *. *. |
> O O: O :O O: O :: : O O O O O O |
> 2e+06 ++ : : : :O: O :O O |
> | : : : : : : |
> 1.5e+06 ++ : : : : : : |
> | : : : : : : |
> 1e+06 ++ : : : : : : |
> | : : : : : : |
> | : : : : : : |
> 500000 ++ :: : : :: |
> | : : : |
> 0 ++-O------*----------*------*-------------------------------------+
>
>
> vmstat.system.cs
>
> 10000 ++------------------------------------------------------------------+
> 9000 ++ *...*.. |
> *...*..* *...* * : *...*...*.. ..*..*...*.. ..*
> 8000 ++ : : : : : *. *. |
> 7000 O+ O: O O O: O : : : O O O O O O |
> | : : : :O: O :O O |
> 6000 ++ : : : : : : |
> 5000 ++ : : : : : : |
> 4000 ++ : : : : : : |
> | : : : : : : |
> 3000 ++ : : : : : : |
> 2000 ++ : : : : : : |
> | : : :: :: |
> 1000 ++ : : : |
> 0 ++--O------*---------*-------*--------------------------------------+
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> To reproduce:
>
> apt-get install ruby
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/setup-local job.yaml # the job file attached in this email
> bin/run-local job.yaml
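> 
> For cases where the lkp-tests harness is not convenient, the testparams
> string (300-5m-11HDD-RAID5-cfq-xfs-1dd) can be approximated by hand. The
> sketch below is only an interpretation of that string -- 11 HDDs in a
> RAID5 array, cfq as the member-disk I/O scheduler, xfs, and a single
> buffered dd writer running for 300 seconds with a 5M block size -- and the
> device names and mount point are placeholders, not the exact test setup:
> 
>   # assumed member disks: /dev/sdb .. /dev/sdl (11 devices)
>   mdadm --create /dev/md0 --level=5 --raid-devices=11 /dev/sd[b-l]
>   for d in b c d e f g h i j k l; do
>           echo cfq > /sys/block/sd$d/queue/scheduler
>   done
>   mkfs.xfs -f /dev/md0
>   mkdir -p /mnt/raid5
>   mount /dev/md0 /mnt/raid5
>   timeout 300 dd if=/dev/zero of=/mnt/raid5/testfile bs=5M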
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Ying Huang
>