Message-ID: <20150430062523.GA25995@yliu-dev.sh.intel.com>
Date:	Thu, 30 Apr 2015 14:25:23 +0800
From:	Yuanhan Liu <yuanhan.liu@...ux.intel.com>
To:	NeilBrown <neilb@...e.de>
Cc:	Huang Ying <ying.huang@...el.com>,
	"shli@...nel.org" <shli@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>,
	Fengguang Wu <fengguang.wu@...el.com>
Subject: Re: [LKP] [RAID5] 878ee679279: -1.8% vmstat.io.bo, +40.5%
 perf-stat.LLC-load-misses

On Fri, Apr 24, 2015 at 12:15:59PM +1000, NeilBrown wrote:
> On Thu, 23 Apr 2015 14:55:59 +0800 Huang Ying <ying.huang@...el.com> wrote:
> 
> > FYI, we noticed the below changes on
> > 
> > git://neil.brown.name/md for-next
> > commit 878ee6792799e2f88bdcac329845efadb205252f ("RAID5: batch adjacent full stripe write")
> 
> Hi,
>  is there any chance that you could explain what some of this means?
> There is lots of data and some very pretty graphs, but no explanation.

Hi Neil,

(Sorry for the late response: Ying is on vacation.)

I guess you can simply ignore this report, as I already reported to you
a month ago that this patch made fsmark perform better in most cases:

    https://lists.01.org/pipermail/lkp/2015-March/002411.html

> 
> Which numbers are "good", which are "bad"?  Which is "worst".
> What do the graphs really show? and what would we like to see in them?
> 
> I think it is really great that you are doing this testing and reporting the
> results.  It's just so sad that I completely fail to understand them.

Sorry, it's our fault for making them hard to understand, and for
sending a duplicate report (well, the commit hash is different ;).

We may need to take some time to make the data easier to understand.

	--yliu

> 
> > 
> > 
> > testbox/testcase/testparams: lkp-st02/dd-write/300-5m-11HDD-RAID5-cfq-xfs-1dd
> > 
> > a87d7f782b47e030  878ee6792799e2f88bdcac3298  
> > ----------------  --------------------------  
> >          %stddev     %change         %stddev
> >              \          |                \  
> >      59035 ±  0%     +18.4%      69913 ±  1%  softirqs.SCHED
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.num_objs
> >       1330 ± 10%     +17.4%       1561 ±  4%  slabinfo.kmalloc-512.active_objs
> >     305908 ±  0%      -1.8%     300427 ±  0%  vmstat.io.bo
> >          1 ±  0%    +100.0%          2 ±  0%  vmstat.procs.r
> >       8266 ±  1%     -15.7%       6968 ±  0%  vmstat.system.cs
> >      14819 ±  0%      -2.1%      14503 ±  0%  vmstat.system.in
> >      18.20 ±  6%     +10.2%      20.05 ±  4%  perf-profile.cpu-cycles.raid_run_ops.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       1.94 ±  9%     +90.6%       3.70 ±  9%  perf-profile.cpu-cycles.async_xor.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       0.00 ±  0%      +Inf%      25.18 ±  3%  perf-profile.cpu-cycles.handle_active_stripes.isra.45.raid5d.md_thread.kthread.ret_from_fork
> >       0.00 ±  0%      +Inf%      14.14 ±  4%  perf-profile.cpu-cycles.async_copy_data.isra.42.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >       1.79 ±  7%    +102.9%       3.64 ±  9%  perf-profile.cpu-cycles.xor_blocks.async_xor.raid_run_ops.handle_stripe.handle_active_stripes
> >       3.09 ±  4%     -10.8%       2.76 ±  4%  perf-profile.cpu-cycles.get_active_stripe.make_request.md_make_request.generic_make_request.submit_bio
> >       0.80 ± 14%     +28.1%       1.02 ± 10%  perf-profile.cpu-cycles.mutex_lock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >      14.78 ±  6%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.async_copy_data.isra.38.raid_run_ops.handle_stripe.handle_active_stripes.raid5d
> >      25.68 ±  4%    -100.0%       0.00 ±  0%  perf-profile.cpu-cycles.handle_active_stripes.isra.41.raid5d.md_thread.kthread.ret_from_fork
> >       1.23 ±  5%    +140.0%       2.96 ±  7%  perf-profile.cpu-cycles.xor_sse_5_pf64.xor_blocks.async_xor.raid_run_ops.handle_stripe
> >       2.62 ±  6%     -95.6%       0.12 ± 33%  perf-profile.cpu-cycles.analyse_stripe.handle_stripe.handle_active_stripes.raid5d.md_thread
> >       0.96 ±  9%     +17.5%       1.12 ±  2%  perf-profile.cpu-cycles.xfs_ilock.xfs_file_buffered_aio_write.xfs_file_write_iter.new_sync_write.vfs_write
> >  1.461e+10 ±  0%      -5.3%  1.384e+10 ±  1%  perf-stat.L1-dcache-load-misses
> >  3.688e+11 ±  0%      -2.7%   3.59e+11 ±  0%  perf-stat.L1-dcache-loads
> >  1.124e+09 ±  0%     -27.7%  8.125e+08 ±  0%  perf-stat.L1-dcache-prefetches
> >  2.767e+10 ±  0%      -1.8%  2.717e+10 ±  0%  perf-stat.L1-dcache-store-misses
> >  2.352e+11 ±  0%      -2.8%  2.287e+11 ±  0%  perf-stat.L1-dcache-stores
> >  6.774e+09 ±  0%      -2.3%   6.62e+09 ±  0%  perf-stat.L1-icache-load-misses
> >  5.571e+08 ±  0%     +40.5%  7.826e+08 ±  1%  perf-stat.LLC-load-misses
> >  6.263e+09 ±  0%     -13.7%  5.407e+09 ±  1%  perf-stat.LLC-loads
> >  1.914e+11 ±  0%      -4.2%  1.833e+11 ±  0%  perf-stat.branch-instructions
> >  1.145e+09 ±  2%      -5.6%  1.081e+09 ±  0%  perf-stat.branch-load-misses
> >  1.911e+11 ±  0%      -4.3%  1.829e+11 ±  0%  perf-stat.branch-loads
> >  1.142e+09 ±  2%      -5.1%  1.083e+09 ±  0%  perf-stat.branch-misses
> >  1.218e+09 ±  0%     +19.8%   1.46e+09 ±  0%  perf-stat.cache-misses
> >  2.118e+10 ±  0%      -5.2%  2.007e+10 ±  0%  perf-stat.cache-references
> >    2510308 ±  1%     -15.7%    2115410 ±  0%  perf-stat.context-switches
> >      39623 ±  0%     +22.1%      48370 ±  1%  perf-stat.cpu-migrations
> >  4.179e+08 ± 40%    +165.7%  1.111e+09 ± 35%  perf-stat.dTLB-load-misses
> >  3.684e+11 ±  0%      -2.5%  3.592e+11 ±  0%  perf-stat.dTLB-loads
> >  1.232e+08 ± 15%     +62.5%  2.002e+08 ± 27%  perf-stat.dTLB-store-misses
> >  2.348e+11 ±  0%      -2.5%  2.288e+11 ±  0%  perf-stat.dTLB-stores
> >    3577297 ±  2%      +8.7%    3888986 ±  1%  perf-stat.iTLB-load-misses
> >  1.035e+12 ±  0%      -3.5%  9.988e+11 ±  0%  perf-stat.iTLB-loads
> >  1.036e+12 ±  0%      -3.7%  9.978e+11 ±  0%  perf-stat.instructions
> >        594 ± 30%    +130.3%       1369 ± 13%  sched_debug.cfs_rq[0]:/.blocked_load_avg
> >         17 ± 10%     -28.2%         12 ± 23%  sched_debug.cfs_rq[0]:/.nr_spread_over
> >        210 ± 21%     +42.1%        298 ± 28%  sched_debug.cfs_rq[0]:/.tg_runnable_contrib
> >       9676 ± 21%     +42.1%      13754 ± 28%  sched_debug.cfs_rq[0]:/.avg->runnable_avg_sum
> >        772 ± 25%    +116.5%       1672 ±  9%  sched_debug.cfs_rq[0]:/.tg_load_contrib
> >       8402 ±  9%     +83.3%      15405 ± 11%  sched_debug.cfs_rq[0]:/.tg_load_avg
> >       8356 ±  9%     +82.8%      15272 ± 11%  sched_debug.cfs_rq[1]:/.tg_load_avg
> >        968 ± 25%    +100.8%       1943 ± 14%  sched_debug.cfs_rq[1]:/.blocked_load_avg
> >      16242 ±  9%     -22.2%      12643 ± 14%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
> >        353 ±  9%     -22.1%        275 ± 14%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
> >       1183 ± 23%     +77.7%       2102 ± 12%  sched_debug.cfs_rq[1]:/.tg_load_contrib
> >        181 ±  8%     -31.4%        124 ± 26%  sched_debug.cfs_rq[2]:/.tg_runnable_contrib
> >       8364 ±  8%     -31.3%       5745 ± 26%  sched_debug.cfs_rq[2]:/.avg->runnable_avg_sum
> >       8297 ±  9%     +81.7%      15079 ± 12%  sched_debug.cfs_rq[2]:/.tg_load_avg
> >      30439 ± 13%     -45.2%      16681 ± 26%  sched_debug.cfs_rq[2]:/.exec_clock
> >      39735 ± 14%     -48.3%      20545 ± 29%  sched_debug.cfs_rq[2]:/.min_vruntime
> >       8231 ± 10%     +82.2%      15000 ± 12%  sched_debug.cfs_rq[3]:/.tg_load_avg
> >       1210 ± 14%    +110.3%       2546 ± 30%  sched_debug.cfs_rq[4]:/.tg_load_contrib
> >       8188 ± 10%     +82.8%      14964 ± 12%  sched_debug.cfs_rq[4]:/.tg_load_avg
> >       8132 ± 10%     +83.1%      14890 ± 12%  sched_debug.cfs_rq[5]:/.tg_load_avg
> >        749 ± 29%    +205.9%       2292 ± 34%  sched_debug.cfs_rq[5]:/.blocked_load_avg
> >        963 ± 30%    +169.9%       2599 ± 33%  sched_debug.cfs_rq[5]:/.tg_load_contrib
> >      37791 ± 32%     -38.6%      23209 ± 13%  sched_debug.cfs_rq[6]:/.min_vruntime
> >        693 ± 25%    +132.2%       1609 ± 29%  sched_debug.cfs_rq[6]:/.blocked_load_avg
> >      10838 ± 13%     -39.2%       6587 ± 13%  sched_debug.cfs_rq[6]:/.avg->runnable_avg_sum
> >      29329 ± 27%     -33.2%      19577 ± 10%  sched_debug.cfs_rq[6]:/.exec_clock
> >        235 ± 14%     -39.7%        142 ± 14%  sched_debug.cfs_rq[6]:/.tg_runnable_contrib
> >       8085 ± 10%     +83.6%      14848 ± 12%  sched_debug.cfs_rq[6]:/.tg_load_avg
> >        839 ± 25%    +128.5%       1917 ± 18%  sched_debug.cfs_rq[6]:/.tg_load_contrib
> >       8051 ± 10%     +83.6%      14779 ± 12%  sched_debug.cfs_rq[7]:/.tg_load_avg
> >        156 ± 34%     +97.9%        309 ± 19%  sched_debug.cpu#0.cpu_load[4]
> >        160 ± 25%     +64.0%        263 ± 16%  sched_debug.cpu#0.cpu_load[2]
> >        156 ± 32%     +83.7%        286 ± 17%  sched_debug.cpu#0.cpu_load[3]
> >        164 ± 20%     -35.1%        106 ± 31%  sched_debug.cpu#2.cpu_load[0]
> >        249 ± 15%     +80.2%        449 ± 10%  sched_debug.cpu#4.cpu_load[3]
> >        231 ± 11%    +101.2%        466 ± 13%  sched_debug.cpu#4.cpu_load[2]
> >        217 ± 14%    +189.9%        630 ± 38%  sched_debug.cpu#4.cpu_load[0]
> >      71951 ±  5%     +21.6%      87526 ±  7%  sched_debug.cpu#4.nr_load_updates
> >        214 ±  8%    +146.1%        527 ± 27%  sched_debug.cpu#4.cpu_load[1]
> >        256 ± 17%     +75.7%        449 ± 13%  sched_debug.cpu#4.cpu_load[4]
> >        209 ± 23%     +98.3%        416 ± 48%  sched_debug.cpu#5.cpu_load[2]
> >      68024 ±  2%     +18.8%      80825 ±  1%  sched_debug.cpu#5.nr_load_updates
> >        217 ± 26%     +74.9%        380 ± 45%  sched_debug.cpu#5.cpu_load[3]
> >        852 ± 21%     -38.3%        526 ± 22%  sched_debug.cpu#6.curr->pid
> > 
> > lkp-st02: Core2
> > Memory: 8G
> > 
> > 
> > 
> > 
> >                                 perf-stat.cache-misses
> > 
> >   1.6e+09 O+-----O--O---O--O---O--------------------------------------------+
> >           |                       O   O  O   O  O   O  O   O  O   O         |
> >   1.4e+09 ++                                                                |
> >   1.2e+09 *+.*...*      *..*      *      *...*..*...*..*...*..*...*..*...*..*
> >           |      :      :  :      :      :                                  |
> >     1e+09 ++      :    :    :    : :    :                                   |
> >           |       :    :    :    : :    :                                   |
> >     8e+08 ++      :    :    :    : :    :                                   |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++       :  :      :  :   :  :                                    |
> >     4e+08 ++       : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++       : :        : :    : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                             perf-stat.L1-dcache-prefetches
> > 
> >   1.2e+09 ++----------------------------------------------------------------+
> >           *..*...*      *..*      *        ..*..  ..*..*...*..*...*..*...*..*
> >     1e+09 ++     :      :  :      :      *.     *.                          |
> >           |      :     :    :     ::     :                                  |
> >           |       :    :    :    : :     :                        O         |
> >     8e+08 O+     O: O  :O  O:  O :O:  O :O   O  O   O  O   O  O             |
> >           |       :   :      :   :  :   :                                   |
> >     6e+08 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :  :   :   :                                   |
> >     4e+08 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >     2e+08 ++        ::        ::     : :                                    |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                               perf-stat.LLC-load-misses
> > 
> >   1e+09 ++------------------------------------------------------------------+
> >   9e+08 O+     O   O  O   O  O                                              |
> >         |                        O   O  O   O                               |
> >   8e+08 ++                                     O   O   O  O   O  O          |
> >   7e+08 ++                                                                  |
> >         |                                                                   |
> >   6e+08 *+..*..*      *...*      *      *...*..*...*...*..*...*..*...*..*...*
> >   5e+08 ++      :     :   :      ::     :                                   |
> >   4e+08 ++      :    :     :    : :    :                                    |
> >         |        :   :     :    :  :   :                                    |
> >   3e+08 ++       :   :      :  :   :   :                                    |
> >   2e+08 ++        : :       :  :    : :                                     |
> >         |         : :       : :     : :                                     |
> >   1e+08 ++         :         ::      :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> >                               perf-stat.context-switches
> > 
> >     3e+06 ++----------------------------------------------------------------+
> >           |                              *...*..*...                        |
> >   2.5e+06 *+.*...*      *..*      *      :          *..*...  .*...*..*...  .*
> >           |      :      :  :      :      :                 *.            *. |
> >           O      O: O  :O  O:  O  ::    :       O   O  O   O  O   O         |
> >     2e+06 ++      :    :    :    :O:  O :O   O                              |
> >           |       :    :    :    : :    :                                   |
> >   1.5e+06 ++      :   :      :   :  :   :                                   |
> >           |        :  :      :   :  :  :                                    |
> >     1e+06 ++       :  :      :  :   :  :                                    |
> >           |        : :        : :    : :                                    |
> >           |        : :        : :    : :                                    |
> >    500000 ++        ::        : :    ::                                     |
> >           |         :          :      :                                     |
> >         0 ++-O------*----------*------*-------------------------------------+
> > 
> > 
> >                                   vmstat.system.cs
> > 
> >   10000 ++------------------------------------------------------------------+
> >    9000 ++                              *...*..                             |
> >         *...*..*      *...*      *      :      *...*...*..  ..*..*...*..  ..*
> >    8000 ++     :      :   :      :      :                 *.            *.  |
> >    7000 O+     O:  O  O   O: O  : :    :       O   O   O  O   O  O          |
> >         |       :    :     :    :O:  O :O   O                               |
> >    6000 ++      :    :     :    : :    :                                    |
> >    5000 ++       :   :     :   :   :   :                                    |
> >    4000 ++       :   :      :  :   :  :                                     |
> >         |        :  :       :  :   :  :                                     |
> >    3000 ++        : :       : :     : :                                     |
> >    2000 ++        : :       : :     : :                                     |
> >         |         : :        ::     ::                                      |
> >    1000 ++         :         :       :                                      |
> >       0 ++--O------*---------*-------*--------------------------------------+
> > 
> > 
> > 	[*] bisect-good sample
> > 	[O] bisect-bad  sample
> > 
> > To reproduce:
> > 
> > 	apt-get install ruby
> > 	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> > 	cd lkp-tests
> > 	bin/setup-local job.yaml # the job file attached in this email
> > 	bin/run-local   job.yaml
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> > Thanks,
> > Ying Huang
> > 
> 
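
For anyone reading the quoted comparison table: the %change column appears
to be the relative difference between the two per-commit means, left-hand
column (a87d7f782b47e030) versus right-hand column (878ee6792799e2f88bdcac3298).
The sketch below is only an illustration of that reading. The function name
percent_change and the choice of rows are assumptions made here, not part of
lkp-tests, and the "+Inf%" handling for a zero baseline is inferred from the
table.

```python
import math


def percent_change(base: float, patched: float) -> float:
    """Relative change of `patched` versus `base`, in percent.

    A zero baseline with a non-zero new value is reported as +Inf,
    mirroring the "+Inf%" rows in the quoted table (assumed convention).
    """
    if base == 0:
        return math.inf if patched else 0.0
    return (patched - base) / base * 100.0


# (left-hand mean, right-hand mean) copied from the quoted table
rows = {
    "softirqs.SCHED": (59035, 69913),
    "vmstat.io.bo": (305908, 300427),
    "vmstat.system.cs": (8266, 6968),
    "perf-stat.LLC-load-misses": (5.571e08, 7.826e08),
}

for name, (base, patched) in rows.items():
    print(f"{percent_change(base, patched):+8.1f}%  {name}")
```

Whether a given sign is "good" or "bad" depends on the metric and is not
something the arithmetic can say; the sketch only reproduces the column.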

