Message-ID: <20181228142608.GA17624@shao2-debian>
Date: Fri, 28 Dec 2018 22:26:08 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Mike Kravetz <mike.kravetz@...cle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Michal Hocko <mhocko@...nel.org>,
Hugh Dickins <hughd@...gle.com>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Davidlohr Bueso <dave@...olabs.net>,
Prakash Sangappa <prakash.sangappa@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>, stable@...r.kernel.org,
lkp@...org
Subject: [LKP] [hugetlbfs] 9c83282117: vm-scalability.throughput -4.3% regression
Greetings,
FYI, we noticed a -4.3% regression of vm-scalability.throughput due to commit:
commit: 9c83282117778856d647ffc461c4aede2abb6742 ("[PATCH v3 1/2] hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
url: https://github.com/0day-ci/linux/commits/Mike-Kravetz/hugetlbfs-use-i_mmap_rwsem-for-better-synchronization/20181223-095226
in testcase: vm-scalability
on test machine: 104 threads Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz with 64G memory
with following parameters:
runtime: 300s
size: 8T
test: anon-cow-seq-hugetlb
cpufreq_governor: performance
ucode: 0x200004d
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2018-04-03.cgz/300s/8T/lkp-skl-2sp4/anon-cow-seq-hugetlb/vm-scalability/0x200004d
commit:
0cd60eb1a7 ("dma-mapping: fix flags in dma_alloc_wc")
9c83282117 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
0cd60eb1a7b5421e 9c83282117778856d647ffc461
---------------- --------------------------
%stddev %change %stddev
\ | \
184494 -10.7% 164684 vm-scalability.median
20393229 -4.3% 19523319 vm-scalability.throughput
37986 ± 2% -4.3% 36341 ± 2% vm-scalability.time.involuntary_context_switches
3670375 -1.0% 3635385 vm-scalability.time.minor_page_faults
5808 -9.9% 5236 vm-scalability.time.percent_of_cpu_this_job_got
10665 -6.4% 9980 vm-scalability.time.system_time
6873 -15.2% 5829 vm-scalability.time.user_time
1561119 +42.4% 2222959 vm-scalability.time.voluntary_context_switches
304034 ± 10% -15.5% 256985 ± 7% meminfo.DirectMap4k
2455420 +17.5% 2884045 softirqs.SCHED
15179 ± 57% -77.2% 3468 ±167% numa-numastat.node0.other_node
5069 ±171% +231.5% 16803 ± 34% numa-numastat.node1.other_node
58.25 -14.6% 49.75 vmstat.procs.r
13194 +33.3% 17592 vmstat.system.cs
30.81 +4.7 35.50 mpstat.cpu.idle%
0.00 ± 39% +0.0 0.00 ± 19% mpstat.cpu.soft%
22.13 -3.4 18.73 mpstat.cpu.usr%
1608 -9.5% 1454 turbostat.Avg_MHz
57.68 -5.5 52.16 turbostat.Busy%
42.17 +12.7% 47.54 turbostat.CPU%c1
1896 ± 10% -13.5% 1639 ± 12% slabinfo.UNIX.active_objs
1896 ± 10% -13.5% 1639 ± 12% slabinfo.UNIX.num_objs
512.00 ± 8% +18.8% 608.00 ± 5% slabinfo.ebitmap_node.active_objs
512.00 ± 8% +18.8% 608.00 ± 5% slabinfo.ebitmap_node.num_objs
832.00 ± 13% +23.1% 1024 ± 10% slabinfo.scsi_sense_cache.active_objs
832.00 ± 13% +23.1% 1024 ± 10% slabinfo.scsi_sense_cache.num_objs
1309088 -1.8% 1285325 proc-vmstat.nr_dirty_background_threshold
2621507 -1.8% 2573971 proc-vmstat.nr_dirty_threshold
13199577 -1.8% 12961837 proc-vmstat.nr_free_pages
1742 +1.8% 1774 proc-vmstat.nr_page_table_pages
22375 -2.8% 21752 proc-vmstat.nr_shmem
1259 ± 37% +61.5% 2033 ± 19% proc-vmstat.numa_huge_pte_updates
681268 ± 35% +59.1% 1084220 ± 19% proc-vmstat.numa_pte_updates
13983 -8.3% 12823 ± 4% proc-vmstat.pgactivate
0.05 +0.0 0.05 perf-stat.branch-miss-rate%
2.109e+09 +4.3% 2.2e+09 perf-stat.branch-misses
78.76 -1.9 76.88 perf-stat.cache-miss-rate%
1.113e+11 -2.9% 1.081e+11 perf-stat.cache-misses
3996996 +33.6% 5341757 perf-stat.context-switches
3.37 -9.0% 3.07 perf-stat.cpi
4.944e+13 -9.6% 4.471e+13 perf-stat.cpu-cycles
211278 +5.0% 221866 perf-stat.cpu-migrations
0.00 ± 7% +0.0 0.00 ± 5% perf-stat.dTLB-load-miss-rate%
49679544 ± 7% +17.5% 58377845 ± 4% perf-stat.dTLB-load-misses
0.00 ± 4% +0.0 0.00 ± 2% perf-stat.dTLB-store-miss-rate%
15180335 ± 4% +14.0% 17307062 ± 2% perf-stat.dTLB-store-misses
10.83 ± 3% -1.8 9.08 ± 3% perf-stat.iTLB-load-miss-rate%
44270724 ± 3% -8.4% 40569884 ± 2% perf-stat.iTLB-load-misses
3.644e+08 +11.5% 4.065e+08 perf-stat.iTLB-loads
331624 ± 3% +8.4% 359414 ± 2% perf-stat.instructions-per-iTLB-miss
0.30 +9.9% 0.33 perf-stat.ipc
51.92 +1.8 53.74 perf-stat.node-load-miss-rate%
1.48e+10 -6.0% 1.391e+10 perf-stat.node-loads
1.497e+10 -6.9% 1.394e+10 perf-stat.node-stores
10272 ± 14% -19.0% 8323 ± 13% sched_debug.cfs_rq:/.load.avg
7232660 ± 9% -20.1% 5782120 ± 10% sched_debug.cfs_rq:/.min_vruntime.max
0.52 ± 5% -18.9% 0.43 ± 5% sched_debug.cfs_rq:/.nr_running.avg
1.67 ± 10% -33.1% 1.12 ± 15% sched_debug.cfs_rq:/.nr_spread_over.avg
7.52 ± 10% -29.6% 5.29 ± 2% sched_debug.cfs_rq:/.runnable_load_avg.avg
10163 ± 13% -18.7% 8262 ± 13% sched_debug.cfs_rq:/.runnable_weight.avg
2147344 ± 11% -29.4% 1515179 ± 10% sched_debug.cfs_rq:/.spread0.avg
3673348 ± 11% -22.3% 2854166 ± 5% sched_debug.cfs_rq:/.spread0.max
396.82 ± 13% -26.6% 291.11 ± 4% sched_debug.cfs_rq:/.util_est_enqueued.avg
6.81 ± 4% -25.8% 5.05 sched_debug.cpu.cpu_load[0].avg
6.96 ± 6% -25.3% 5.20 ± 2% sched_debug.cpu.cpu_load[1].avg
7.01 ± 4% -23.0% 5.40 ± 2% sched_debug.cpu.cpu_load[2].avg
7.09 ± 3% -19.2% 5.73 ± 2% sched_debug.cpu.cpu_load[3].avg
54.42 ± 33% -55.2% 24.39 ± 9% sched_debug.cpu.cpu_load[3].max
8.94 ± 21% -33.4% 5.96 ± 5% sched_debug.cpu.cpu_load[3].stddev
7.34 ± 3% -15.0% 6.24 ± 2% sched_debug.cpu.cpu_load[4].avg
72.43 ± 16% -29.4% 51.15 ± 18% sched_debug.cpu.cpu_load[4].max
10.51 ± 8% -20.8% 8.32 ± 7% sched_debug.cpu.cpu_load[4].stddev
18364 ± 10% +26.5% 23240 ± 11% sched_debug.cpu.nr_switches.avg
12769 ± 11% +43.0% 18261 ± 13% sched_debug.cpu.nr_switches.min
17580 ± 10% +28.1% 22513 ± 11% sched_debug.cpu.sched_count.avg
12302 ± 10% +41.6% 17424 ± 11% sched_debug.cpu.sched_count.min
8539 ± 10% +29.3% 11037 ± 11% sched_debug.cpu.sched_goidle.avg
5806 ± 11% +43.1% 8309 ± 11% sched_debug.cpu.sched_goidle.min
8747 ± 10% +28.1% 11205 ± 11% sched_debug.cpu.ttwu_count.avg
17367 ± 11% +29.1% 22427 ± 6% sched_debug.cpu.ttwu_count.max
1788 ± 11% +90.2% 3402 ± 12% sched_debug.cpu.ttwu_count.stddev
0.77 ± 3% +0.2 0.95 ± 5% perf-profile.calltrace.cycles-pp.alloc_huge_page.hugetlb_cow.hugetlb_fault.handle_mm_fault.__do_page_fault
0.66 ± 4% +0.2 0.88 ± 5% perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.alloc_huge_page.hugetlb_cow.hugetlb_fault.handle_mm_fault
0.56 ± 6% +0.3 0.83 ± 5% perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.alloc_huge_page.hugetlb_cow.hugetlb_fault
0.27 ±100% +0.5 0.73 ± 4% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_fresh_huge_page.alloc_surplus_huge_page.alloc_huge_page
0.27 ±100% +0.5 0.74 ± 4% perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_fresh_huge_page.alloc_surplus_huge_page.alloc_huge_page.hugetlb_cow
0.56 ± 4% -0.2 0.32 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
0.42 ± 4% -0.2 0.22 perf-profile.children.cycles-pp.release_pages
0.41 ± 3% -0.2 0.21 ± 2% perf-profile.children.cycles-pp.free_huge_page
0.42 ± 4% -0.2 0.23 ± 2% perf-profile.children.cycles-pp.arch_tlb_finish_mmu
0.42 ± 4% -0.2 0.23 ± 2% perf-profile.children.cycles-pp.tlb_flush_mmu_free
0.42 ± 4% -0.2 0.23 perf-profile.children.cycles-pp.tlb_finish_mmu
0.46 ± 4% -0.2 0.28 ± 2% perf-profile.children.cycles-pp.mmput
0.46 ± 4% -0.2 0.28 perf-profile.children.cycles-pp.__x64_sys_exit_group
0.46 ± 4% -0.2 0.28 perf-profile.children.cycles-pp.do_group_exit
0.46 ± 4% -0.2 0.28 perf-profile.children.cycles-pp.do_exit
0.45 ± 3% -0.2 0.28 ± 2% perf-profile.children.cycles-pp.exit_mmap
0.94 ± 3% -0.1 0.85 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.94 ± 3% -0.1 0.85 ± 4% perf-profile.children.cycles-pp.do_syscall_64
0.17 ± 4% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.update_and_free_page
0.12 ± 5% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.__account_scheduler_latency
0.08 ± 8% +0.0 0.10 ± 8% perf-profile.children.cycles-pp.sched_ttwu_pending
0.17 ± 6% +0.0 0.20 ± 2% perf-profile.children.cycles-pp.enqueue_entity
0.18 ± 6% +0.0 0.21 ± 3% perf-profile.children.cycles-pp.enqueue_task_fair
0.17 ± 4% +0.0 0.20 ± 8% perf-profile.children.cycles-pp.schedule
0.18 ± 6% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.ttwu_do_activate
0.05 ± 9% +0.0 0.09 perf-profile.children.cycles-pp.prep_new_huge_page
0.16 ± 5% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.io_serial_in
0.24 ± 5% +0.0 0.28 ± 6% perf-profile.children.cycles-pp.__schedule
0.03 ±100% +0.0 0.07 ± 10% perf-profile.children.cycles-pp.delay_tsc
0.18 ± 4% +0.1 0.24 ± 2% perf-profile.children.cycles-pp.serial8250_console_putchar
0.19 ± 6% +0.1 0.26 ± 3% perf-profile.children.cycles-pp.wait_for_xmitr
0.18 ± 5% +0.1 0.25 ± 2% perf-profile.children.cycles-pp.uart_console_write
0.20 ± 6% +0.1 0.27 ± 2% perf-profile.children.cycles-pp.serial8250_console_write
0.20 ± 18% +0.1 0.28 ± 5% perf-profile.children.cycles-pp._fini
0.20 ± 16% +0.1 0.28 ± 5% perf-profile.children.cycles-pp.devkmsg_write
0.20 ± 16% +0.1 0.28 ± 5% perf-profile.children.cycles-pp.printk_emit
0.26 ± 8% +0.1 0.34 ± 5% perf-profile.children.cycles-pp.__vfs_write
0.23 ± 12% +0.1 0.31 ± 5% perf-profile.children.cycles-pp.vprintk_emit
1.65 ± 4% +0.1 1.73 perf-profile.children.cycles-pp.__mutex_lock
0.22 ± 9% +0.1 0.30 ± 3% perf-profile.children.cycles-pp.console_unlock
0.22 ± 13% +0.1 0.30 ± 5% perf-profile.children.cycles-pp.write
0.26 ± 8% +0.1 0.35 ± 4% perf-profile.children.cycles-pp.ksys_write
0.26 ± 8% +0.1 0.35 ± 4% perf-profile.children.cycles-pp.vfs_write
0.59 ± 4% +0.1 0.68 ± 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.93 ± 3% +0.2 1.12 ± 4% perf-profile.children.cycles-pp.alloc_huge_page
0.79 ± 2% +0.2 1.03 ± 4% perf-profile.children.cycles-pp.alloc_surplus_huge_page
0.60 ± 2% +0.3 0.88 ± 5% perf-profile.children.cycles-pp.__alloc_pages_nodemask
0.59 ± 2% +0.3 0.87 ± 5% perf-profile.children.cycles-pp.get_page_from_freelist
0.66 ± 2% +0.3 0.97 ± 4% perf-profile.children.cycles-pp.alloc_fresh_huge_page
0.15 ± 4% +0.3 0.48 ± 6% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
25.44 ± 6% -2.5 22.95 ± 10% perf-profile.self.cycles-pp.do_rw_once
0.46 ± 2% -0.0 0.41 ± 2% perf-profile.self.cycles-pp.get_page_from_freelist
0.17 ± 2% -0.0 0.14 ± 5% perf-profile.self.cycles-pp.update_and_free_page
0.15 ± 7% +0.0 0.20 ± 4% perf-profile.self.cycles-pp.io_serial_in
0.01 ±173% +0.1 0.06 ± 6% perf-profile.self.cycles-pp.delay_tsc
1.59 ± 3% +0.1 1.67 perf-profile.self.cycles-pp.mutex_spin_on_owner
0.58 ± 3% +0.1 0.68 ± 4% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
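For reference, the %change column appears to be the relative difference between the tested-commit mean (right column, 9c83282117) and the parent-commit mean (left column, 0cd60eb1a7). A minimal Python sketch, assuming that definition, reproduces the headline figures from the table above (the helper name pct_change is purely illustrative):
    # Sanity check of the %change column, assuming it is the relative
    # difference of the tested-commit mean against the parent-commit mean.
    def pct_change(parent, tested):
        return (tested - parent) / parent * 100
    # Headline values copied from the comparison table above.
    print(f"throughput: {pct_change(20393229, 19523319):+.1f}%")   # -4.3%
    print(f"median:     {pct_change(184494, 164684):+.1f}%")       # -10.7%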
[ASCII trend plots not reproduced here (column alignment garbled); they covered vm-scalability.time.user_time, vm-scalability.time.system_time, vm-scalability.time.percent_of_cpu_this_job_got, vm-scalability.time.voluntary_context_switches, vm-scalability.throughput and vm-scalability.median, with [*] marking bisect-good samples and [O] marking bisect-bad samples.]
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.20.0-rc7-00276-g9c83282" of type "text/plain" (168460 bytes)
View attachment "job-script" of type "text/plain" (7488 bytes)
View attachment "job.yaml" of type "text/plain" (5041 bytes)
View attachment "reproduce" of type "text/plain" (12271 bytes)