Message-ID: <20200114085637.GA29297@shao2-debian>
Date: Tue, 14 Jan 2020 16:56:37 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Waiman Long <longman@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Davidlohr Bueso <dbueso@...e.de>,
Michal Hocko <mhocko@...e.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Matthew Wilcox <willy@...radead.org>,
Andi Kleen <ak@...ux.intel.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: [mm/hugetlb] c77c0a8ac4: will-it-scale.per_process_ops 15.9% improvement
Greetings,
FYI, we noticed a 15.9% improvement of will-it-scale.per_process_ops due to commit:
commit: c77c0a8ac4c522638a8242fcb9de9496e3cdbb2d ("mm/hugetlb: defer freeing of huge pages if in non-task context")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
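For context on the change being measured: the commit subject describes deferring
the freeing of huge pages when free_huge_page() is reached from a non-task
context (e.g. softirq), so that the actual freeing only runs in process context.
The patch itself is not included in this report; the sketch below only
illustrates one common way to implement that kind of deferral (a lockless list
drained by a workqueue). Identifiers such as __free_huge_page(), hpage_freelist
and free_hpage_work are illustrative assumptions, not quotes from the patch.

#include <linux/kernel.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/workqueue.h>

static void __free_huge_page(struct page *page);   /* assumed helper doing the real free */

static LLIST_HEAD(hpage_freelist);                  /* pages waiting to be freed */

static void free_hpage_workfn(struct work_struct *work)
{
        struct llist_node *node = llist_del_all(&hpage_freelist);

        /* Drain in process context, where taking hugetlb_lock is safe. */
        while (node) {
                struct page *page = container_of((struct address_space **)node,
                                                 struct page, mapping);

                node = node->next;
                __free_huge_page(page);
        }
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

void free_huge_page(struct page *page)
{
        if (!in_task()) {
                /*
                 * Interrupt/softirq context: stash the page on a lockless list
                 * (reusing the free page's unused ->mapping word as the node)
                 * and punt the real work to a workqueue.
                 */
                if (llist_add((struct llist_node *)&page->mapping, &hpage_freelist))
                        schedule_work(&free_hpage_work);
                return;
        }

        __free_huge_page(page);
}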
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:
nr_task: 50%
mode: process
test: page_fault3
cpufreq_governor: performance
ucode: 0x500002c
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both process-based and thread-based variants of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
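The test parameter above is page_fault3. The will-it-scale sources are not
quoted in this report, but based on the shared-file write-fault paths that
dominate the profile below (fault_dirty_shared_page, page_add_file_rmap,
alloc_set_pte), the per-process workload can be pictured roughly as the loop
below. Sizes and names (MAPLEN, the use of tmpfile()) are illustrative
assumptions, not taken from the benchmark.

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

#define MAPLEN (128UL << 20)    /* hypothetical per-iteration working set */

int main(void)
{
        FILE *f = tmpfile();

        if (!f || ftruncate(fileno(f), MAPLEN))
                return 1;

        for (;;) {
                /* Shared file-backed mapping: the first write to each page takes
                 * a write fault through handle_mm_fault()/fault_dirty_shared_page(). */
                char *p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fileno(f), 0);

                if (p == MAP_FAILED)
                        return 1;
                for (size_t off = 0; off < MAPLEN; off += 4096)
                        p[off] = 1;     /* one dirtying write per page */
                munmap(p, MAPLEN);      /* tear down so the faults repeat */
        }
}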
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/50%/debian-x86_64-2019-11-14.cgz/lkp-csl-2ap3/page_fault3/will-it-scale/0x500002c
commit:
a7c46c0c0e ("mm/gup: fix memory leak in __gup_benchmark_ioctl")
c77c0a8ac4 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
a7c46c0c0e3d62f2 c77c0a8ac4c522638a8242fcb9d
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:4 25% 1:4 dmesg.WARNING:at_ip___perf_sw_event/0x
21:4 82% 25:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry
24:4 92% 28:4 perf-profile.calltrace.cycles-pp.error_entry
0:4 1% 0:4 perf-profile.children.cycles-pp.error_exit
25:4 97% 29:4 perf-profile.children.cycles-pp.error_entry
0:4 1% 0:4 perf-profile.self.cycles-pp.error_exit
2:4 11% 3:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
5.86 ± 12% -2.9 2.97 ± 10% perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
2.00 ± 12% -1.1 0.95 ± 12% perf-profile.calltrace.cycles-pp.lock_page_memcg.page_add_file_rmap.alloc_set_pte.finish_fault.handle_pte_fault
0.83 ± 12% +0.2 1.01 ± 9% perf-profile.calltrace.cycles-pp.file_update_time.fault_dirty_shared_page.handle_pte_fault.__handle_mm_fault.handle_mm_fault
0.40 ± 57% +0.2 0.62 ± 10% perf-profile.calltrace.cycles-pp.current_time.file_update_time.fault_dirty_shared_page.handle_pte_fault.__handle_mm_fault
14.61 ± 11% -4.1 10.54 ± 10% perf-profile.children.cycles-pp.native_irq_return_iret
5.86 ± 12% -2.9 2.97 ± 10% perf-profile.children.cycles-pp.__count_memcg_events
2.54 ± 12% -1.2 1.29 ± 12% perf-profile.children.cycles-pp.lock_page_memcg
0.50 ± 12% +0.1 0.62 ± 10% perf-profile.children.cycles-pp.current_time
0.83 ± 12% +0.2 1.01 ± 9% perf-profile.children.cycles-pp.file_update_time
14.60 ± 11% -4.1 10.54 ± 10% perf-profile.self.cycles-pp.native_irq_return_iret
5.83 ± 12% -2.9 2.95 ± 10% perf-profile.self.cycles-pp.__count_memcg_events
2.50 ± 12% -1.2 1.25 ± 12% perf-profile.self.cycles-pp.lock_page_memcg
0.23 ± 13% -0.1 0.16 ± 9% perf-profile.self.cycles-pp.__unlock_page_memcg
824554 +15.9% 955434 will-it-scale.per_process_ops
79157305 +15.9% 91721706 will-it-scale.workload
41420 ± 95% -80.4% 8122 ± 58% numa-meminfo.node3.AnonHugePages
308.61 +4.0% 321.03 turbostat.PkgWatt
51328483 +14.5% 58776435 proc-vmstat.numa_hit
51272762 +14.5% 58720733 proc-vmstat.numa_local
51446016 +14.5% 58921941 proc-vmstat.pgalloc_normal
2.381e+10 +15.8% 2.758e+10 proc-vmstat.pgfault
50812338 ± 2% +13.0% 57413676 ± 2% proc-vmstat.pgfree
7179986 +15.6% 8300189 numa-vmstat.node0.numa_hit
7170547 +15.7% 8295341 numa-vmstat.node0.numa_local
7267631 +10.9% 8059572 numa-vmstat.node1.numa_hit
7166107 +11.1% 7958194 numa-vmstat.node1.numa_local
7161204 +11.8% 8007798 numa-vmstat.node2.numa_hit
7056803 +12.0% 7901667 numa-vmstat.node2.numa_local
12704221 +17.8% 14964315 numa-numastat.node0.local_node
12713482 +17.7% 14968938 numa-numastat.node0.numa_hit
12946189 +13.5% 14695756 numa-numastat.node1.local_node
12960130 +13.5% 14709699 numa-numastat.node1.numa_hit
12816622 +13.3% 14527218 numa-numastat.node2.local_node
12833628 +13.3% 14545778 numa-numastat.node2.numa_hit
12814554 +13.7% 14572722 numa-numastat.node3.local_node
12830035 +13.7% 14591272 numa-numastat.node3.numa_hit
9311 ± 88% -62.1% 3529 ± 12% softirqs.CPU116.SCHED
20911 ± 80% -83.1% 3531 ± 14% softirqs.CPU117.SCHED
9130 ± 96% -61.6% 3503 ± 14% softirqs.CPU118.SCHED
21250 ± 79% -82.4% 3729 ± 7% softirqs.CPU131.SCHED
119649 ± 24% -24.0% 90953 softirqs.CPU131.TIMER
12060 ±113% -69.7% 3651 ± 14% softirqs.CPU153.SCHED
12095 ±112% -70.6% 3552 ± 14% softirqs.CPU159.SCHED
20918 ± 79% -83.1% 3532 ± 13% softirqs.CPU169.SCHED
21337 ± 81% -83.0% 3634 ± 15% softirqs.CPU180.SCHED
12102 ±113% -70.4% 3577 ± 13% softirqs.CPU185.SCHED
41.50 ±121% -95.8% 1.75 ± 74% interrupts.CPU115.RES:Rescheduling_interrupts
5306 ± 41% -39.9% 3191 ± 47% interrupts.CPU12.NMI:Non-maskable_interrupts
5306 ± 41% -39.9% 3191 ± 47% interrupts.CPU12.PMI:Performance_monitoring_interrupts
7979 ± 15% -29.6% 5614 ± 34% interrupts.CPU126.NMI:Non-maskable_interrupts
7979 ± 15% -29.6% 5614 ± 34% interrupts.CPU126.PMI:Performance_monitoring_interrupts
5197 ± 39% +68.2% 8741 interrupts.CPU138.NMI:Non-maskable_interrupts
5197 ± 39% +68.2% 8741 interrupts.CPU138.PMI:Performance_monitoring_interrupts
4289 ± 5% -19.2% 3466 ± 14% interrupts.CPU144.CAL:Function_call_interrupts
5154 ± 40% +54.6% 7969 ± 16% interrupts.CPU150.NMI:Non-maskable_interrupts
5154 ± 40% +54.6% 7969 ± 16% interrupts.CPU150.PMI:Performance_monitoring_interrupts
4269 ± 5% -14.8% 3635 ± 9% interrupts.CPU156.CAL:Function_call_interrupts
4478 ± 14% -17.9% 3677 ± 7% interrupts.CPU49.CAL:Function_call_interrupts
3413 ± 6% +23.8% 4225 ± 4% interrupts.CPU5.CAL:Function_call_interrupts
4195 ± 5% -13.4% 3632 ± 4% interrupts.CPU50.CAL:Function_call_interrupts
2955 ± 48% +50.8% 4458 ± 16% interrupts.CPU75.CAL:Function_call_interrupts
50672 ±152% -92.8% 3670 ± 30% interrupts.RES:Rescheduling_interrupts
2.25 ± 3% -16.4% 1.88 perf-stat.i.MPKI
4.231e+10 +16.7% 4.937e+10 perf-stat.i.branch-instructions
1.021e+08 +16.7% 1.191e+08 perf-stat.i.branch-misses
41.81 +0.9 42.67 perf-stat.i.cache-miss-rate%
1.859e+08 +3.1% 1.918e+08 perf-stat.i.cache-misses
4.423e+08 +1.4% 4.485e+08 perf-stat.i.cache-references
1.45 -14.9% 1.23 perf-stat.i.cpi
1587 -2.2% 1553 perf-stat.i.cycles-between-cache-misses
0.00 ± 6% -0.0 0.00 ± 7% perf-stat.i.dTLB-load-miss-rate%
5.921e+10 ± 2% +13.4% 6.713e+10 ± 3% perf-stat.i.dTLB-loads
2.33e+09 ± 2% +18.6% 2.763e+09 perf-stat.i.dTLB-store-misses
3.018e+10 ± 2% +18.2% 3.568e+10 perf-stat.i.dTLB-stores
2.075e+11 +16.7% 2.421e+11 perf-stat.i.instructions
0.70 +16.3% 0.81 perf-stat.i.ipc
78135721 +16.8% 91283662 perf-stat.i.minor-faults
33.70 ± 4% -16.9 16.76 ± 3% perf-stat.i.node-load-miss-rate%
2303552 ± 2% -52.6% 1090928 ± 3% perf-stat.i.node-load-misses
4735768 ± 4% +17.7% 5575862 ± 3% perf-stat.i.node-loads
7.74 ± 7% -0.7 7.08 perf-stat.i.node-store-miss-rate%
5836219 ± 3% +24.5% 7264951 ± 3% perf-stat.i.node-store-misses
77683656 ± 3% +25.5% 97502265 ± 4% perf-stat.i.node-stores
78135605 +16.8% 91283300 perf-stat.i.page-faults
2.13 -13.1% 1.85 perf-stat.overall.MPKI
42.04 +0.7 42.75 perf-stat.overall.cache-miss-rate%
1.42 -13.6% 1.23 perf-stat.overall.cpi
1588 -2.3% 1552 perf-stat.overall.cycles-between-cache-misses
0.70 +15.7% 0.81 perf-stat.overall.ipc
32.75 ± 2% -16.4 16.37 ± 3% perf-stat.overall.node-load-miss-rate%
4.216e+10 +16.7% 4.919e+10 perf-stat.ps.branch-instructions
1.017e+08 +16.7% 1.187e+08 perf-stat.ps.branch-misses
1.853e+08 +3.1% 1.911e+08 perf-stat.ps.cache-misses
4.407e+08 +1.4% 4.469e+08 perf-stat.ps.cache-references
5.9e+10 ± 2% +13.4% 6.69e+10 ± 3% perf-stat.ps.dTLB-loads
2.321e+09 ± 2% +18.6% 2.753e+09 perf-stat.ps.dTLB-store-misses
3.008e+10 ± 2% +18.2% 3.556e+10 perf-stat.ps.dTLB-stores
2.068e+11 +16.7% 2.412e+11 perf-stat.ps.instructions
77856103 +16.8% 90958076 perf-stat.ps.minor-faults
2295616 ± 2% -52.6% 1087383 ± 3% perf-stat.ps.node-load-misses
4718877 ± 4% +17.7% 5556031 ± 3% perf-stat.ps.node-loads
5815493 ± 3% +24.5% 7239177 ± 3% perf-stat.ps.node-store-misses
77405910 ± 3% +25.5% 97154591 ± 4% perf-stat.ps.node-stores
77855962 +16.8% 90957705 perf-stat.ps.page-faults
6.318e+13 +15.6% 7.304e+13 perf-stat.total.instructions
will-it-scale.per_process_ops
(plot: bisect-bad [O] samples cluster around 940000-960000 ops; bisect-good [*] samples around 820000-845000 ops)
will-it-scale.workload
(plot: bisect-bad [O] samples cluster around 9.2e+07; bisect-good [*] samples around 7.9e+07-8.0e+07)
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-5.5.0-rc4-00150-gc77c0a8ac4c52" of type "text/plain" (202282 bytes)
View attachment "job-script" of type "text/plain" (7666 bytes)
View attachment "job.yaml" of type "text/plain" (5181 bytes)
View attachment "reproduce" of type "text/plain" (315 bytes)