Message-ID: <20210701020831.GA21279@xsang-OptiPlex-9020>
Date: Thu, 1 Jul 2021 10:08:31 +0800
From: kernel test robot <oliver.sang@...el.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Cc: 0day robot <lkp@...el.com>, LKML <linux-kernel@...r.kernel.org>,
lkp@...ts.01.org, ying.huang@...el.com, feng.tang@...el.com,
zhengjun.xing@...ux.intel.com, linux-mm@...ck.org,
akpm@...ux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>
Subject: [mm] c0af5203b9: will-it-scale.per_process_ops 7.4% improvement
Greetings,
FYI, we noticed a 7.4% improvement of will-it-scale.per_process_ops due to commit:
commit: c0af5203b9dbf4cd8b424298b1cb809f4535802a ("[PATCH 2/2] mm: Change p4d_page_vaddr to return pud_t *")
url: https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-Change-pud_page_vaddr-to-return-pmd_t/20210617-045835
base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git next
in testcase: will-it-scale
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:
nr_task: 16
mode: process
test: mmap1
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap1/will-it-scale/0x5003006
commit:
edb4f1ceb1 ("mm: Change pud_page_vaddr to return pmd_t *")
c0af5203b9 ("mm: Change p4d_page_vaddr to return pud_t *")
edb4f1ceb161c083 c0af5203b9dbf4cd8b424298b1c
---------------- ---------------------------
value ± %stddev | %change | value ± %stddev | metric
10634491 +7.4% 11422635 will-it-scale.16.processes
664655 +7.4% 713914 will-it-scale.per_process_ops
10634491 +7.4% 11422635 will-it-scale.workload
6236 ± 11% +34.6% 8391 ± 21% softirqs.CPU59.RCU
2.473e+10 +7.4% 2.656e+10 perf-stat.i.branch-instructions
0.46 ± 3% -6.9% 0.43 perf-stat.i.cpi
2.535e+10 +7.6% 2.727e+10 perf-stat.i.dTLB-loads
1.149e+10 +7.3% 1.232e+10 perf-stat.i.dTLB-stores
1.027e+11 +7.4% 1.103e+11 perf-stat.i.instructions
2.21 +6.7% 2.36 perf-stat.i.ipc
699.56 +7.5% 751.76 perf-stat.i.metric.M/sec
0.45 -6.3% 0.42 perf-stat.overall.cpi
2.22 +6.7% 2.37 perf-stat.overall.ipc
2.464e+10 +7.4% 2.647e+10 perf-stat.ps.branch-instructions
2.526e+10 +7.6% 2.718e+10 perf-stat.ps.dTLB-loads
1.145e+10 +7.3% 1.228e+10 perf-stat.ps.dTLB-stores
1.024e+11 +7.4% 1.099e+11 perf-stat.ps.instructions
3.095e+13 +7.3% 3.321e+13 perf-stat.total.instructions
47.73 -4.1 43.67 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
47.39 -4.1 43.34 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
46.85 -4.1 42.79 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
44.68 -4.0 40.64 ± 9% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.45 -4.0 42.45 ± 9% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
37.90 -4.0 33.91 ± 9% perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
10.29 -2.7 7.55 ± 9% perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
27.98 -2.5 25.48 ± 9% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
27.18 -2.5 24.70 ± 9% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
6.98 -1.6 5.40 ± 9% perf-profile.calltrace.cycles-pp.free_pgd_range.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
46.86 -4.1 42.80 ± 9% perf-profile.children.cycles-pp.__x64_sys_munmap
44.74 -4.0 40.70 ± 9% perf-profile.children.cycles-pp.__do_munmap
46.47 -4.0 42.46 ± 9% perf-profile.children.cycles-pp.__vm_munmap
37.98 -4.0 33.98 ± 9% perf-profile.children.cycles-pp.unmap_region
11.13 -2.7 8.46 ± 9% perf-profile.children.cycles-pp.___might_sleep
27.26 -2.5 24.75 ± 9% perf-profile.children.cycles-pp.unmap_page_range
28.02 -2.5 25.52 ± 9% perf-profile.children.cycles-pp.unmap_vmas
7.02 -1.6 5.43 ± 9% perf-profile.children.cycles-pp.free_pgd_range
0.71 ± 3% -0.3 0.42 ± 18% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.61 ± 3% -0.3 0.32 ± 22% perf-profile.children.cycles-pp.cap_vm_enough_memory
0.24 ± 6% -0.1 0.15 ± 17% perf-profile.children.cycles-pp.cap_capable
0.26 ± 5% -0.1 0.20 ± 11% perf-profile.children.cycles-pp.refill_obj_stock
0.05 ± 44% +0.0 0.08 ± 14% perf-profile.children.cycles-pp.tlb_table_flush
0.17 ± 4% +0.1 0.29 ± 8% perf-profile.children.cycles-pp.tlb_flush_mmu
10.98 -2.6 8.35 ± 9% perf-profile.self.cycles-pp.___might_sleep
6.98 -1.6 5.40 ± 9% perf-profile.self.cycles-pp.free_pgd_range
0.36 ± 5% -0.2 0.17 ± 27% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.22 ± 7% -0.1 0.13 ± 19% perf-profile.self.cycles-pp.cap_capable
0.50 ± 5% -0.1 0.41 ± 10% perf-profile.self.cycles-pp.security_mmap_file
0.24 ± 5% -0.1 0.18 ± 11% perf-profile.self.cycles-pp.refill_obj_stock
0.10 ± 6% +0.1 0.23 ± 8% perf-profile.self.cycles-pp.tlb_flush_mmu
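For readers checking the table, here is a minimal sketch of how the %change column can be recomputed from the two value columns. The helper names pct_change and point_delta are illustrative only and are not part of lkp-tests. The sketch assumes that throughput and perf-stat rows report a relative change in percent, and that perf-profile rows report an absolute difference in percentage points, which is what the numbers above appear to show.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change from the base commit to the patched commit, in percent."""
    return (patched - base) / base * 100.0


def point_delta(base: float, patched: float) -> float:
    """Absolute difference in percentage points (assumed for perf-profile rows)."""
    return patched - base


# Values copied from the table above: (edb4f1ceb1, c0af5203b9).
throughput_rows = {
    "will-it-scale.16.processes": (10634491, 11422635),
    "will-it-scale.per_process_ops": (664655, 713914),
}
for name, (base, patched) in throughput_rows.items():
    print(f"{name}: {pct_change(base, patched):+.1f}%")

# perf-profile.children.cycles-pp.__x64_sys_munmap: 46.86 -> 42.80
print(f"__x64_sys_munmap: {point_delta(46.86, 42.80):+.1f} points")
```

Rows such as perf-stat.i.cpi are printed with only two decimals, so recomputing their %change from the rounded values may differ slightly from the reported figure.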
will-it-scale.per_process_ops
[ASCII plot omitted: y-axis spans 640000 to 720000]
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation
Thanks,
Oliver Sang
View attachment "config-5.13.0-rc2-00045-gc0af5203b9db" of type "text/plain" (174118 bytes)
View attachment "job-script" of type "text/plain" (8006 bytes)
View attachment "job.yaml" of type "text/plain" (5453 bytes)
View attachment "reproduce" of type "text/plain" (337 bytes)