Message-ID: <0cc7c04b-825d-2c5a-2afe-c52c90554223@intel.com>
Date: Mon, 29 Aug 2022 16:08:10 +0800
From: kernel test robot <yujie.liu@...el.com>
To: Nadav Amit <namit@...are.com>
CC: <lkp@...ts.01.org>, <lkp@...el.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
<linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
<feng.tang@...el.com>, <zhengjun.xing@...ux.intel.com>,
<fengwei.yin@...el.com>
Subject: [x86/mm/tlb] aa44284960: will-it-scale.per_thread_ops 12.8% improvement
Greetings,
FYI, we noticed a 12.8% improvement of will-it-scale.per_thread_ops due to commit:
commit: aa44284960d550eb4d8614afdffebc68a432a9b4 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 144 threads 4 sockets Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz (Cooper Lake) with 128G memory
with following parameters:
nr_task: 50%
mode: thread
test: tlb_flush3
cpufreq_governor: performance
ucode: 0x7002501
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-11/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp1/tlb_flush3/will-it-scale/0x7002501
commit:
e19d11267f ("x86/mm: Use PAGE_ALIGNED(x) instead of IS_ALIGNED(x, PAGE_SIZE)")
aa44284960 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible")
e19d11267f0e6c8a aa44284960d550eb4d8614afdff
---------------- ---------------------------
base ±%stddev %change patched ±%stddev
511972 +12.8% 577452 will-it-scale.72.threads
7110 +12.8% 8019 will-it-scale.per_thread_ops
511972 +12.8% 577452 will-it-scale.workload
29.88 ± 23% +8.2 38.07 mpstat.cpu.all.sys%
0.57 ± 22% +0.2 0.78 ± 5% mpstat.cpu.all.usr%
76693 -0.8% 76064 proc-vmstat.nr_slab_unreclaimable
1.693e+08 +12.2% 1.9e+08 proc-vmstat.pgfault
-967489 +88.9% -1827960 sched_debug.cfs_rq:/.spread0.min
10581 ± 15% -36.2% 6751 ± 9% sched_debug.cpu.clock_task.stddev
5957 ± 17% +16.7% 6952 vmstat.system.cs
5247717 ± 22% +15.1% 6039997 vmstat.system.in
1443105 ± 2% +325.4% 6139401 ±147% turbostat.C1
2.359e+10 +12.1% 2.645e+10 turbostat.IRQ
12.94 ± 15% +10.6% 14.31 turbostat.RAMWatt
9.42 ± 4% -8.5 0.94 ± 36% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
8.85 ± 4% -4.5 4.32 ± 9% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range
8.76 ± 4% -4.5 4.24 ± 9% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond
8.76 ± 4% -4.5 4.26 ± 9% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask
4.63 ± 4% -2.3 2.29 ± 9% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
4.60 ± 4% -2.3 2.33 ± 10% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
10.06 ± 6% -2.2 7.87 ± 10% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch
10.08 ± 5% -2.2 7.91 ± 10% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond
10.18 ± 5% -2.1 8.04 ± 9% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask
15.87 ± 7% +3.0 18.91 ± 10% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu
17.09 ± 7% +3.8 20.93 ± 10% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range
12.41 ± 4% -10.3 2.14 ± 10% perf-profile.children.cycles-pp.flush_tlb_func
23.01 ± 4% -8.8 14.19 ± 9% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
22.79 ± 4% -8.7 14.08 ± 9% perf-profile.children.cycles-pp.sysvec_call_function
22.56 ± 4% -8.7 13.85 ± 9% perf-profile.children.cycles-pp.__sysvec_call_function
24.02 ± 4% -8.6 15.41 ± 9% perf-profile.children.cycles-pp.asm_sysvec_call_function
1.29 ± 6% -1.0 0.31 ± 8% perf-profile.children.cycles-pp.native_flush_tlb_local
0.82 ± 13% -0.5 0.28 ± 16% perf-profile.children.cycles-pp.sync_mm_rss
0.54 ± 10% -0.1 0.44 ± 10% perf-profile.children.cycles-pp._find_next_bit
0.12 ± 42% -0.1 0.04 ± 88% perf-profile.children.cycles-pp.cpumask_any_but
0.21 ± 7% +0.1 0.29 ± 14% perf-profile.children.cycles-pp.tlb_gather_mmu
33.53 ± 7% +7.1 40.62 ± 10% perf-profile.children.cycles-pp.llist_add_batch
11.12 ± 5% -9.3 1.79 ± 10% perf-profile.self.cycles-pp.flush_tlb_func
1.26 ± 5% -1.0 0.30 ± 9% perf-profile.self.cycles-pp.native_flush_tlb_local
0.87 ± 4% -0.6 0.27 ± 18% perf-profile.self.cycles-pp.flush_tlb_mm_range
0.52 ± 13% -0.3 0.22 ± 16% perf-profile.self.cycles-pp.sync_mm_rss
0.14 ± 10% -0.0 0.10 ± 5% perf-profile.self.cycles-pp.zap_pte_range
0.20 ± 9% +0.1 0.26 ± 9% perf-profile.self.cycles-pp.unmap_page_range
0.24 ± 9% +0.1 0.32 ± 10% perf-profile.self.cycles-pp.down_read
0.13 ± 7% +0.1 0.22 ± 15% perf-profile.self.cycles-pp.tlb_gather_mmu
0.35 ± 10% +0.1 0.45 ± 9% perf-profile.self.cycles-pp.up_read
0.26 ± 9% +0.1 0.37 ± 12% perf-profile.self.cycles-pp.down_read_trylock
0.33 ± 8% +0.1 0.46 ± 12% perf-profile.self.cycles-pp.__handle_mm_fault
22.59 ± 8% +9.0 31.63 ± 10% perf-profile.self.cycles-pp.llist_add_batch
4.618e+09 ± 22% +15.6% 5.338e+09 perf-stat.i.branch-instructions
74833717 ± 15% +18.9% 88940037 perf-stat.i.branch-misses
2.701e+08 ± 23% +21.6% 3.283e+08 perf-stat.i.cache-misses
6.188e+08 ± 19% +20.8% 7.473e+08 perf-stat.i.cache-references
5907 ± 17% +17.0% 6911 perf-stat.i.context-switches
949.85 ± 36% -24.2% 720.07 perf-stat.i.cycles-between-cache-misses
3624697 ± 8% +43.7% 5209027 ± 23% perf-stat.i.dTLB-load-misses
5.649e+09 ± 22% +19.8% 6.77e+09 perf-stat.i.dTLB-loads
1837493 ± 18% +16.8% 2145673 perf-stat.i.dTLB-store-misses
2.675e+09 ± 22% +17.8% 3.151e+09 perf-stat.i.dTLB-stores
83.96 ± 10% +7.4 91.35 perf-stat.i.iTLB-load-miss-rate%
17251959 ± 21% +66.1% 28646920 ± 2% perf-stat.i.iTLB-load-misses
2371482 ± 9% +12.0% 2655934 ± 2% perf-stat.i.iTLB-loads
2.082e+10 ± 22% +17.4% 2.444e+10 perf-stat.i.instructions
1222 ± 4% -29.1% 866.58 ± 2% perf-stat.i.instructions-per-iTLB-miss
95.27 ± 22% +18.2% 112.59 perf-stat.i.metric.M/sec
491703 ± 23% +24.4% 611915 perf-stat.i.minor-faults
1.716e+08 ± 23% +22.8% 2.107e+08 perf-stat.i.node-load-misses
52413373 ± 23% +32.1% 69220689 perf-stat.i.node-store-misses
516138 ± 22% +22.5% 632466 perf-stat.i.node-stores
498600 ± 23% +24.2% 619485 perf-stat.i.page-faults
10.30 -6.2% 9.66 perf-stat.overall.cpi
798.02 ± 2% -9.8% 719.66 perf-stat.overall.cycles-between-cache-misses
87.48 ± 2% +4.0 91.50 perf-stat.overall.iTLB-load-miss-rate%
1203 -29.0% 854.77 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.10 +6.7% 0.10 perf-stat.overall.ipc
13671646 ± 2% -6.3% 12814760 perf-stat.overall.path-length
4.608e+09 ± 22% +15.5% 5.321e+09 perf-stat.ps.branch-instructions
74674366 ± 15% +18.8% 88686188 perf-stat.ps.branch-misses
2.694e+08 ± 23% +21.4% 3.272e+08 perf-stat.ps.cache-misses
6.173e+08 ± 19% +20.7% 7.448e+08 perf-stat.ps.cache-references
5889 ± 17% +16.9% 6883 perf-stat.ps.context-switches
3615042 ± 8% +43.7% 5193424 ± 23% perf-stat.ps.dTLB-load-misses
5.637e+09 ± 22% +19.7% 6.748e+09 perf-stat.ps.dTLB-loads
1833221 ± 17% +16.7% 2138714 perf-stat.ps.dTLB-store-misses
2.669e+09 ± 22% +17.7% 3.141e+09 perf-stat.ps.dTLB-stores
17219686 ± 21% +65.7% 28530220 ± 2% perf-stat.ps.iTLB-load-misses
2364495 ± 8% +11.9% 2646506 ± 2% perf-stat.ps.iTLB-loads
2.077e+10 ± 22% +17.3% 2.437e+10 perf-stat.ps.instructions
490514 ± 23% +24.3% 609737 perf-stat.ps.minor-faults
1.712e+08 ± 23% +22.6% 2.1e+08 perf-stat.ps.node-load-misses
52293687 ± 23% +31.9% 68974375 perf-stat.ps.node-store-misses
515271 ± 21% +22.4% 630779 perf-stat.ps.node-stores
497546 ± 23% +24.1% 617507 perf-stat.ps.page-faults
6.998e+12 +5.7% 7.4e+12 perf-stat.total.instructions
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-5.18.0-01257-gaa44284960d5" of type "text/plain" (169430 bytes)
View attachment "job-script" of type "text/plain" (8365 bytes)
View attachment "job.yaml" of type "text/plain" (5728 bytes)
View attachment "reproduce" of type "text/plain" (361 bytes)