Message-ID: <20181102011322.GF24195@shao2-debian>
Date: Fri, 2 Nov 2018 09:13:22 +0800
From: kernel test robot <rong.a.chen@...el.com>
To: Huang Ying <ying.huang@...el.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Andi Kleen <andi.kleen@...el.com>, Jan Kara <jack@...e.cz>,
Michal Hocko <mhocko@...e.com>,
Andrea Arcangeli <aarcange@...hat.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...radead.org>,
Hugh Dickins <hughd@...gle.com>,
Minchan Kim <minchan@...nel.org>, Shaohua Li <shli@...com>,
Christopher Lameter <cl@...ux.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: [LKP] [mm, huge page] c9f4cd7138: vm-scalability.median 6.1% improvement

Greetings,

FYI, we noticed a 6.1% improvement of vm-scalability.median due to commit:
commit: c9f4cd71383576a916e7fca99c490fc92a289f5a ("mm, huge page: copy target sub-page last when copy huge page")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
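
For context: the patch copies the 4K sub-pages of a huge page from both
ends inward on a copy-on-write fault, so the faulting (target) sub-page
is copied last and its cache lines are still hot when the process
resumes. Below is a minimal userspace model of that ordering, following
the process_huge_page() approach the parent commit factored out; the
helper names and the printf stand-in are illustrative, not kernel code.

#include <stdio.h>

#define SUBPAGES 512    /* 4K sub-pages in one 2M huge page on x86_64 */

/* stand-in for copying one 4K sub-page */
static void copy_subpage(int idx)
{
    printf("%d ", idx);
}

/* copy all sub-pages so that the faulting one ("target") comes last */
static void copy_huge_page_order(int target)
{
    int i, base, l;

    if (2 * target <= SUBPAGES) {
        /* target in the first half: copy the tail of the page first */
        base = 0;
        l = target;
        for (i = SUBPAGES - 1; i >= 2 * target; i--)
            copy_subpage(i);
    } else {
        /* target in the second half: copy the head of the page first */
        base = 2 * target - SUBPAGES;
        l = SUBPAGES - target;
        for (i = 0; i < base; i++)
            copy_subpage(i);
    }
    /* zig-zag inward from both ends; the last copy is the target */
    for (i = 0; i < l; i++) {
        copy_subpage(base + i);             /* left side */
        copy_subpage(base + 2 * l - 1 - i); /* right side */
    }
}

int main(void)
{
    copy_huge_page_order(5);    /* pretend the fault hit sub-page 5 */
    printf("\n");
    return 0;
}

For a fault at sub-page 5 this prints 511 down to 10, then
0 9 1 8 2 7 3 6 4 5: the target index always comes out last.
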
in testcase: vm-scalability
on test machine: 80 threads Skylake with 64G memory
with the following parameters:

        runtime: 300s
        size: 8T
        test: anon-cow-seq
        cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
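
The anon-cow-seq case stresses exactly the path the patch touches: a
parent faults in THP-backed anonymous memory, then a forked child
writes through it sequentially, so each 2M huge page is copied on
write via do_huge_pmd_wp_page() -> copy_user_huge_page() (visible in
the profile below). A minimal stand-alone model of that access pattern
follows; the 1G size is illustrative and this is not the actual test
source (see test-url above).

#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1UL << 30;     /* 1G here; the real job scales to size=8T */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return 1;
    madvise(p, len, MADV_HUGEPAGE); /* ask for THP backing */
    memset(p, 1, len);              /* parent faults in the huge pages */

    if (fork() == 0) {
        /* child: sequential writes trigger one huge-page copy per 2M */
        for (size_t i = 0; i < len; i += 4096)
            p[i] = 2;
        _exit(0);
    }
    wait(NULL);
    return 0;
}
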
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2018-04-03.cgz/300s/8T/lkp-skl-2sp2/anon-cow-seq/vm-scalability
commit:
c6ddfb6c58 ("mm, clear_huge_page: move order algorithm into a separate function")
c9f4cd7138 ("mm, huge page: copy target sub-page last when copy huge page")
c6ddfb6c58903262 c9f4cd71383576a916e7fca99c
---------------- --------------------------
       fail:runs    %reproduction    fail:runs
           |              |              |
          :4             25%            1:4    kmsg.usb#-#:can't_read_configurations,error
          :4             25%            1:4    kmsg.usb#-#:unable_to_read_config_index#descriptor/all

         %stddev      %change          %stddev
             \            |               \
259561 +6.1% 275418 vm-scalability.median
20307212 +6.7% 21677136 vm-scalability.throughput
6804 -3.3% 6582 vm-scalability.time.percent_of_cpu_this_job_got
9928 +10.5% 10967 vm-scalability.time.system_time
10633 -16.2% 8905 vm-scalability.time.user_time
142485 ± 2% +15.4% 164368 ± 3% vm-scalability.time.voluntary_context_switches
5.225e+09 +2.8% 5.371e+09 vm-scalability.workload
2658 -3.0% 2578 turbostat.Avg_MHz
14.49 ± 4% +2.6 17.05 ± 2% mpstat.cpu.idle%
0.01 ± 64% +0.0 0.02 ± 56% mpstat.cpu.iowait%
41.37 +4.4 45.79 mpstat.cpu.sys%
44.13 -7.0 37.13 mpstat.cpu.usr%
1235 +10.3% 1362 ± 3% slabinfo.Acpi-State.active_objs
1235 +10.3% 1362 ± 3% slabinfo.Acpi-State.num_objs
732.75 ± 6% +15.5% 846.25 ± 4% slabinfo.dmaengine-unmap-16.active_objs
732.75 ± 6% +15.5% 846.25 ± 4% slabinfo.dmaengine-unmap-16.num_objs
1109 ± 28% +33.9% 1485 ± 7% slabinfo.mnt_cache.active_objs
1109 ± 28% +33.9% 1485 ± 7% slabinfo.mnt_cache.num_objs
2102895 ± 11% -26.3% 1550652 ± 13% numa-vmstat.node0.nr_active_anon
2085004 ± 12% -26.3% 1536368 ± 13% numa-vmstat.node0.nr_anon_pages
2569 ± 9% -24.4% 1944 ± 15% numa-vmstat.node0.nr_anon_transparent_hugepages
5839784 ± 4% +9.6% 6400386 ± 3% numa-vmstat.node0.nr_free_pages
2102683 ± 11% -26.3% 1550242 ± 13% numa-vmstat.node0.nr_zone_active_anon
2217075 ± 6% -19.7% 1780618 ± 11% numa-vmstat.node1.nr_active_anon
2197340 ± 6% -19.6% 1766852 ± 11% numa-vmstat.node1.nr_anon_pages
2217051 ± 6% -19.7% 1780056 ± 11% numa-vmstat.node1.nr_zone_active_anon
5.45e+12 +2.9% 5.607e+12 perf-stat.branch-instructions
66.64 -2.6 64.02 perf-stat.cache-miss-rate%
1.13e+11 -2.5% 1.102e+11 perf-stat.cache-misses
3.65 -6.2% 3.42 perf-stat.cpi
6.271e+13 -3.5% 6.053e+13 perf-stat.cpu-cycles
3.721e+12 +2.8% 3.826e+12 perf-stat.dTLB-loads
1.274e+12 +2.9% 1.311e+12 perf-stat.dTLB-stores
41.59 ± 14% -7.3 34.34 ± 13% perf-stat.iTLB-load-miss-rate%
1.067e+08 ± 11% -13.5% 92279346 ± 12% perf-stat.iTLB-load-misses
1.719e+13 +2.9% 1.77e+13 perf-stat.instructions
0.27 +6.7% 0.29 perf-stat.ipc
13845164 ± 9% +14.9% 15911418 ± 5% perf-stat.node-store-misses
7012 ± 29% +50.2% 10533 ± 6% sched_debug.cfs_rq:/.exec_clock.stddev
292662 ± 30% -61.0% 114038 ± 79% sched_debug.cfs_rq:/.load.max
37786 ± 26% -59.5% 15322 ± 75% sched_debug.cfs_rq:/.load.stddev
322667 ± 29% +39.3% 449382 ± 6% sched_debug.cfs_rq:/.min_vruntime.stddev
133.45 ± 65% -72.5% 36.66 ± 32% sched_debug.cfs_rq:/.runnable_load_avg.max
19.87 ± 47% -65.2% 6.92 ± 45% sched_debug.cfs_rq:/.runnable_load_avg.stddev
288543 ± 29% -61.2% 111944 ± 80% sched_debug.cfs_rq:/.runnable_weight.max
37233 ± 26% -59.4% 15127 ± 76% sched_debug.cfs_rq:/.runnable_weight.stddev
322554 ± 29% +39.1% 448625 ± 6% sched_debug.cfs_rq:/.spread0.stddev
91.32 ± 16% -60.1% 36.41 ± 32% sched_debug.cpu.cpu_load[0].max
14.81 ± 8% -53.9% 6.83 ± 46% sched_debug.cpu.cpu_load[0].stddev
96.30 ± 15% -42.1% 55.77 ± 29% sched_debug.cpu.cpu_load[1].max
14.98 ± 8% -42.5% 8.62 ± 33% sched_debug.cpu.cpu_load[1].stddev
100.56 ± 13% -50.3% 50.02 ± 27% sched_debug.cpu.cpu_load[2].max
15.02 ± 6% -47.2% 7.93 ± 33% sched_debug.cpu.cpu_load[2].stddev
105.84 ± 5% -52.3% 50.52 ± 22% sched_debug.cpu.cpu_load[3].max
15.25 ± 5% -46.4% 8.17 ± 33% sched_debug.cpu.cpu_load[3].stddev
129.32 ± 4% -32.6% 87.17 ± 23% sched_debug.cpu.cpu_load[4].max
17.43 ± 8% -33.0% 11.68 ± 26% sched_debug.cpu.cpu_load[4].stddev
211996 ± 35% -66.6% 70880 ±112% sched_debug.cpu.load.max
28998 ± 27% -63.6% 10560 ±101% sched_debug.cpu.load.stddev
576.16 ± 12% +20.3% 692.94 ± 8% sched_debug.cpu.sched_goidle.min
4098794 ± 6% -15.3% 3470270 ± 4% proc-vmstat.nr_active_anon
4074115 ± 5% -15.7% 3433000 ± 4% proc-vmstat.nr_anon_pages
4959 ± 5% -11.3% 4398 ± 2% proc-vmstat.nr_anon_transparent_hugepages
286.25 +2.0% 292.00 proc-vmstat.nr_dirtied
1176467 ± 2% +5.4% 1239660 proc-vmstat.nr_dirty_background_threshold
2355811 ± 2% +5.4% 2482352 proc-vmstat.nr_dirty_threshold
11870565 ± 2% +5.3% 12503408 proc-vmstat.nr_free_pages
483.25 +4.1% 503.00 proc-vmstat.nr_inactive_file
14412 -1.2% 14236 proc-vmstat.nr_kernel_stack
16805 -16.7% 13998 ± 2% proc-vmstat.nr_page_table_pages
22003 +1.6% 22362 proc-vmstat.nr_shmem
266.25 +5.9% 282.00 proc-vmstat.nr_written
4098789 ± 6% -15.3% 3470272 ± 4% proc-vmstat.nr_zone_active_anon
483.25 +4.1% 503.00 proc-vmstat.nr_zone_inactive_file
631493 -27.2% 459789 proc-vmstat.numa_hint_faults
430976 ± 2% -9.5% 389983 proc-vmstat.numa_hint_faults_local
11746776 ± 2% +3.3% 12130436 proc-vmstat.numa_hit
2186131 -5.8% 2059699 proc-vmstat.numa_huge_pte_updates
11730893 ± 2% +3.3% 12114589 proc-vmstat.numa_local
883475 -46.6% 471424 ± 6% proc-vmstat.numa_pages_migrated
1.121e+09 -5.8% 1.056e+09 proc-vmstat.numa_pte_updates
1.201e+09 +1.9% 1.223e+09 proc-vmstat.pgalloc_normal
1.198e+09 +1.9% 1.221e+09 proc-vmstat.pgfree
21214976 ± 5% -48.9% 10840832 ± 7% proc-vmstat.pgmigrate_fail
883475 -46.6% 471424 ± 6% proc-vmstat.pgmigrate_success
2279739 +2.9% 2344747 proc-vmstat.thp_deferred_split_page
2283726 +2.8% 2347574 proc-vmstat.thp_fault_alloc
45.43 ± 3% -45.4 0.00 perf-profile.calltrace.cycles-pp.copy_page.copy_user_huge_page.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault
18.79 ± 4% -2.4 16.37 ± 4% perf-profile.calltrace.cycles-pp.do_rw_once
0.80 ± 8% -0.3 0.51 ± 58% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.do_access
0.70 ± 9% -0.3 0.45 ± 58% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.do_access
0.70 ± 2% +0.2 0.93 ± 7% perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
0.68 ± 3% +0.2 0.91 ± 7% perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault
1.42 ± 18% +0.4 1.80 ± 14% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.22 ± 20% +0.4 1.60 ± 16% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.55 ± 18% +0.4 1.95 ± 13% perf-profile.calltrace.cycles-pp.secondary_startup_64
1.53 ± 18% +0.4 1.93 ± 13% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.53 ± 18% +0.4 1.93 ± 13% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
1.53 ± 18% +0.4 1.93 ± 13% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
0.00 +2.0 2.03 ± 32% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.copy_page.copy_subpage.copy_user_huge_page.do_huge_pmd_wp_page
44.59 ± 2% +5.3 49.89 ± 5% perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.do_access
44.73 ± 2% +5.3 50.03 ± 5% perf-profile.calltrace.cycles-pp.do_page_fault.page_fault.do_access
44.73 ± 2% +5.3 50.03 ± 5% perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault.do_access
44.76 ± 2% +5.3 50.06 ± 5% perf-profile.calltrace.cycles-pp.page_fault.do_access
46.34 ± 3% +6.3 52.62 ± 3% perf-profile.calltrace.cycles-pp.copy_user_huge_page.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
48.45 ± 3% +6.6 55.03 ± 3% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
48.03 ± 3% +6.7 54.70 ± 3% perf-profile.calltrace.cycles-pp.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
0.00 +51.5 51.53 ± 3% perf-profile.calltrace.cycles-pp.copy_page.copy_subpage.copy_user_huge_page.do_huge_pmd_wp_page.__handle_mm_fault
0.00 +52.5 52.48 ± 3% perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_huge_page.do_huge_pmd_wp_page.__handle_mm_fault.handle_mm_fault
42.36 ± 3% -5.9 36.45 ± 4% perf-profile.children.cycles-pp.do_rw_once
2.49 ± 6% -0.4 2.10 ± 14% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
2.16 ± 6% -0.4 1.80 ± 15% perf-profile.children.cycles-pp.hrtimer_interrupt
1.79 ± 6% -0.3 1.48 ± 17% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.43 ± 6% -0.3 1.13 ± 22% perf-profile.children.cycles-pp.tick_sched_timer
1.30 ± 6% -0.3 1.03 ± 23% perf-profile.children.cycles-pp.tick_sched_handle
1.28 ± 6% -0.3 1.01 ± 22% perf-profile.children.cycles-pp.update_process_times
0.11 ± 12% -0.1 0.05 ± 9% perf-profile.children.cycles-pp.do_huge_pmd_numa_page
0.19 ± 8% -0.0 0.15 ± 22% perf-profile.children.cycles-pp.rcu_check_callbacks
0.13 ± 5% -0.0 0.10 ± 10% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.39 -0.0 0.36 ± 5% perf-profile.children.cycles-pp.tlb_flush_mmu_free
0.39 -0.0 0.37 ± 4% perf-profile.children.cycles-pp.tlb_finish_mmu
0.39 -0.0 0.37 ± 4% perf-profile.children.cycles-pp.arch_tlb_finish_mmu
0.47 ± 2% -0.0 0.45 perf-profile.children.cycles-pp.exit_mmap
0.12 ± 3% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.05 ± 8% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.load_balance
0.10 ± 5% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
0.08 ± 14% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.autoremove_wake_function
0.08 ± 8% +0.0 0.10 ± 7% perf-profile.children.cycles-pp.__wake_up_common
0.07 ± 10% +0.0 0.09 ± 8% perf-profile.children.cycles-pp.wake_up_page_bit
0.06 ± 9% +0.0 0.09 ± 9% perf-profile.children.cycles-pp.___might_sleep
0.00 +0.1 0.05 perf-profile.children.cycles-pp.enqueue_task_fair
0.01 ±173% +0.1 0.06 ± 13% perf-profile.children.cycles-pp.__lock_page
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.clear_subpage
0.47 ± 3% +0.1 0.55 ± 3% perf-profile.children.cycles-pp.reuse_swap_page
0.19 ± 2% +0.2 0.35 ± 8% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.18 ± 4% +0.2 0.35 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.81 ± 2% +0.3 1.07 ± 3% perf-profile.children.cycles-pp.__alloc_pages_nodemask
0.79 ± 3% +0.3 1.05 ± 3% perf-profile.children.cycles-pp.get_page_from_freelist
1.23 ± 20% +0.4 1.62 ± 16% perf-profile.children.cycles-pp.intel_idle
1.45 ± 18% +0.4 1.83 ± 14% perf-profile.children.cycles-pp.cpuidle_enter_state
1.55 ± 18% +0.4 1.95 ± 13% perf-profile.children.cycles-pp.do_idle
1.55 ± 18% +0.4 1.95 ± 13% perf-profile.children.cycles-pp.secondary_startup_64
1.55 ± 18% +0.4 1.95 ± 13% perf-profile.children.cycles-pp.cpu_startup_entry
1.53 ± 18% +0.4 1.93 ± 13% perf-profile.children.cycles-pp.start_secondary
46.30 ± 3% +6.1 52.44 ± 3% perf-profile.children.cycles-pp.copy_page
46.35 ± 3% +6.3 52.65 ± 3% perf-profile.children.cycles-pp.copy_user_huge_page
48.68 ± 3% +6.6 55.29 ± 3% perf-profile.children.cycles-pp.do_page_fault
48.68 ± 3% +6.6 55.29 ± 3% perf-profile.children.cycles-pp.__do_page_fault
48.53 ± 3% +6.6 55.14 ± 3% perf-profile.children.cycles-pp.__handle_mm_fault
48.57 ± 3% +6.6 55.19 ± 3% perf-profile.children.cycles-pp.handle_mm_fault
48.71 ± 3% +6.6 55.33 ± 3% perf-profile.children.cycles-pp.page_fault
48.04 ± 3% +6.7 54.71 ± 3% perf-profile.children.cycles-pp.do_huge_pmd_wp_page
0.00 +52.5 52.49 ± 3% perf-profile.children.cycles-pp.copy_subpage
40.53 ± 3% -5.9 34.64 ± 4% perf-profile.self.cycles-pp.do_rw_once
6.36 ± 3% -0.8 5.57 ± 5% perf-profile.self.cycles-pp.do_access
0.18 ± 8% -0.0 0.13 ± 22% perf-profile.self.cycles-pp.rcu_check_callbacks
0.11 ± 4% -0.0 0.09 ± 11% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.07 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.do_huge_pmd_wp_page
0.04 ± 58% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.___might_sleep
0.46 ± 4% +0.1 0.53 ± 3% perf-profile.self.cycles-pp.reuse_swap_page
0.60 ± 3% +0.1 0.67 ± 3% perf-profile.self.cycles-pp.get_page_from_freelist
0.19 ± 2% +0.2 0.35 ± 8% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.2 0.18 ± 6% perf-profile.self.cycles-pp.copy_subpage
1.23 ± 20% +0.4 1.61 ± 16% perf-profile.self.cycles-pp.intel_idle
45.26 ± 3% +6.2 51.48 ± 3% perf-profile.self.cycles-pp.copy_page
[Per-run trend plots omitted: the original message included ASCII charts
of vm-scalability.time.user_time, vm-scalability.time.system_time,
vm-scalability.throughput, and vm-scalability.median, plotting
bisect-good (parent commit, "*"/".") samples against bisect-bad
(patched, "O") samples across runs.]
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
View attachment "config-4.18.0-08150-gc9f4cd7" of type "text/plain" (167180 bytes)
View attachment "job-script" of type "text/plain" (7363 bytes)
View attachment "job.yaml" of type "text/plain" (4955 bytes)
View attachment "reproduce" of type "text/plain" (12825 bytes)