Message-ID: <20200821020259.GA90000@shbuild999.sh.intel.com>
Date: Fri, 21 Aug 2020 10:02:59 +0800
From: Feng Tang <feng.tang@...el.com>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...e.de>,
kernel test robot <rong.a.chen@...el.com>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression
On Wed, Aug 19, 2020 at 10:04:37AM +0800, Feng Tang wrote:
> > We do have some DEFINE_PER_CPU data objects of type "struct mce":
> >
> > $ git grep 'DEFINE_PER_CPU(struct mce,'
> > arch/x86/kernel/cpu/mce/core.c:static DEFINE_PER_CPU(struct mce, mces_seen);
> > arch/x86/kernel/cpu/mce/core.c:DEFINE_PER_CPU(struct mce, injectm);
> >
> > Maybe making those slightly bigger has pushed some other per_cpu object
> > into an unfortunate alignment where some frequently used data is now
> > split between two cache lines instead of sitting in one?
>
> Yes, I also checked the per-CPU data section of the kernel System.map; it
> seems the change only affects the alignment of a few variables, from
> 'mce_poll_banks' through 'tsc_adjust' (the alignment is restored at
> 'lapic_events'), but I can't see how any of them could be related to this
> malloc microbenchmark.
>
> old map:
>
> 0000000000018c60 d mces_seen
> 0000000000018ce0 D injectm
> 0000000000018d58 D mce_poll_banks
> 0000000000018d60 D mce_poll_count
> 0000000000018d64 D mce_exception_count
> 0000000000018d68 D mce_device
> 0000000000018d70 d cmci_storm_state
> 0000000000018d74 d cmci_storm_cnt
> 0000000000018d78 d cmci_time_stamp
> 0000000000018d80 d cmci_backoff_cnt
> 0000000000018d88 d mce_banks_owned
> 0000000000018d90 d smca_misc_banks_map
> 0000000000018d94 d bank_map
> 0000000000018d98 d threshold_banks
> 0000000000018da0 d thermal_state
> 0000000000019260 D pqr_state
> 0000000000019270 d arch_prev_mperf
> 0000000000019278 d arch_prev_aperf
> 0000000000019280 D arch_freq_scale
> 00000000000192a0 d tsc_adjust
> 00000000000192c0 d lapic_events
>
> new map:
>
> 0000000000018c60 d mces_seen
> 0000000000018ce0 D injectm
> 0000000000018d60 D mce_poll_banks
> 0000000000018d68 D mce_poll_count
> 0000000000018d6c D mce_exception_count
> 0000000000018d70 D mce_device
> 0000000000018d78 d cmci_storm_state
> 0000000000018d7c d cmci_storm_cnt
> 0000000000018d80 d cmci_time_stamp
> 0000000000018d88 d cmci_backoff_cnt
> 0000000000018d90 d mce_banks_owned
> 0000000000018d98 d smca_misc_banks_map
> 0000000000018d9c d bank_map
> 0000000000018da0 d threshold_banks
> 0000000000018dc0 d thermal_state
> 0000000000019280 D pqr_state
> 0000000000019290 d arch_prev_mperf
> 0000000000019298 d arch_prev_aperf
> 00000000000192a0 D arch_freq_scale
> 00000000000192c0 d tsc_adjust
> 0000000000019300 d lapic_events
>
> > Can you collect some perf trace data for the benchmark when running
> > on kernels with kflags as __u32 and __u64 (looks to be the minimal
> > possible change that you found that still exhibits this problem).
> >
> > We'd like to find out which kernel functions are burning extra CPU
> > cycles and maybe understand why.
I could only find the old kernels for the raw tip/ras/core branch, which
reproduced this regression:
1de08dccd383 x86/mce: Add a struct mce.kflags field
9554bfe403bd x86/mce: Convert the CEC to use the MCE notifier
The strange thing is that after switching to gcc9 and a debian10 rootfs, with
the same commits the regression turns into an improvement. The trend holds,
though: if we change kflags from __u64 back to __u32, the performance
difference disappears.
Below is the comparison showing the regression; I have also attached the
perf profiles for the old and new commits (let me know if you need more data).
9554bfe403bdfc08 1de08dccd383482a3e88845d355
---------------- ---------------------------
%stddev %change %stddev
\ | \
192362 -15.1% 163343 will-it-scale.287.processes
0.91 +0.2% 0.92 will-it-scale.287.processes_idle
669.67 -15.1% 568.50 will-it-scale.per_process_ops
309.97 +0.2% 310.74 will-it-scale.time.elapsed_time
309.97 +0.2% 310.74 will-it-scale.time.elapsed_time.max
0.67 ±141% +200.0% 2.00 ± 50% will-it-scale.time.involuntary_context_switches
9921 +0.8% 10004 will-it-scale.time.maximum_resident_set_size
6110 +0.3% 6130 will-it-scale.time.minor_page_faults
4096 +0.0% 4096 will-it-scale.time.page_size
0.18 ± 2% +1.9% 0.18 ± 5% will-it-scale.time.system_time
0.25 ± 3% +0.0% 0.25 ± 4% will-it-scale.time.user_time
73.00 +12.3% 82.00 ± 3% will-it-scale.time.voluntary_context_switches
192362 -15.1% 163343 will-it-scale.workload
366.22 +0.3% 367.20 uptime.boot
15417 ± 4% +0.8% 15533 uptime.idle
1.347e+09 ± 2% -1.9% 1.321e+09 cpuidle.C1.time
2623112 ± 7% +5.7% 2773573 cpuidle.C1.usage
532385 ± 70% -98.7% 7012 ± 13% cpuidle.POLL.time
11803 ± 72% -96.7% 392.50 ± 13% cpuidle.POLL.usage
1.44 ± 4% +0.1 1.52 mpstat.cpu.all.idle%
0.00 ± 41% +0.0 0.00 ± 19% mpstat.cpu.all.soft%
98.01 -0.0 97.98 mpstat.cpu.all.sys%
0.55 ± 3% -0.1 0.50 mpstat.cpu.all.usr%
0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit
1.2e+08 -14.5% 1.026e+08 numa-numastat.node0.local_node
1.2e+08 -14.5% 1.026e+08 numa-numastat.node0.numa_hit
0.00 -100.0% 0.00 numa-numastat.node0.other_node
0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit
0.00 -100.0% 0.00 numa-numastat.node1.local_node
0.00 -100.0% 0.00 numa-numastat.node1.numa_hit
0.00 -100.0% 0.00 numa-numastat.node1.other_node
309.97 +0.2% 310.74 time.elapsed_time
309.97 +0.2% 310.74 time.elapsed_time.max
0.67 ±141% +200.0% 2.00 ± 50% time.involuntary_context_switches
9921 +0.8% 10004 time.maximum_resident_set_size
6110 +0.3% 6130 time.minor_page_faults
4096 +0.0% 4096 time.page_size
0.18 ± 2% +1.9% 0.18 ± 5% time.system_time
0.25 ± 3% +0.0% 0.25 ± 4% time.user_time
73.00 +12.3% 82.00 ± 3% time.voluntary_context_switches
1.00 +50.0% 1.50 ± 33% vmstat.cpu.id
97.00 +0.0% 97.00 vmstat.cpu.sy
0.00 -100.0% 0.00 vmstat.cpu.us
0.00 -100.0% 0.00 vmstat.io.bi
4.00 +0.0% 4.00 vmstat.memory.buff
1574390 -0.1% 1573361 vmstat.memory.cache
79173849 +0.0% 79177727 vmstat.memory.free
282.33 -0.1% 282.00 vmstat.procs.r
2760 -0.4% 2749 vmstat.system.cs
364380 ± 12% -1.4% 359417 vmstat.system.in
10.07 -8.7% 9.20 perf-stat.i.MPKI
1.005e+10 +1.6% 1.022e+10 perf-stat.i.branch-instructions
1.30 -0.1 1.16 perf-stat.i.branch-miss-rate%
1.26e+08 -9.7% 1.138e+08 perf-stat.i.branch-misses
13.62 +0.2 13.86 perf-stat.i.cache-miss-rate%
55442235 ± 2% -6.1% 52078517 perf-stat.i.cache-misses
4.077e+08 ± 2% -7.6% 3.766e+08 perf-stat.i.cache-references
2747 -0.3% 2739 perf-stat.i.context-switches
10.85 -1.4% 10.70 perf-stat.i.cpi
288378 +0.1% 288596 perf-stat.i.cpu-clock
4.467e+11 -0.0% 4.465e+11 perf-stat.i.cpu-cycles
267.78 +0.2% 268.24 perf-stat.i.cpu-migrations
8033 ± 2% +6.4% 8547 perf-stat.i.cycles-between-cache-misses
0.18 -0.0 0.16 perf-stat.i.iTLB-load-miss-rate%
68968473 -11.4% 61127131 perf-stat.i.iTLB-load-misses
4.114e+10 +1.4% 4.172e+10 perf-stat.i.iTLB-loads
4.109e+10 +1.4% 4.167e+10 perf-stat.i.instructions
598.48 +14.8% 687.20 perf-stat.i.instructions-per-iTLB-miss
0.09 +1.4% 0.09 perf-stat.i.ipc
1.55 -0.1% 1.55 perf-stat.i.metric.GHz
1.35 -15.1% 1.15 perf-stat.i.metric.K/sec
178.94 +1.3% 181.27 perf-stat.i.metric.M/sec
195779 -14.8% 166863 perf-stat.i.minor-faults
195779 -14.8% 166863 perf-stat.i.page-faults
288378 +0.1% 288596 perf-stat.i.task-clock
9.92 -8.9% 9.04 perf-stat.overall.MPKI
1.25 -0.1 1.11 perf-stat.overall.branch-miss-rate%
13.66 +0.2 13.89 perf-stat.overall.cache-miss-rate%
10.87 -1.5% 10.71 perf-stat.overall.cpi
8026 ± 2% +6.3% 8534 perf-stat.overall.cycles-between-cache-misses
0.17 -0.0 0.15 perf-stat.overall.iTLB-load-miss-rate%
596.26 +14.2% 681.07 perf-stat.overall.instructions-per-iTLB-miss
0.09 +1.5% 0.09 perf-stat.overall.ipc
65896092 +19.8% 78932072 perf-stat.overall.path-length
1.002e+10 +1.6% 1.018e+10 perf-stat.ps.branch-instructions
1.254e+08 -9.5% 1.134e+08 perf-stat.ps.branch-misses
55492415 ± 2% -6.1% 52128690 perf-stat.ps.cache-misses
4.062e+08 ± 2% -7.6% 3.754e+08 perf-stat.ps.cache-references
2689 -0.4% 2677 perf-stat.ps.context-switches
286795 +0.0% 286862 perf-stat.ps.cpu-clock
4.452e+11 -0.1% 4.449e+11 perf-stat.ps.cpu-cycles
253.81 +0.2% 254.32 perf-stat.ps.cpu-migrations
68711344 -11.3% 60977219 perf-stat.ps.iTLB-load-misses
4.098e+10 +1.4% 4.156e+10 perf-stat.ps.iTLB-loads
4.096e+10 +1.4% 4.153e+10 perf-stat.ps.instructions
194243 -14.6% 165836 perf-stat.ps.minor-faults
194243 -14.6% 165836 perf-stat.ps.page-faults
286795 +0.0% 286862 perf-stat.ps.task-clock
1.268e+13 +1.7% 1.289e+13 perf-stat.total.instructions
0.00 -100.0% 0.00 proc-vmstat.compact_isolated
153775 +0.1% 153982 proc-vmstat.nr_active_anon
34.00 ± 7% -5.9% 32.00 ± 9% proc-vmstat.nr_active_file
111205 -0.4% 110762 proc-vmstat.nr_anon_pages
61.00 ± 31% +14.8% 70.00 ± 5% proc-vmstat.nr_anon_transparent_hugepages
58.67 -1.1% 58.00 proc-vmstat.nr_dirtied
5.00 +0.0% 5.00 proc-vmstat.nr_dirty
1963650 +0.0% 1963749 proc-vmstat.nr_dirty_background_threshold
3932102 +0.0% 3932300 proc-vmstat.nr_dirty_threshold
360190 +0.0% 360264 proc-vmstat.nr_file_pages
49937 +0.0% 49937 proc-vmstat.nr_free_cma
19794023 +0.0% 19795020 proc-vmstat.nr_free_pages
5663 -0.0% 5661 proc-vmstat.nr_inactive_anon
98.00 ± 3% +0.0% 98.00 ± 7% proc-vmstat.nr_inactive_file
13.33 ± 60% +61.2% 21.50 ± 2% proc-vmstat.nr_isolated_anon
40539 -0.0% 40522 proc-vmstat.nr_kernel_stack
12404 -0.5% 12343 proc-vmstat.nr_mapped
430.00 -49.9% 215.50 ± 99% proc-vmstat.nr_mlock
15352 -0.0% 15347 proc-vmstat.nr_page_table_pages
48318 +1.3% 48928 proc-vmstat.nr_shmem
33638 -0.6% 33432 proc-vmstat.nr_slab_reclaimable
80590 -0.7% 80051 proc-vmstat.nr_slab_unreclaimable
311806 -0.2% 311237 proc-vmstat.nr_unevictable
0.00 -100.0% 0.00 proc-vmstat.nr_unstable
0.00 -100.0% 0.00 proc-vmstat.nr_writeback
57.67 -1.2% 57.00 proc-vmstat.nr_written
153775 +0.1% 153982 proc-vmstat.nr_zone_active_anon
34.00 ± 7% -5.9% 32.00 ± 9% proc-vmstat.nr_zone_active_file
5663 -0.0% 5661 proc-vmstat.nr_zone_inactive_anon
98.00 ± 3% +0.0% 98.00 ± 7% proc-vmstat.nr_zone_inactive_file
311806 -0.2% 311237 proc-vmstat.nr_zone_unevictable
5.00 +0.0% 5.00 proc-vmstat.nr_zone_write_pending
2788 ± 8% -1.2% 2755 ± 11% proc-vmstat.numa_hint_faults
2788 ± 8% -1.2% 2755 ± 11% proc-vmstat.numa_hint_faults_local
1.2e+08 -14.5% 1.026e+08 proc-vmstat.numa_hit
121.00 ± 28% +59.9% 193.50 ± 20% proc-vmstat.numa_huge_pte_updates
0.00 -100.0% 0.00 proc-vmstat.numa_interleave
1.2e+08 -14.5% 1.026e+08 proc-vmstat.numa_local
0.00 -100.0% 0.00 proc-vmstat.numa_other
65275 ± 26% +56.7% 102311 ± 20% proc-vmstat.numa_pte_updates
6292 ± 6% +0.7% 6335 ± 2% proc-vmstat.pgactivate
0.00 -100.0% 0.00 proc-vmstat.pgalloc_dma32
1.201e+08 -14.5% 1.027e+08 proc-vmstat.pgalloc_normal
60452926 -14.4% 51751356 proc-vmstat.pgfault
1.2e+08 -14.5% 1.026e+08 proc-vmstat.pgfree
0.00 -100.0% 0.00 proc-vmstat.pgpgin
50.00 ± 52% +18.0% 59.00 ± 10% proc-vmstat.thp_collapse_alloc
32.00 +0.0% 32.00 proc-vmstat.thp_fault_alloc
0.00 -100.0% 0.00 proc-vmstat.thp_zero_page_alloc
105.00 -0.5% 104.50 proc-vmstat.unevictable_pgs_culled
549.00 +0.0% 549.00 proc-vmstat.unevictable_pgs_mlocked
0.68 ± 70% -0.7 0.00 pp.bt.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.56 ± 2% -0.6 0.00 pp.bt.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.76 -0.1 0.65 pp.bt.mmap64
0.68 -0.1 0.57 pp.bt.entry_SYSCALL_64_after_hwframe.mmap64
0.64 -0.1 0.54 pp.bt.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.65 -0.1 0.55 pp.bt.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.67 -0.1 0.57 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.82 -0.1 0.74 ± 2% pp.bt.handle_mm_fault.do_page_fault.page_fault
0.78 -0.1 0.70 pp.bt.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
0.70 -0.1 0.62 pp.bt.handle_pte_fault.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
1.03 -0.1 0.95 pp.bt.page_fault
0.99 -0.1 0.92 ± 2% pp.bt.do_page_fault.page_fault
0.92 -0.1 0.86 pp.bt.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.85 -0.1 0.80 pp.bt.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
1.42 ± 4% -0.0 1.37 pp.bt.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore
0.99 ± 6% -0.0 0.95 pp.bt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
0.98 ± 5% -0.0 0.94 pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu
0.94 ± 6% -0.0 0.91 pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu
0.82 ± 5% -0.0 0.79 pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages
0.82 ± 5% -0.0 0.81 pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn
0.95 ± 5% -0.0 0.94 pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu
0.98 ± 5% -0.0 0.97 pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
0.98 ± 5% -0.0 0.97 pp.bt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
47.85 +0.1 47.95 pp.bt.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.84 +0.1 47.94 pp.bt.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
47.71 +0.1 47.83 pp.bt.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
97.48 +0.2 97.64 pp.bt.munmap
97.35 +0.2 97.53 pp.bt.entry_SYSCALL_64_after_hwframe.munmap
97.34 +0.2 97.52 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
46.37 +0.2 46.55 pp.bt._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
46.34 +0.2 46.52 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu
96.76 +0.2 96.97 pp.bt.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
96.81 +0.2 97.03 pp.bt.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
96.80 +0.2 97.02 pp.bt.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
47.47 +0.2 47.71 pp.bt.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.46 +0.2 47.70 pp.bt.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
47.44 +0.2 47.68 pp.bt.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
96.55 +0.2 96.80 pp.bt.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
46.22 +0.3 46.48 pp.bt._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
46.19 +0.3 46.47 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
0.76 -0.1 0.65 pp.child.mmap64
0.66 -0.1 0.56 pp.child.vm_mmap_pgoff
0.67 -0.1 0.57 pp.child.ksys_mmap_pgoff
0.58 ± 2% -0.1 0.49 pp.child.do_mmap
0.11 ± 37% -0.1 0.03 ±100% pp.child.timerqueue_del
2.02 ± 4% -0.1 1.94 pp.child.smp_apic_timer_interrupt
2.11 ± 4% -0.1 2.03 pp.child.apic_timer_interrupt
1.51 ± 4% -0.1 1.44 pp.child.__hrtimer_run_queues
0.79 -0.1 0.71 pp.child.__handle_mm_fault
0.84 -0.1 0.77 pp.child.handle_mm_fault
1.07 -0.1 0.99 pp.child.page_fault
1.75 ± 4% -0.1 1.68 pp.child.hrtimer_interrupt
0.71 -0.1 0.64 pp.child.handle_pte_fault
0.44 ± 2% -0.1 0.36 pp.child.mmap_region
0.07 ± 70% -0.1 0.00 pp.child.rb_next
1.02 -0.1 0.95 pp.child.do_page_fault
0.06 -0.1 0.00 pp.child.free_unref_page_commit
0.93 -0.1 0.87 pp.child.unmap_vmas
2.06 ± 6% -0.1 2.00 pp.child._raw_spin_unlock_irqrestore
0.05 -0.1 0.00 pp.child.__might_sleep
0.05 -0.1 0.00 pp.child.find_vma
0.51 -0.0 0.46 pp.child.exit_to_usermode_loop
0.27 -0.0 0.22 pp.child.get_page_from_freelist
0.32 -0.0 0.27 pp.child.__alloc_pages_nodemask
0.87 -0.0 0.83 pp.child.unmap_page_range
0.39 ± 29% -0.0 0.35 ± 34% pp.child.cmd_record
0.39 ± 29% -0.0 0.35 ± 34% pp.child.perf_mmap__push
0.36 ± 27% -0.0 0.32 ± 34% pp.child.ksys_write
0.50 -0.0 0.46 pp.child.task_work_run
0.50 -0.0 0.46 pp.child.task_numa_work
0.18 ± 4% -0.0 0.14 pp.child.perf_event_mmap
0.39 ± 29% -0.0 0.35 ± 35% pp.child.__libc_start_main
0.39 ± 29% -0.0 0.35 ± 35% pp.child.main
0.38 ± 28% -0.0 0.35 ± 33% pp.child.__GI___libc_write
0.17 ± 2% -0.0 0.14 ± 3% pp.child.prep_new_page
0.35 ± 27% -0.0 0.31 ± 35% pp.child.vfs_write
0.12 ± 8% -0.0 0.08 pp.child.perf_iterate_sb
0.50 -0.0 0.46 pp.child.change_protection
0.50 -0.0 0.46 pp.child.change_prot_numa
0.50 -0.0 0.46 pp.child.change_p4d_range
0.25 ± 3% -0.0 0.21 ± 2% pp.child.__pte_alloc
0.03 ± 70% -0.0 0.00 pp.child.mem_cgroup_try_charge_delay
0.03 ± 70% -0.0 0.00 pp.child.__put_anon_vma
0.15 ± 3% -0.0 0.12 pp.child.clear_page_erms
0.23 ± 2% -0.0 0.20 ± 2% pp.child.pte_alloc_one
0.32 ± 28% -0.0 0.29 ± 34% pp.child.generic_file_write_iter
0.31 ± 29% -0.0 0.28 ± 35% pp.child.__generic_file_write_iter
0.30 ± 28% -0.0 0.28 ± 34% pp.child.generic_perform_write
0.32 ± 28% -0.0 0.30 ± 35% pp.child.new_sync_write
0.11 ± 4% -0.0 0.09 pp.child.free_unref_page_list
0.99 ± 4% -0.0 0.97 pp.child.update_process_times
1.06 ± 4% -0.0 1.04 pp.child.tick_sched_timer
0.16 -0.0 0.14 pp.child.alloc_pages_vma
0.12 -0.0 0.10 pp.child.get_unmapped_area
1.01 ± 4% -0.0 0.99 pp.child.tick_sched_handle
0.54 ± 3% -0.0 0.53 pp.child.task_tick_fair
0.78 ± 5% -0.0 0.76 pp.child.scheduler_tick
0.02 ±141% -0.0 0.00 pp.child.iov_iter_fault_in_readable
0.02 ±141% -0.0 0.00 pp.child.enqueue_hrtimer
0.32 -0.0 0.30 pp.child.___might_sleep
0.17 ± 2% -0.0 0.15 pp.child._cond_resched
0.07 ± 7% -0.0 0.05 pp.child.kmem_cache_free
0.16 -0.0 0.15 ± 3% pp.child.irq_exit
0.13 ± 3% -0.0 0.12 pp.child.free_pgtables
0.12 ± 3% -0.0 0.11 pp.child.unlink_anon_vmas
0.11 ± 11% -0.0 0.10 pp.child.perf_mux_hrtimer_handler
0.08 ± 5% -0.0 0.07 pp.child._raw_spin_lock
0.10 ± 4% -0.0 0.08 ± 5% pp.child.arch_get_unmapped_area_topdown
0.10 -0.0 0.09 pp.child.__update_load_avg_cfs_rq
0.22 ± 28% -0.0 0.21 ± 38% pp.child.shmem_write_begin
0.22 ± 28% -0.0 0.21 ± 38% pp.child.shmem_getpage_gfp
0.16 ± 5% -0.0 0.15 pp.child.update_curr
0.13 -0.0 0.12 pp.child.__anon_vma_prepare
0.13 -0.0 0.12 pp.child.free_pgd_range
0.09 ± 9% -0.0 0.08 pp.child.mem_cgroup_uncharge_list
0.07 -0.0 0.06 pp.child.vm_unmapped_area
0.07 -0.0 0.06 pp.child.percpu_counter_add_batch
0.12 -0.0 0.11 pp.child.free_p4d_range
0.09 ± 5% -0.0 0.08 pp.child.kmem_cache_alloc
0.06 ± 8% -0.0 0.05 pp.child.rcu_sched_clock_irq
0.06 ± 8% -0.0 0.05 pp.child.remove_vma
0.07 -0.0 0.07 ± 7% pp.child.rcu_all_qs
0.06 -0.0 0.06 ± 9% pp.child.run_timer_softirq
0.06 -0.0 0.06 ± 9% pp.child.malloc
0.07 ± 6% -0.0 0.07 pp.child.flush_tlb_mm_range
0.05 ± 8% -0.0 0.05 pp.child.entry_SYSCALL_64
0.05 ± 8% -0.0 0.05 pp.child.vma_link
0.06 ± 8% -0.0 0.06 ± 9% pp.child.clockevents_program_event
0.05 -0.0 0.05 pp.child.vm_normal_page
0.06 +0.0 0.06 pp.child.syscall_return_via_sysret
0.06 +0.0 0.06 pp.child.uncharge_batch
0.11 ± 4% +0.0 0.11 ± 4% pp.child.__softirqentry_text_start
0.09 ± 5% +0.0 0.10 ± 5% pp.child.__update_load_avg_se
0.16 ± 26% +0.0 0.16 ± 27% pp.child.__lru_cache_add
0.11 ± 4% +0.0 0.12 pp.child.__pagevec_lru_add_fn
0.02 ±141% +0.0 0.03 ±100% pp.child.interrupt_entry
0.07 ± 18% +0.0 0.08 pp.child.update_rq_clock
0.11 ± 4% +0.0 0.12 ± 4% pp.child.__perf_sw_event
0.09 ± 5% +0.0 0.11 ± 4% pp.child.___perf_sw_event
0.04 ± 70% +0.0 0.06 pp.child.perf_event_task_tick
0.13 ± 31% +0.0 0.15 ± 3% pp.child.__remove_hrtimer
0.07 +0.0 0.11 ± 4% pp.child.__mod_lruvec_state
0.00 +0.1 0.05 pp.child.irq_enter
0.00 +0.1 0.05 ±100% pp.child.isolate_lru_page
98.48 +0.1 98.53 pp.child.do_syscall_64
0.00 +0.1 0.06 ± 9% pp.child.mmput
0.00 +0.1 0.06 ± 9% pp.child.exit_mmap
98.50 +0.1 98.56 pp.child.entry_SYSCALL_64_after_hwframe
0.02 ±141% +0.1 0.08 ± 6% pp.child.__mod_memcg_state
0.00 +0.1 0.06 ±100% pp.child.khugepaged
0.00 +0.1 0.06 ±100% pp.child._raw_spin_lock_irq
0.00 +0.1 0.07 ±100% pp.child.ret_from_fork
0.00 +0.1 0.07 ±100% pp.child.kthread
47.88 +0.1 47.98 pp.child.tlb_finish_mmu
47.87 +0.1 47.98 pp.child.tlb_flush_mmu
47.79 +0.1 47.90 pp.child.release_pages
97.50 +0.2 97.66 pp.child.munmap
96.82 +0.2 97.03 pp.child.__x64_sys_munmap
96.81 +0.2 97.03 pp.child.__vm_munmap
96.79 +0.2 97.01 pp.child.__do_munmap
47.66 +0.2 47.91 pp.child.pagevec_lru_move_fn
47.52 +0.2 47.77 pp.child.lru_add_drain
47.51 +0.3 47.76 pp.child.lru_add_drain_cpu
96.58 +0.3 96.83 pp.child.unmap_region
92.87 +0.5 93.34 pp.child._raw_spin_lock_irqsave
92.80 +0.5 93.31 pp.child.native_queued_spin_lock_slowpath
0.15 ± 28% -0.1 0.07 pp.self.__hrtimer_run_queues
0.07 ± 70% -0.1 0.00 pp.self.rb_next
0.05 -0.1 0.00 pp.self.__pagevec_lru_add_fn
0.05 -0.1 0.00 pp.self.run_timer_softirq
0.05 -0.1 0.00 pp.self.free_unref_page_commit
0.15 ± 3% -0.0 0.11 pp.self.clear_page_erms
0.44 -0.0 0.41 pp.self.change_p4d_range
0.08 ± 5% -0.0 0.06 ± 9% pp.self.perf_iterate_sb
0.47 ± 2% -0.0 0.45 pp.self.unmap_page_range
0.02 ±141% -0.0 0.00 pp.self.smp_apic_timer_interrupt
0.30 -0.0 0.29 pp.self.___might_sleep
0.02 ±141% -0.0 0.00 pp.self.malloc
0.02 ±141% -0.0 0.00 pp.self.entry_SYSCALL_64
0.02 ±141% -0.0 0.00 pp.self.__might_sleep
0.08 ± 10% -0.0 0.07 ± 7% pp.self._raw_spin_lock
0.09 ± 14% -0.0 0.08 ± 6% pp.self.hrtimer_interrupt
0.09 ± 5% -0.0 0.08 ± 6% pp.self.release_pages
0.10 ± 8% -0.0 0.09 pp.self._raw_spin_unlock_irqrestore
0.09 -0.0 0.08 pp.self.__update_load_avg_cfs_rq
0.06 -0.0 0.05 pp.self.do_page_fault
0.06 -0.0 0.05 pp.self.kmem_cache_free
0.09 ± 5% -0.0 0.08 ± 5% pp.self.free_p4d_range
0.08 ± 5% -0.0 0.08 ± 6% pp.self._cond_resched
0.07 ± 7% -0.0 0.06 pp.self.vm_unmapped_area
0.06 ± 8% -0.0 0.05 pp.self.kmem_cache_alloc
0.06 -0.0 0.06 ± 9% pp.self.rcu_all_qs
0.08 -0.0 0.08 ± 6% pp.self.__update_load_avg_se
0.09 ± 5% -0.0 0.09 pp.self.update_curr
0.07 ± 6% -0.0 0.07 pp.self._raw_spin_lock_irqsave
0.11 ± 4% -0.0 0.11 pp.self.task_tick_fair
0.07 ± 7% -0.0 0.07 ± 7% pp.self.get_page_from_freelist
0.06 ± 8% -0.0 0.06 ± 9% pp.self.__handle_mm_fault
0.05 -0.0 0.05 pp.self.__do_munmap
0.06 +0.0 0.06 pp.self.syscall_return_via_sysret
0.02 ±141% +0.0 0.03 ±100% pp.self.update_rq_clock
0.02 ±141% +0.0 0.03 ±100% pp.self.interrupt_entry
0.04 ± 70% +0.0 0.06 ± 9% pp.self.perf_event_task_tick
0.00 +0.0 0.03 ±100% pp.self.vm_normal_page
0.05 +0.0 0.08 pp.self.___perf_sw_event
0.02 ±141% +0.1 0.08 ± 6% pp.self.__mod_memcg_state
0.00 +0.1 0.11 ± 4% pp.self.__remove_hrtimer
92.80 +0.5 93.31 pp.self.native_queued_spin_lock_slowpath
333.33 -0.2% 332.50 softirqs.BLOCK
5.00 +0.0% 5.00 softirqs.HI
17005 ± 69% -64.3% 6074 ± 8% softirqs.NET_RX
45.33 +0.4% 45.50 ± 3% softirqs.NET_TX
1322815 ± 2% -1.3% 1305414 softirqs.RCU
633707 ± 9% +1.9% 645991 ± 11% softirqs.SCHED
293.00 -0.2% 292.50 softirqs.TASKLET
35621870 +12.5% 40074312 softirqs.TIMER
344034 -0.3% 343007 interrupts.CAL:Function_call_interrupts
396.00 -0.1% 395.50 interrupts.IWI:IRQ_work_interrupts
1.102e+08 ± 13% -1.1% 1.09e+08 interrupts.LOC:Local_timer_interrupts
288.00 +0.0% 288.00 interrupts.MCP:Machine_check_polls
1451499 +1.0% 1465843 interrupts.NMI:Non-maskable_interrupts
1451499 +1.0% 1465843 interrupts.PMI:Performance_monitoring_interrupts
24121 ± 2% +1.9% 24578 ± 7% interrupts.RES:Rescheduling_interrupts
1262 ± 2% +12.5% 1421 ± 7% interrupts.TLB:TLB_shootdowns
Thanks,
Feng
Download attachment "perf-profile.old" of type "application/x-trash" (143788 bytes)
View attachment "perf-profile.new" of type "text/plain" (157202 bytes)