Message-ID: <20200821020259.GA90000@shbuild999.sh.intel.com>
Date:   Fri, 21 Aug 2020 10:02:59 +0800
From:   Feng Tang <feng.tang@...el.com>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Borislav Petkov <bp@...e.de>,
        kernel test robot <rong.a.chen@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops
 -14.1% regression

On Wed, Aug 19, 2020 at 10:04:37AM +0800, Feng Tang wrote:
> > We do have some DEFINE_PER_CPU data objects of type "struct mce":
> > 
> > $ git grep 'DEFINE_PER_CPU(struct mce,'
> > arch/x86/kernel/cpu/mce/core.c:static DEFINE_PER_CPU(struct mce, mces_seen);
> > arch/x86/kernel/cpu/mce/core.c:DEFINE_PER_CPU(struct mce, injectm);
> > 
> > Maybe making those slightly bigger has pushed some other per_cpu object
> > into an unfortunate alignment where some frequently used data is now
> > split between two cache lines instead of sitting in one?
> 
> Yes, I also checked the percpu data part of the kernel System.map. It
> seems to affect only the alignment of several variables, from
> 'mce_poll_banks' to 'tsc_adjust', and the alignment is restored at
> 'lapic_events', but I can't see how any of them could be related to
> this malloc microbenchmark.
> 	
> old map:
> 
> 	0000000000018c60 d mces_seen
> 	0000000000018ce0 D injectm
> 	0000000000018d58 D mce_poll_banks
> 	0000000000018d60 D mce_poll_count
> 	0000000000018d64 D mce_exception_count
> 	0000000000018d68 D mce_device
> 	0000000000018d70 d cmci_storm_state
> 	0000000000018d74 d cmci_storm_cnt
> 	0000000000018d78 d cmci_time_stamp
> 	0000000000018d80 d cmci_backoff_cnt
> 	0000000000018d88 d mce_banks_owned
> 	0000000000018d90 d smca_misc_banks_map
> 	0000000000018d94 d bank_map
> 	0000000000018d98 d threshold_banks
> 	0000000000018da0 d thermal_state
> 	0000000000019260 D pqr_state
> 	0000000000019270 d arch_prev_mperf
> 	0000000000019278 d arch_prev_aperf
> 	0000000000019280 D arch_freq_scale
> 	00000000000192a0 d tsc_adjust
> 	00000000000192c0 d lapic_events
> 
> new map:
> 
> 	0000000000018c60 d mces_seen
> 	0000000000018ce0 D injectm
> 	0000000000018d60 D mce_poll_banks
> 	0000000000018d68 D mce_poll_count
> 	0000000000018d6c D mce_exception_count
> 	0000000000018d70 D mce_device
> 	0000000000018d78 d cmci_storm_state
> 	0000000000018d7c d cmci_storm_cnt
> 	0000000000018d80 d cmci_time_stamp
> 	0000000000018d88 d cmci_backoff_cnt
> 	0000000000018d90 d mce_banks_owned
> 	0000000000018d98 d smca_misc_banks_map
> 	0000000000018d9c d bank_map
> 	0000000000018da0 d threshold_banks
> 	0000000000018dc0 d thermal_state
> 	0000000000019280 D pqr_state
> 	0000000000019290 d arch_prev_mperf
> 	0000000000019298 d arch_prev_aperf
> 	00000000000192a0 D arch_freq_scale
> 	00000000000192c0 d tsc_adjust
> 	0000000000019300 d lapic_events
> 
> > Can you collect some perf trace data for the benchmark when running
> > on kernels with kflags as __u32 and __u64 (looks to be the minimal
> > possible change that you found that still exhibits this problem).
> >
> > We'd like to find out which kernel functions are burning extra CPU
> > cycles and maybe understand why.
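
As a rough cross-check of the two map excerpts quoted above, the sketch
below computes how far each symbol moved and whether its offset inside a
cache line changed. The 64-byte line size and the helper names are
assumptions for illustration; the addresses are copied from the maps, and
only a subset of the symbols is listed.

```python
CACHE_LINE = 64  # assumed cache-line size in bytes

# name: (old address, new address), copied from the quoted System.map excerpts
PERCPU_SYMBOLS = {
    "mces_seen":       (0x18c60, 0x18c60),
    "injectm":         (0x18ce0, 0x18ce0),
    "mce_poll_banks":  (0x18d58, 0x18d60),
    "threshold_banks": (0x18d98, 0x18da0),
    "thermal_state":   (0x18da0, 0x18dc0),
    "pqr_state":       (0x19260, 0x19280),
    "arch_freq_scale": (0x19280, 0x192a0),
    "tsc_adjust":      (0x192a0, 0x192c0),
    "lapic_events":    (0x192c0, 0x19300),
}


def line_offset(addr):
    """Byte offset of an address inside its cache line."""
    return addr % CACHE_LINE


def alignment_changes(symbols):
    """Map each name to (shift in bytes, old line offset, new line offset)."""
    return {
        name: (new - old, line_offset(old), line_offset(new))
        for name, (old, new) in symbols.items()
    }


def moved_within_line(symbols):
    """Names whose offset inside the cache line differs between the maps."""
    return [
        name
        for name, (_shift, old_off, new_off) in alignment_changes(symbols).items()
        if old_off != new_off
    ]


if __name__ == "__main__":
    for name, (shift, old_off, new_off) in alignment_changes(PERCPU_SYMBOLS).items():
        print(f"{name:16s} shift={shift:#04x} offset {old_off:2d} -> {new_off:2d}")
```

With these addresses, 'lapic_events' moves by exactly one cache line and so
keeps its offset, while the symbols from 'mce_poll_banks' through
'tsc_adjust' land at a different offset within their lines.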

I could only find the old kernels for the raw tip/ras/core branch, which
reproduce this regression:

  1de08dccd383 x86/mce: Add a struct mce.kflags field
  9554bfe403bd x86/mce: Convert the CEC to use the MCE notifier

The strange thing is that after switching to gcc9 and a debian10 rootfs, the
same commits turn the regression into an improvement. The trend still holds,
though: if we change kflags from __u64 to __u32, the performance does not
change.
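
One possible reason the __u32 variant behaves like the old kernel is tail
padding: a 4-byte field can fit into padding that already exists at the end
of a struct, while an 8-byte field cannot. The sketch below is only a
stand-in, not the real struct mce: it assumes 112 bytes of 8-byte-aligned
members followed by a 4-byte 'microcode' field, and a hypothetical
struct_size() helper that applies natural C alignment.

```python
def struct_size(fields):
    """Size of a C struct with naturally aligned members.

    fields is a list of (name, size) pairs; each member is assumed to be
    aligned to its own size, and the struct to its largest member.
    """
    offset = 0
    max_align = 1
    for _name, size in fields:
        offset = (offset + size - 1) // size * size  # pad up to member alignment
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


# Stand-in layout (assumption): fourteen 8-byte slots, then a 4-byte field.
BASE = [(f"slot{i}", 8) for i in range(14)] + [("microcode", 4)]

if __name__ == "__main__":
    print("no kflags   :", struct_size(BASE))
    print("__u32 kflags:", struct_size(BASE + [("kflags", 4)]))
    print("__u64 kflags:", struct_size(BASE + [("kflags", 8)]))
```

Under these assumptions the __u32 field leaves the size unchanged and the
__u64 field adds 8 bytes, which would be consistent with the gap after
'injectm' growing from 0x78 to 0x80 in the maps quoted earlier.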

Below is the comparison for the regression. I have also attached the
perf-profile for the old and new commits (let me know if you need more data).


9554bfe403bdfc08 1de08dccd383482a3e88845d355 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    192362           -15.1%     163343        will-it-scale.287.processes
      0.91            +0.2%       0.92        will-it-scale.287.processes_idle
    669.67           -15.1%     568.50        will-it-scale.per_process_ops
    309.97            +0.2%     310.74        will-it-scale.time.elapsed_time
    309.97            +0.2%     310.74        will-it-scale.time.elapsed_time.max
      0.67 ±141%    +200.0%       2.00 ± 50%  will-it-scale.time.involuntary_context_switches
      9921            +0.8%      10004        will-it-scale.time.maximum_resident_set_size
      6110            +0.3%       6130        will-it-scale.time.minor_page_faults
      4096            +0.0%       4096        will-it-scale.time.page_size
      0.18 ±  2%      +1.9%       0.18 ±  5%  will-it-scale.time.system_time
      0.25 ±  3%      +0.0%       0.25 ±  4%  will-it-scale.time.user_time
     73.00           +12.3%      82.00 ±  3%  will-it-scale.time.voluntary_context_switches
    192362           -15.1%     163343        will-it-scale.workload
    366.22            +0.3%     367.20        uptime.boot
     15417 ±  4%      +0.8%      15533        uptime.idle
 1.347e+09 ±  2%      -1.9%  1.321e+09        cpuidle.C1.time
   2623112 ±  7%      +5.7%    2773573        cpuidle.C1.usage
    532385 ± 70%     -98.7%       7012 ± 13%  cpuidle.POLL.time
     11803 ± 72%     -96.7%     392.50 ± 13%  cpuidle.POLL.usage
      1.44 ±  4%      +0.1        1.52        mpstat.cpu.all.idle%
      0.00 ± 41%      +0.0        0.00 ± 19%  mpstat.cpu.all.soft%
     98.01            -0.0       97.98        mpstat.cpu.all.sys%
      0.55 ±  3%      -0.1        0.50        mpstat.cpu.all.usr%
      0.00          -100.0%       0.00        numa-numastat.node0.interleave_hit
   1.2e+08           -14.5%  1.026e+08        numa-numastat.node0.local_node
   1.2e+08           -14.5%  1.026e+08        numa-numastat.node0.numa_hit
      0.00          -100.0%       0.00        numa-numastat.node0.other_node
      0.00          -100.0%       0.00        numa-numastat.node1.interleave_hit
      0.00          -100.0%       0.00        numa-numastat.node1.local_node
      0.00          -100.0%       0.00        numa-numastat.node1.numa_hit
      0.00          -100.0%       0.00        numa-numastat.node1.other_node
    309.97            +0.2%     310.74        time.elapsed_time
    309.97            +0.2%     310.74        time.elapsed_time.max
      0.67 ±141%    +200.0%       2.00 ± 50%  time.involuntary_context_switches
      9921            +0.8%      10004        time.maximum_resident_set_size
      6110            +0.3%       6130        time.minor_page_faults
      4096            +0.0%       4096        time.page_size
      0.18 ±  2%      +1.9%       0.18 ±  5%  time.system_time
      0.25 ±  3%      +0.0%       0.25 ±  4%  time.user_time
     73.00           +12.3%      82.00 ±  3%  time.voluntary_context_switches
      1.00           +50.0%       1.50 ± 33%  vmstat.cpu.id
     97.00            +0.0%      97.00        vmstat.cpu.sy
      0.00          -100.0%       0.00        vmstat.cpu.us
      0.00          -100.0%       0.00        vmstat.io.bi
      4.00            +0.0%       4.00        vmstat.memory.buff
   1574390            -0.1%    1573361        vmstat.memory.cache
  79173849            +0.0%   79177727        vmstat.memory.free
    282.33            -0.1%     282.00        vmstat.procs.r
      2760            -0.4%       2749        vmstat.system.cs
    364380 ± 12%      -1.4%     359417        vmstat.system.in
     10.07            -8.7%       9.20        perf-stat.i.MPKI
 1.005e+10            +1.6%  1.022e+10        perf-stat.i.branch-instructions
      1.30            -0.1        1.16        perf-stat.i.branch-miss-rate%
  1.26e+08            -9.7%  1.138e+08        perf-stat.i.branch-misses
     13.62            +0.2       13.86        perf-stat.i.cache-miss-rate%
  55442235 ±  2%      -6.1%   52078517        perf-stat.i.cache-misses
 4.077e+08 ±  2%      -7.6%  3.766e+08        perf-stat.i.cache-references
      2747            -0.3%       2739        perf-stat.i.context-switches
     10.85            -1.4%      10.70        perf-stat.i.cpi
    288378            +0.1%     288596        perf-stat.i.cpu-clock
 4.467e+11            -0.0%  4.465e+11        perf-stat.i.cpu-cycles
    267.78            +0.2%     268.24        perf-stat.i.cpu-migrations
      8033 ±  2%      +6.4%       8547        perf-stat.i.cycles-between-cache-misses
      0.18            -0.0        0.16        perf-stat.i.iTLB-load-miss-rate%
  68968473           -11.4%   61127131        perf-stat.i.iTLB-load-misses
 4.114e+10            +1.4%  4.172e+10        perf-stat.i.iTLB-loads
 4.109e+10            +1.4%  4.167e+10        perf-stat.i.instructions
    598.48           +14.8%     687.20        perf-stat.i.instructions-per-iTLB-miss
      0.09            +1.4%       0.09        perf-stat.i.ipc
      1.55            -0.1%       1.55        perf-stat.i.metric.GHz
      1.35           -15.1%       1.15        perf-stat.i.metric.K/sec
    178.94            +1.3%     181.27        perf-stat.i.metric.M/sec
    195779           -14.8%     166863        perf-stat.i.minor-faults
    195779           -14.8%     166863        perf-stat.i.page-faults
    288378            +0.1%     288596        perf-stat.i.task-clock
      9.92            -8.9%       9.04        perf-stat.overall.MPKI
      1.25            -0.1        1.11        perf-stat.overall.branch-miss-rate%
     13.66            +0.2       13.89        perf-stat.overall.cache-miss-rate%
     10.87            -1.5%      10.71        perf-stat.overall.cpi
      8026 ±  2%      +6.3%       8534        perf-stat.overall.cycles-between-cache-misses
      0.17            -0.0        0.15        perf-stat.overall.iTLB-load-miss-rate%
    596.26           +14.2%     681.07        perf-stat.overall.instructions-per-iTLB-miss
      0.09            +1.5%       0.09        perf-stat.overall.ipc
  65896092           +19.8%   78932072        perf-stat.overall.path-length
 1.002e+10            +1.6%  1.018e+10        perf-stat.ps.branch-instructions
 1.254e+08            -9.5%  1.134e+08        perf-stat.ps.branch-misses
  55492415 ±  2%      -6.1%   52128690        perf-stat.ps.cache-misses
 4.062e+08 ±  2%      -7.6%  3.754e+08        perf-stat.ps.cache-references
      2689            -0.4%       2677        perf-stat.ps.context-switches
    286795            +0.0%     286862        perf-stat.ps.cpu-clock
 4.452e+11            -0.1%  4.449e+11        perf-stat.ps.cpu-cycles
    253.81            +0.2%     254.32        perf-stat.ps.cpu-migrations
  68711344           -11.3%   60977219        perf-stat.ps.iTLB-load-misses
 4.098e+10            +1.4%  4.156e+10        perf-stat.ps.iTLB-loads
 4.096e+10            +1.4%  4.153e+10        perf-stat.ps.instructions
    194243           -14.6%     165836        perf-stat.ps.minor-faults
    194243           -14.6%     165836        perf-stat.ps.page-faults
    286795            +0.0%     286862        perf-stat.ps.task-clock
 1.268e+13            +1.7%  1.289e+13        perf-stat.total.instructions
      0.00          -100.0%       0.00        proc-vmstat.compact_isolated
    153775            +0.1%     153982        proc-vmstat.nr_active_anon
     34.00 ±  7%      -5.9%      32.00 ±  9%  proc-vmstat.nr_active_file
    111205            -0.4%     110762        proc-vmstat.nr_anon_pages
     61.00 ± 31%     +14.8%      70.00 ±  5%  proc-vmstat.nr_anon_transparent_hugepages
     58.67            -1.1%      58.00        proc-vmstat.nr_dirtied
      5.00            +0.0%       5.00        proc-vmstat.nr_dirty
   1963650            +0.0%    1963749        proc-vmstat.nr_dirty_background_threshold
   3932102            +0.0%    3932300        proc-vmstat.nr_dirty_threshold
    360190            +0.0%     360264        proc-vmstat.nr_file_pages
     49937            +0.0%      49937        proc-vmstat.nr_free_cma
  19794023            +0.0%   19795020        proc-vmstat.nr_free_pages
      5663            -0.0%       5661        proc-vmstat.nr_inactive_anon
     98.00 ±  3%      +0.0%      98.00 ±  7%  proc-vmstat.nr_inactive_file
     13.33 ± 60%     +61.2%      21.50 ±  2%  proc-vmstat.nr_isolated_anon
     40539            -0.0%      40522        proc-vmstat.nr_kernel_stack
     12404            -0.5%      12343        proc-vmstat.nr_mapped
    430.00           -49.9%     215.50 ± 99%  proc-vmstat.nr_mlock
     15352            -0.0%      15347        proc-vmstat.nr_page_table_pages
     48318            +1.3%      48928        proc-vmstat.nr_shmem
     33638            -0.6%      33432        proc-vmstat.nr_slab_reclaimable
     80590            -0.7%      80051        proc-vmstat.nr_slab_unreclaimable
    311806            -0.2%     311237        proc-vmstat.nr_unevictable
      0.00          -100.0%       0.00        proc-vmstat.nr_unstable
      0.00          -100.0%       0.00        proc-vmstat.nr_writeback
     57.67            -1.2%      57.00        proc-vmstat.nr_written
    153775            +0.1%     153982        proc-vmstat.nr_zone_active_anon
     34.00 ±  7%      -5.9%      32.00 ±  9%  proc-vmstat.nr_zone_active_file
      5663            -0.0%       5661        proc-vmstat.nr_zone_inactive_anon
     98.00 ±  3%      +0.0%      98.00 ±  7%  proc-vmstat.nr_zone_inactive_file
    311806            -0.2%     311237        proc-vmstat.nr_zone_unevictable
      5.00            +0.0%       5.00        proc-vmstat.nr_zone_write_pending
      2788 ±  8%      -1.2%       2755 ± 11%  proc-vmstat.numa_hint_faults
      2788 ±  8%      -1.2%       2755 ± 11%  proc-vmstat.numa_hint_faults_local
   1.2e+08           -14.5%  1.026e+08        proc-vmstat.numa_hit
    121.00 ± 28%     +59.9%     193.50 ± 20%  proc-vmstat.numa_huge_pte_updates
      0.00          -100.0%       0.00        proc-vmstat.numa_interleave
   1.2e+08           -14.5%  1.026e+08        proc-vmstat.numa_local
      0.00          -100.0%       0.00        proc-vmstat.numa_other
     65275 ± 26%     +56.7%     102311 ± 20%  proc-vmstat.numa_pte_updates
      6292 ±  6%      +0.7%       6335 ±  2%  proc-vmstat.pgactivate
      0.00          -100.0%       0.00        proc-vmstat.pgalloc_dma32
 1.201e+08           -14.5%  1.027e+08        proc-vmstat.pgalloc_normal
  60452926           -14.4%   51751356        proc-vmstat.pgfault
   1.2e+08           -14.5%  1.026e+08        proc-vmstat.pgfree
      0.00          -100.0%       0.00        proc-vmstat.pgpgin
     50.00 ± 52%     +18.0%      59.00 ± 10%  proc-vmstat.thp_collapse_alloc
     32.00            +0.0%      32.00        proc-vmstat.thp_fault_alloc
      0.00          -100.0%       0.00        proc-vmstat.thp_zero_page_alloc
    105.00            -0.5%     104.50        proc-vmstat.unevictable_pgs_culled
    549.00            +0.0%     549.00        proc-vmstat.unevictable_pgs_mlocked
      0.68 ± 70%      -0.7        0.00        pp.bt.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
      0.56 ±  2%      -0.6        0.00        pp.bt.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.76            -0.1        0.65        pp.bt.mmap64
      0.68            -0.1        0.57        pp.bt.entry_SYSCALL_64_after_hwframe.mmap64
      0.64            -0.1        0.54        pp.bt.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
      0.65            -0.1        0.55        pp.bt.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
      0.67            -0.1        0.57        pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
      0.82            -0.1        0.74 ±  2%  pp.bt.handle_mm_fault.do_page_fault.page_fault
      0.78            -0.1        0.70        pp.bt.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
      0.70            -0.1        0.62        pp.bt.handle_pte_fault.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
      1.03            -0.1        0.95        pp.bt.page_fault
      0.99            -0.1        0.92 ±  2%  pp.bt.do_page_fault.page_fault
      0.92            -0.1        0.86        pp.bt.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      0.85            -0.1        0.80        pp.bt.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
      1.42 ±  4%      -0.0        1.37        pp.bt.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore
      0.99 ±  6%      -0.0        0.95        pp.bt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      0.98 ±  5%      -0.0        0.94        pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu
      0.94 ±  6%      -0.0        0.91        pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu
      0.82 ±  5%      -0.0        0.79        pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages
      0.82 ±  5%      -0.0        0.81        pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn
      0.95 ±  5%      -0.0        0.94        pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu
      0.98 ±  5%      -0.0        0.97        pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
      0.98 ±  5%      -0.0        0.97        pp.bt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
     47.85            +0.1       47.95        pp.bt.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     47.84            +0.1       47.94        pp.bt.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
     47.71            +0.1       47.83        pp.bt.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
     97.48            +0.2       97.64        pp.bt.munmap
     97.35            +0.2       97.53        pp.bt.entry_SYSCALL_64_after_hwframe.munmap
     97.34            +0.2       97.52        pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     46.37            +0.2       46.55        pp.bt._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
     46.34            +0.2       46.52        pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu
     96.76            +0.2       96.97        pp.bt.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     96.81            +0.2       97.03        pp.bt.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     96.80            +0.2       97.02        pp.bt.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
     47.47            +0.2       47.71        pp.bt.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     47.46            +0.2       47.70        pp.bt.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
     47.44            +0.2       47.68        pp.bt.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
     96.55            +0.2       96.80        pp.bt.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     46.22            +0.3       46.48        pp.bt._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
     46.19            +0.3       46.47        pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
      0.76            -0.1        0.65        pp.child.mmap64
      0.66            -0.1        0.56        pp.child.vm_mmap_pgoff
      0.67            -0.1        0.57        pp.child.ksys_mmap_pgoff
      0.58 ±  2%      -0.1        0.49        pp.child.do_mmap
      0.11 ± 37%      -0.1        0.03 ±100%  pp.child.timerqueue_del
      2.02 ±  4%      -0.1        1.94        pp.child.smp_apic_timer_interrupt
      2.11 ±  4%      -0.1        2.03        pp.child.apic_timer_interrupt
      1.51 ±  4%      -0.1        1.44        pp.child.__hrtimer_run_queues
      0.79            -0.1        0.71        pp.child.__handle_mm_fault
      0.84            -0.1        0.77        pp.child.handle_mm_fault
      1.07            -0.1        0.99        pp.child.page_fault
      1.75 ±  4%      -0.1        1.68        pp.child.hrtimer_interrupt
      0.71            -0.1        0.64        pp.child.handle_pte_fault
      0.44 ±  2%      -0.1        0.36        pp.child.mmap_region
      0.07 ± 70%      -0.1        0.00        pp.child.rb_next
      1.02            -0.1        0.95        pp.child.do_page_fault
      0.06            -0.1        0.00        pp.child.free_unref_page_commit
      0.93            -0.1        0.87        pp.child.unmap_vmas
      2.06 ±  6%      -0.1        2.00        pp.child._raw_spin_unlock_irqrestore
      0.05            -0.1        0.00        pp.child.__might_sleep
      0.05            -0.1        0.00        pp.child.find_vma
      0.51            -0.0        0.46        pp.child.exit_to_usermode_loop
      0.27            -0.0        0.22        pp.child.get_page_from_freelist
      0.32            -0.0        0.27        pp.child.__alloc_pages_nodemask
      0.87            -0.0        0.83        pp.child.unmap_page_range
      0.39 ± 29%      -0.0        0.35 ± 34%  pp.child.cmd_record
      0.39 ± 29%      -0.0        0.35 ± 34%  pp.child.perf_mmap__push
      0.36 ± 27%      -0.0        0.32 ± 34%  pp.child.ksys_write
      0.50            -0.0        0.46        pp.child.task_work_run
      0.50            -0.0        0.46        pp.child.task_numa_work
      0.18 ±  4%      -0.0        0.14        pp.child.perf_event_mmap
      0.39 ± 29%      -0.0        0.35 ± 35%  pp.child.__libc_start_main
      0.39 ± 29%      -0.0        0.35 ± 35%  pp.child.main
      0.38 ± 28%      -0.0        0.35 ± 33%  pp.child.__GI___libc_write
      0.17 ±  2%      -0.0        0.14 ±  3%  pp.child.prep_new_page
      0.35 ± 27%      -0.0        0.31 ± 35%  pp.child.vfs_write
      0.12 ±  8%      -0.0        0.08        pp.child.perf_iterate_sb
      0.50            -0.0        0.46        pp.child.change_protection
      0.50            -0.0        0.46        pp.child.change_prot_numa
      0.50            -0.0        0.46        pp.child.change_p4d_range
      0.25 ±  3%      -0.0        0.21 ±  2%  pp.child.__pte_alloc
      0.03 ± 70%      -0.0        0.00        pp.child.mem_cgroup_try_charge_delay
      0.03 ± 70%      -0.0        0.00        pp.child.__put_anon_vma
      0.15 ±  3%      -0.0        0.12        pp.child.clear_page_erms
      0.23 ±  2%      -0.0        0.20 ±  2%  pp.child.pte_alloc_one
      0.32 ± 28%      -0.0        0.29 ± 34%  pp.child.generic_file_write_iter
      0.31 ± 29%      -0.0        0.28 ± 35%  pp.child.__generic_file_write_iter
      0.30 ± 28%      -0.0        0.28 ± 34%  pp.child.generic_perform_write
      0.32 ± 28%      -0.0        0.30 ± 35%  pp.child.new_sync_write
      0.11 ±  4%      -0.0        0.09        pp.child.free_unref_page_list
      0.99 ±  4%      -0.0        0.97        pp.child.update_process_times
      1.06 ±  4%      -0.0        1.04        pp.child.tick_sched_timer
      0.16            -0.0        0.14        pp.child.alloc_pages_vma
      0.12            -0.0        0.10        pp.child.get_unmapped_area
      1.01 ±  4%      -0.0        0.99        pp.child.tick_sched_handle
      0.54 ±  3%      -0.0        0.53        pp.child.task_tick_fair
      0.78 ±  5%      -0.0        0.76        pp.child.scheduler_tick
      0.02 ±141%      -0.0        0.00        pp.child.iov_iter_fault_in_readable
      0.02 ±141%      -0.0        0.00        pp.child.enqueue_hrtimer
      0.32            -0.0        0.30        pp.child.___might_sleep
      0.17 ±  2%      -0.0        0.15        pp.child._cond_resched
      0.07 ±  7%      -0.0        0.05        pp.child.kmem_cache_free
      0.16            -0.0        0.15 ±  3%  pp.child.irq_exit
      0.13 ±  3%      -0.0        0.12        pp.child.free_pgtables
      0.12 ±  3%      -0.0        0.11        pp.child.unlink_anon_vmas
      0.11 ± 11%      -0.0        0.10        pp.child.perf_mux_hrtimer_handler
      0.08 ±  5%      -0.0        0.07        pp.child._raw_spin_lock
      0.10 ±  4%      -0.0        0.08 ±  5%  pp.child.arch_get_unmapped_area_topdown
      0.10            -0.0        0.09        pp.child.__update_load_avg_cfs_rq
      0.22 ± 28%      -0.0        0.21 ± 38%  pp.child.shmem_write_begin
      0.22 ± 28%      -0.0        0.21 ± 38%  pp.child.shmem_getpage_gfp
      0.16 ±  5%      -0.0        0.15        pp.child.update_curr
      0.13            -0.0        0.12        pp.child.__anon_vma_prepare
      0.13            -0.0        0.12        pp.child.free_pgd_range
      0.09 ±  9%      -0.0        0.08        pp.child.mem_cgroup_uncharge_list
      0.07            -0.0        0.06        pp.child.vm_unmapped_area
      0.07            -0.0        0.06        pp.child.percpu_counter_add_batch
      0.12            -0.0        0.11        pp.child.free_p4d_range
      0.09 ±  5%      -0.0        0.08        pp.child.kmem_cache_alloc
      0.06 ±  8%      -0.0        0.05        pp.child.rcu_sched_clock_irq
      0.06 ±  8%      -0.0        0.05        pp.child.remove_vma
      0.07            -0.0        0.07 ±  7%  pp.child.rcu_all_qs
      0.06            -0.0        0.06 ±  9%  pp.child.run_timer_softirq
      0.06            -0.0        0.06 ±  9%  pp.child.malloc
      0.07 ±  6%      -0.0        0.07        pp.child.flush_tlb_mm_range
      0.05 ±  8%      -0.0        0.05        pp.child.entry_SYSCALL_64
      0.05 ±  8%      -0.0        0.05        pp.child.vma_link
      0.06 ±  8%      -0.0        0.06 ±  9%  pp.child.clockevents_program_event
      0.05            -0.0        0.05        pp.child.vm_normal_page
      0.06            +0.0        0.06        pp.child.syscall_return_via_sysret
      0.06            +0.0        0.06        pp.child.uncharge_batch
      0.11 ±  4%      +0.0        0.11 ±  4%  pp.child.__softirqentry_text_start
      0.09 ±  5%      +0.0        0.10 ±  5%  pp.child.__update_load_avg_se
      0.16 ± 26%      +0.0        0.16 ± 27%  pp.child.__lru_cache_add
      0.11 ±  4%      +0.0        0.12        pp.child.__pagevec_lru_add_fn
      0.02 ±141%      +0.0        0.03 ±100%  pp.child.interrupt_entry
      0.07 ± 18%      +0.0        0.08        pp.child.update_rq_clock
      0.11 ±  4%      +0.0        0.12 ±  4%  pp.child.__perf_sw_event
      0.09 ±  5%      +0.0        0.11 ±  4%  pp.child.___perf_sw_event
      0.04 ± 70%      +0.0        0.06        pp.child.perf_event_task_tick
      0.13 ± 31%      +0.0        0.15 ±  3%  pp.child.__remove_hrtimer
      0.07            +0.0        0.11 ±  4%  pp.child.__mod_lruvec_state
      0.00            +0.1        0.05        pp.child.irq_enter
      0.00            +0.1        0.05 ±100%  pp.child.isolate_lru_page
     98.48            +0.1       98.53        pp.child.do_syscall_64
      0.00            +0.1        0.06 ±  9%  pp.child.mmput
      0.00            +0.1        0.06 ±  9%  pp.child.exit_mmap
     98.50            +0.1       98.56        pp.child.entry_SYSCALL_64_after_hwframe
      0.02 ±141%      +0.1        0.08 ±  6%  pp.child.__mod_memcg_state
      0.00            +0.1        0.06 ±100%  pp.child.khugepaged
      0.00            +0.1        0.06 ±100%  pp.child._raw_spin_lock_irq
      0.00            +0.1        0.07 ±100%  pp.child.ret_from_fork
      0.00            +0.1        0.07 ±100%  pp.child.kthread
     47.88            +0.1       47.98        pp.child.tlb_finish_mmu
     47.87            +0.1       47.98        pp.child.tlb_flush_mmu
     47.79            +0.1       47.90        pp.child.release_pages
     97.50            +0.2       97.66        pp.child.munmap
     96.82            +0.2       97.03        pp.child.__x64_sys_munmap
     96.81            +0.2       97.03        pp.child.__vm_munmap
     96.79            +0.2       97.01        pp.child.__do_munmap
     47.66            +0.2       47.91        pp.child.pagevec_lru_move_fn
     47.52            +0.2       47.77        pp.child.lru_add_drain
     47.51            +0.3       47.76        pp.child.lru_add_drain_cpu
     96.58            +0.3       96.83        pp.child.unmap_region
     92.87            +0.5       93.34        pp.child._raw_spin_lock_irqsave
     92.80            +0.5       93.31        pp.child.native_queued_spin_lock_slowpath
      0.15 ± 28%      -0.1        0.07        pp.self.__hrtimer_run_queues
      0.07 ± 70%      -0.1        0.00        pp.self.rb_next
      0.05            -0.1        0.00        pp.self.__pagevec_lru_add_fn
      0.05            -0.1        0.00        pp.self.run_timer_softirq
      0.05            -0.1        0.00        pp.self.free_unref_page_commit
      0.15 ±  3%      -0.0        0.11        pp.self.clear_page_erms
      0.44            -0.0        0.41        pp.self.change_p4d_range
      0.08 ±  5%      -0.0        0.06 ±  9%  pp.self.perf_iterate_sb
      0.47 ±  2%      -0.0        0.45        pp.self.unmap_page_range
      0.02 ±141%      -0.0        0.00        pp.self.smp_apic_timer_interrupt
      0.30            -0.0        0.29        pp.self.___might_sleep
      0.02 ±141%      -0.0        0.00        pp.self.malloc
      0.02 ±141%      -0.0        0.00        pp.self.entry_SYSCALL_64
      0.02 ±141%      -0.0        0.00        pp.self.__might_sleep
      0.08 ± 10%      -0.0        0.07 ±  7%  pp.self._raw_spin_lock
      0.09 ± 14%      -0.0        0.08 ±  6%  pp.self.hrtimer_interrupt
      0.09 ±  5%      -0.0        0.08 ±  6%  pp.self.release_pages
      0.10 ±  8%      -0.0        0.09        pp.self._raw_spin_unlock_irqrestore
      0.09            -0.0        0.08        pp.self.__update_load_avg_cfs_rq
      0.06            -0.0        0.05        pp.self.do_page_fault
      0.06            -0.0        0.05        pp.self.kmem_cache_free
      0.09 ±  5%      -0.0        0.08 ±  5%  pp.self.free_p4d_range
      0.08 ±  5%      -0.0        0.08 ±  6%  pp.self._cond_resched
      0.07 ±  7%      -0.0        0.06        pp.self.vm_unmapped_area
      0.06 ±  8%      -0.0        0.05        pp.self.kmem_cache_alloc
      0.06            -0.0        0.06 ±  9%  pp.self.rcu_all_qs
      0.08            -0.0        0.08 ±  6%  pp.self.__update_load_avg_se
      0.09 ±  5%      -0.0        0.09        pp.self.update_curr
      0.07 ±  6%      -0.0        0.07        pp.self._raw_spin_lock_irqsave
      0.11 ±  4%      -0.0        0.11        pp.self.task_tick_fair
      0.07 ±  7%      -0.0        0.07 ±  7%  pp.self.get_page_from_freelist
      0.06 ±  8%      -0.0        0.06 ±  9%  pp.self.__handle_mm_fault
      0.05            -0.0        0.05        pp.self.__do_munmap
      0.06            +0.0        0.06        pp.self.syscall_return_via_sysret
      0.02 ±141%      +0.0        0.03 ±100%  pp.self.update_rq_clock
      0.02 ±141%      +0.0        0.03 ±100%  pp.self.interrupt_entry
      0.04 ± 70%      +0.0        0.06 ±  9%  pp.self.perf_event_task_tick
      0.00            +0.0        0.03 ±100%  pp.self.vm_normal_page
      0.05            +0.0        0.08        pp.self.___perf_sw_event
      0.02 ±141%      +0.1        0.08 ±  6%  pp.self.__mod_memcg_state
      0.00            +0.1        0.11 ±  4%  pp.self.__remove_hrtimer
     92.80            +0.5       93.31        pp.self.native_queued_spin_lock_slowpath
    333.33            -0.2%     332.50        softirqs.BLOCK
      5.00            +0.0%       5.00        softirqs.HI
     17005 ± 69%     -64.3%       6074 ±  8%  softirqs.NET_RX
     45.33            +0.4%      45.50 ±  3%  softirqs.NET_TX
   1322815 ±  2%      -1.3%    1305414        softirqs.RCU
    633707 ±  9%      +1.9%     645991 ± 11%  softirqs.SCHED
    293.00            -0.2%     292.50        softirqs.TASKLET
  35621870           +12.5%   40074312        softirqs.TIMER
    344034            -0.3%     343007        interrupts.CAL:Function_call_interrupts
    396.00            -0.1%     395.50        interrupts.IWI:IRQ_work_interrupts
 1.102e+08 ± 13%      -1.1%   1.09e+08        interrupts.LOC:Local_timer_interrupts
    288.00            +0.0%     288.00        interrupts.MCP:Machine_check_polls
   1451499            +1.0%    1465843        interrupts.NMI:Non-maskable_interrupts
   1451499            +1.0%    1465843        interrupts.PMI:Performance_monitoring_interrupts
     24121 ±  2%      +1.9%      24578 ±  7%  interrupts.RES:Rescheduling_interrupts
      1262 ±  2%     +12.5%       1421 ±  7%  interrupts.TLB:TLB_shootdowns
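
For reading the table above, a small sketch of how the %change column can
be re-derived from the two value columns. It also checks one guess: that
perf-stat.overall.path-length is roughly total instructions divided by the
workload count. That definition is an assumption here, and the instruction
totals are the rounded values printed in the table.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def path_length(total_instructions, workload):
    """Instructions retired per unit of work (assumed definition)."""
    return total_instructions / workload


if __name__ == "__main__":
    # Values copied from the comparison table (old commit, new commit).
    print(f"per_process_ops: {pct_change(669.67, 568.50):+.1f}%")
    print(f"workload:        {pct_change(192362, 163343):+.1f}%")
    print(f"path-length:     {pct_change(65896092, 78932072):+.1f}%")
    print(f"derived old path-length: {path_length(1.268e13, 192362):.0f}")
    print(f"derived new path-length: {path_length(1.289e13, 163343):.0f}")
```

If that reading is right, the total instruction count stays nearly flat
while the instructions spent per operation rise by about a fifth, which
fits a workload that spends its time spinning on a lock.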

Thanks,
Feng


Download attachment "perf-profile.old" of type "application/x-trash" (143788 bytes)

View attachment "perf-profile.new" of type "text/plain" (157202 bytes)
