Date: Thu, 7 Jan 2021 13:33:14 +0800
From: kernel test robot <oliver.sang@...el.com>
To: Daniel Jordan <daniel.m.jordan@...cle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, Andrew Morton <akpm@...ux-foundation.org>,
	Josh Triplett <josh@...htriplett.org>, Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
	Alex Williamson <alex.williamson@...hat.com>, Dan Williams <dan.j.williams@...el.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>, David Hildenbrand <david@...hat.com>,
	Herbert Xu <herbert@...dor.apana.org.au>, Jason Gunthorpe <jgg@...pe.ca>,
	Jonathan Corbet <corbet@....net>, Kirill Tkhai <ktkhai@...tuozzo.com>,
	Michal Hocko <mhocko@...nel.org>, Pavel Machek <pavel@....cz>,
	Pavel Tatashin <pasha.tatashin@...een.com>, Peter Zijlstra <peterz@...radead.org>,
	Randy Dunlap <rdunlap@...radead.org>, Robert Elliott <elliott@....com>,
	Shile Zhang <shile.zhang@...ux.alibaba.com>, Steffen Klassert <steffen.klassert@...unet.com>,
	Steven Sistare <steven.sistare@...cle.com>, Tejun Heo <tj@...nel.org>,
	Zi Yan <ziy@...dia.com>, LKML <linux-kernel@...r.kernel.org>,
	lkp@...ts.01.org, lkp@...el.com, ying.huang@...el.com,
	feng.tang@...el.com, zhengjun.xing@...el.com
Subject: [mm] e44431498f: will-it-scale.per_thread_ops 26.8% improvement

Greeting,

FYI, we noticed a 26.8% improvement of will-it-scale.per_thread_ops due to commit:

commit: e44431498f5fbf427f139aa413cf381b4fa3a600 ("mm: parallelize deferred_init_memmap()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with following parameters:

	nr_task: 16
	mode: thread
	test: page_fault1
	cpufreq_governor: performance
	ucode: 0x42e

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
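For context, a page_fault1-style workload repeatedly maps a region of anonymous memory, writes to every page (each first write triggering a minor page fault), and unmaps it, counting each cycle as one operation. The sketch below is a loose Python illustration of that loop, not the actual C testcase from the will-it-scale repository:

```python
import mmap

def page_fault1_once(length: int) -> int:
    """One page_fault1-style iteration: map anonymous memory,
    touch every page (forcing a minor fault on first write),
    then unmap.  Returns the number of pages touched."""
    page = mmap.PAGESIZE
    buf = mmap.mmap(-1, length)      # anonymous private mapping
    touched = 0
    for off in range(0, length, page):
        buf[off] = 1                 # first write faults the page in
        touched += 1
    buf.close()                      # munmap
    return touched

if __name__ == "__main__":
    length = 1 << 20                 # 1 MiB demo region; the real testcase uses more
    pages = page_fault1_once(length)
    print(f"touched {pages} pages of {mmap.PAGESIZE} bytes")
```

The benchmark's operations-per-second figure comes from how many such iterations each worker completes, which is why fault-path and mmap/munmap lock contention dominate the results below.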
test-url: https://github.com/antonblanchard/will-it-scale

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/16/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/page_fault1/will-it-scale/0x42e

commit:
  89c7c4022d ("mm: don't track number of pages during deferred initialization")
  e44431498f ("mm: parallelize deferred_init_memmap()")

 89c7c4022dfccf0c  e44431498f5fbf427f139aa413c
 ----------------  ---------------------------
       %stddev      %change       %stddev
           \           |              \
   2605430            +26.8%    3303797         will-it-scale.16.threads
     69.29             +1.5%      70.34         will-it-scale.16.threads_idle
    162839            +26.8%     206486         will-it-scale.per_thread_ops
   2605430            +26.8%    3303797         will-it-scale.workload
   6627328 ±  6%     +154.3%   16854016 ±  5%   meminfo.DirectMap2M
      3740 ± 91%     +114.4%       8017 ± 40%   numa-meminfo.node1.Inactive
      3740 ± 91%     +114.4%       8017 ± 40%   numa-meminfo.node1.Inactive(anon)
     12157 ±  5%       -9.5%      11005 ±  3%   slabinfo.pde_opener.active_objs
     12157 ±  5%       -9.5%      11005 ±  3%   slabinfo.pde_opener.num_objs
     34869 ± 21%      -44.3%      19424         softirqs.CPU23.SCHED
    143557 ±  8%      -12.2%     125985 ±  4%   softirqs.CPU38.TIMER
     69.00             +1.4%      70.00         vmstat.cpu.id
      5105            +21.3%       6193         vmstat.system.cs
   1750980            +13.8%    1993438 ±  2%   numa-numastat.node0.local_node
   1762715            +14.5%    2019056 ±  2%   numa-numastat.node0.numa_hit
   2017488            +28.9%    2599574 ±  2%   numa-numastat.node1.local_node
   2049532            +27.7%    2617670 ±  2%   numa-numastat.node1.numa_hit
     18100            +18.5%      21447         sched_debug.cpu.nr_switches.avg
     34161 ±  9%      +27.4%      43526 ±  7%   sched_debug.cpu.nr_switches.max
      9041 ±  5%      +32.4%      11972 ±  3%   sched_debug.cpu.nr_switches.stddev
    -48.79            -49.1%     -24.83         sched_debug.cpu.nr_uninterruptible.min
    366.25 ±  6%      +10.8%     405.75 ±  7%   numa-vmstat.node0.nr_page_table_pages
   1313119            +18.4%    1554083 ±  4%   numa-vmstat.node0.numa_hit
   1297715 ±  2%      +12.2%    1455868 ±  8%   numa-vmstat.node0.numa_local
     15404 ± 67%     +537.6%      98214 ± 68%   numa-vmstat.node0.numa_other
    944.00 ± 89%     +111.4%       1995 ± 41%   numa-vmstat.node1.nr_inactive_anon
    944.00 ± 89%     +111.4%       1995 ± 41%   numa-vmstat.node1.nr_zone_inactive_anon
   1534775            +10.6%    1696916 ±  4%   numa-vmstat.node1.numa_hit
   1346117 ±  2%      +18.2%    1591200 ±  8%   numa-vmstat.node1.numa_local
    188658 ±  5%      -44.0%     105714 ± 64%   numa-vmstat.node1.numa_other
    329075             +1.6%     334289         proc-vmstat.nr_active_anon
    311360             +1.7%     316562         proc-vmstat.nr_anon_pages
    569.00             +1.6%     578.25         proc-vmstat.nr_anon_transparent_hugepages
     11307             +4.5%      11810         proc-vmstat.nr_kernel_stack
      1326             +3.9%       1378         proc-vmstat.nr_page_table_pages
    329076             +1.6%     334288         proc-vmstat.nr_zone_active_anon
      5011 ± 60%      -92.4%     379.25 ± 31%   proc-vmstat.numa_hint_faults_local
   3840470            +21.3%    4657420         proc-vmstat.numa_hit
   3796678            +21.5%    4613690         proc-vmstat.numa_local
     19719 ± 17%      -50.1%       9834 ± 51%   proc-vmstat.numa_pages_migrated
 7.862e+08            +26.8%  9.965e+08         proc-vmstat.pgalloc_normal
   2444034            +16.4%    2844032         proc-vmstat.pgfault
 7.861e+08            +26.7%  9.963e+08         proc-vmstat.pgfree
     19719 ± 17%      -50.1%       9834 ± 51%   proc-vmstat.pgmigrate_success
   1530844            +26.8%    1940789         proc-vmstat.thp_fault_alloc
      0.03 ±  4%      +79.2%       0.05 ± 40%   perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.02 ± 16%      +55.6%       0.03 ± 44%   perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.do_wait.kernel_wait4.__do_sys_wait4
      0.01 ± 44%      +97.9%       0.02 ± 32%   perf-sched.sch_delay.avg.ms.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.remove_vma
      0.12 ± 12%      -26.7%       0.09 ± 17%   perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.devkmsg_read.vfs_read.ksys_read
      0.07 ± 11%      -33.2%       0.04 ± 19%   perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep
      7.29 ± 67%      -98.4%       0.12 ± 19%   perf-sched.sch_delay.max.ms.__sched_text_start.__sched_text_start.schedule_timeout.rcu_gp_kthread.kthread
     46.54 ±  3%      -16.5%      38.84 ±  6%   perf-sched.total_wait_and_delay.average.ms
     25143            +21.8%      30624 ±  5%   perf-sched.total_wait_and_delay.count.ms
     46.52 ±  3%      -16.5%      38.82 ±  6%   perf-sched.total_wait_time.average.ms
      2.88 ±  6%      -13.4%       2.49 ± 11%   perf-sched.wait_and_delay.avg.ms.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
    299.00 ± 15%      -20.1%     239.00 ±  8%   perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.preempt_schedule_common._cond_resched.clear_huge_page
     17537            +26.9%      22259 ±  3%   perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_read_slowpath.do_user_addr_fault.page_fault
    916.25            +39.9%       1282 ±  2%   perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.__vm_munmap
    657.75 ±  3%      +32.3%     870.00 ±  2%   perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      2.85 ±  6%      -13.4%       2.47 ± 11%   perf-sched.wait_time.avg.ms.__sched_text_start.__sched_text_start.rwsem_down_write_slowpath.down_write_killable.vm_mmap_pgoff
      0.01 ± 95%     +124.5%       0.03 ± 32%   perf-sched.wait_time.max.ms.__sched_text_start.__sched_text_start.io_schedule.__lock_page_killable.filemap_fault
      0.70 ±  3%       +0.2        0.92 ± 17%   perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.do_idle
      0.74 ±  3%       +0.2        0.98 ± 18%   perf-profile.calltrace.cycles-pp.__irqentry_text_start.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      0.07 ± 10%       +0.0        0.09 ±  5%   perf-profile.children.cycles-pp.__unwind_start
      0.07 ±  5%       +0.0        0.09 ±  9%   perf-profile.children.cycles-pp.dequeue_task_fair
      0.18 ±  8%       +0.0        0.21 ±  6%   perf-profile.children.cycles-pp.perf_callchain_kernel
      0.04 ± 58%       +0.0        0.07 ±  7%   perf-profile.children.cycles-pp.try_to_wake_up
      0.06 ± 11%       +0.0        0.09 ± 17%   perf-profile.children.cycles-pp.get_next_timer_interrupt
      0.06             +0.0        0.09 ±  7%   perf-profile.children.cycles-pp.schedule_idle
      0.09 ±  9%       +0.0        0.12 ± 17%   perf-profile.children.cycles-pp.irq_enter
      0.07 ± 10%       +0.0        0.10 ± 14%   perf-profile.children.cycles-pp.perf_trace_sched_switch
      0.10 ±  4%       +0.0        0.14 ± 18%   perf-profile.children.cycles-pp.serial8250_console_putchar
      0.11             +0.0        0.15 ± 11%   perf-profile.children.cycles-pp.irq_work_run_list
      0.10 ±  4%       +0.0        0.14 ± 14%   perf-profile.children.cycles-pp.uart_console_write
      0.11 ±  4%       +0.0        0.14 ± 10%   perf-profile.children.cycles-pp.irq_work_run
      0.11 ±  4%       +0.0        0.15 ± 11%   perf-profile.children.cycles-pp.serial8250_console_write
      0.08 ± 10%       +0.0        0.12 ± 19%   perf-profile.children.cycles-pp.io_serial_in
      0.10 ±  4%       +0.0        0.14 ± 10%   perf-profile.children.cycles-pp.irq_work_interrupt
      0.10 ±  4%       +0.0        0.14 ± 10%   perf-profile.children.cycles-pp.smp_irq_work_interrupt
      0.10 ±  4%       +0.0        0.14 ± 10%   perf-profile.children.cycles-pp.printk
      0.10             +0.0        0.14 ± 15%   perf-profile.children.cycles-pp.wait_for_xmitr
      0.11 ±  4%       +0.0        0.15 ± 14%   perf-profile.children.cycles-pp.vprintk_emit
      0.11 ±  4%       +0.0        0.15 ± 14%   perf-profile.children.cycles-pp.console_unlock
      0.14 ± 10%       +0.0        0.18 ±  8%   perf-profile.children.cycles-pp.schedule
      0.03 ±100%       +0.0        0.07 ± 17%   perf-profile.children.cycles-pp.tick_irq_enter
      0.25 ± 10%       +0.0        0.29 ±  6%   perf-profile.children.cycles-pp.get_perf_callchain
      0.25 ± 10%       +0.0        0.30 ±  5%   perf-profile.children.cycles-pp.perf_callchain
      0.01 ±173%       +0.1        0.06 ± 17%   perf-profile.children.cycles-pp.__next_timer_interrupt
      0.28 ±  7%       +0.1        0.33 ±  5%   perf-profile.children.cycles-pp.perf_prepare_sample
      0.20 ±  8%       +0.1        0.28 ±  6%   perf-profile.children.cycles-pp.__sched_text_start
      0.37 ± 11%       +0.1        0.44 ±  5%   perf-profile.children.cycles-pp.perf_swevent_overflow
      0.36 ± 10%       +0.1        0.44 ±  4%   perf-profile.children.cycles-pp.__perf_event_overflow
      0.39 ± 11%       +0.1        0.48 ±  5%   perf-profile.children.cycles-pp.perf_tp_event
      0.26 ± 17%       +0.1        0.36 ± 30%   perf-profile.children.cycles-pp.menu_select
      0.78 ±  7%       +0.1        0.90 ±  8%   perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.94 ±  6%       +0.2        1.10 ± 11%   perf-profile.children.cycles-pp.hrtimer_interrupt
      1.30 ±  4%       +0.3        1.56 ± 11%   perf-profile.children.cycles-pp.smp_apic_timer_interrupt
      0.07 ± 11%       +0.1        0.13 ± 36%   perf-profile.self.cycles-pp.menu_select
    394290 ±  2%      +19.8%     472478 ±  3%   interrupts.CAL:Function_call_interrupts
      9087 ±  6%      +30.3%      11840 ± 16%   interrupts.CPU0.CAL:Function_call_interrupts
      1570 ± 30%      -53.2%     735.50 ± 20%   interrupts.CPU0.RES:Rescheduling_interrupts
      9202 ±  8%      +29.7%      11938 ± 17%   interrupts.CPU0.TLB:TLB_shootdowns
      6488 ±  9%      -41.9%       3766 ± 28%   interrupts.CPU1.NMI:Non-maskable_interrupts
      6488 ±  9%      -41.9%       3766 ± 28%   interrupts.CPU1.PMI:Performance_monitoring_interrupts
     12965 ±  6%      +34.2%      17405 ± 11%   interrupts.CPU10.CAL:Function_call_interrupts
     13003 ±  5%      +34.0%      17426 ± 12%   interrupts.CPU10.TLB:TLB_shootdowns
     12766 ±  6%      +28.3%      16376 ± 10%   interrupts.CPU11.CAL:Function_call_interrupts
     12665 ±  7%      +29.5%      16402 ± 11%   interrupts.CPU11.TLB:TLB_shootdowns
      4156 ± 22%      -39.6%       2509 ± 42%   interrupts.CPU12.NMI:Non-maskable_interrupts
      4156 ± 22%      -39.6%       2509 ± 42%   interrupts.CPU12.PMI:Performance_monitoring_interrupts
    551.75 ± 22%      +80.5%     996.00 ± 30%   interrupts.CPU13.RES:Rescheduling_interrupts
     12383 ±  3%      +11.9%      13851 ±  5%   interrupts.CPU14.TLB:TLB_shootdowns
     15287 ±  4%      +15.6%      17677 ± 11%   interrupts.CPU24.CAL:Function_call_interrupts
     15250 ±  5%      +16.3%      17736 ± 11%   interrupts.CPU24.TLB:TLB_shootdowns
    325.00 ± 10%      +36.5%     443.75 ± 19%   interrupts.CPU25.RES:Rescheduling_interrupts
     10439 ±  6%      +34.0%      13993 ± 12%   interrupts.CPU26.CAL:Function_call_interrupts
     10126 ±  6%      +36.3%      13799 ± 12%   interrupts.CPU26.TLB:TLB_shootdowns
    330.50 ±  5%      +32.5%     437.75 ± 23%   interrupts.CPU27.RES:Rescheduling_interrupts
     10580 ±  4%      +21.2%      12825 ± 14%   interrupts.CPU30.CAL:Function_call_interrupts
     10471 ±  5%      +21.0%      12668 ± 15%   interrupts.CPU30.TLB:TLB_shootdowns
      4329 ± 11%      -30.0%       3030 ± 24%   interrupts.CPU36.NMI:Non-maskable_interrupts
      4329 ± 11%      -30.0%       3030 ± 24%   interrupts.CPU36.PMI:Performance_monitoring_interrupts
     10463 ±  6%      +25.5%      13131 ± 11%   interrupts.CPU38.CAL:Function_call_interrupts
      4078 ± 18%      -36.1%       2604 ± 19%   interrupts.CPU38.NMI:Non-maskable_interrupts
      4078 ± 18%      -36.1%       2604 ± 19%   interrupts.CPU38.PMI:Performance_monitoring_interrupts
    461.50 ± 12%      +42.3%     656.75 ±  9%   interrupts.CPU38.RES:Rescheduling_interrupts
     11084 ±  6%      +31.7%      14592 ± 12%   interrupts.CPU38.TLB:TLB_shootdowns
     14158 ±  4%      +27.9%      18104 ± 13%   interrupts.CPU4.CAL:Function_call_interrupts
     14231 ±  4%      +27.8%      18184 ± 14%   interrupts.CPU4.TLB:TLB_shootdowns
    498.25 ± 41%      -45.3%     272.75 ± 31%   interrupts.CPU41.NMI:Non-maskable_interrupts
    498.25 ± 41%      -45.3%     272.75 ± 31%   interrupts.CPU41.PMI:Performance_monitoring_interrupts
    185.50 ± 38%     +178.2%     516.00 ± 31%   interrupts.CPU42.RES:Rescheduling_interrupts
     13001 ±  6%      +25.6%      16329 ±  8%   interrupts.CPU5.CAL:Function_call_interrupts
     12881 ±  7%      +26.8%      16329 ±  9%   interrupts.CPU5.TLB:TLB_shootdowns
     13824 ± 10%      +22.2%      16893 ±  6%   interrupts.CPU7.CAL:Function_call_interrupts
     13794 ± 11%      +23.6%      17043 ±  6%   interrupts.CPU7.TLB:TLB_shootdowns
     13213 ±  6%      +29.8%      17146 ±  6%   interrupts.CPU8.CAL:Function_call_interrupts
      4926 ± 30%      -42.7%       2822 ± 19%   interrupts.CPU8.NMI:Non-maskable_interrupts
      4926 ± 30%      -42.7%       2822 ± 19%   interrupts.CPU8.PMI:Performance_monitoring_interrupts
     13215 ±  7%      +32.2%      17471 ±  5%   interrupts.CPU8.TLB:TLB_shootdowns
     12624 ±  7%      +30.6%      16489 ± 19%   interrupts.CPU9.CAL:Function_call_interrupts
     12439 ±  7%      +32.3%      16462 ± 20%   interrupts.CPU9.TLB:TLB_shootdowns
    153249 ±  4%      -14.8%     130602 ±  7%   interrupts.NMI:Non-maskable_interrupts
    153249 ±  4%      -14.8%     130602 ±  7%   interrupts.PMI:Performance_monitoring_interrupts
    386515 ±  2%      +21.7%     470438 ±  3%   interrupts.TLB:TLB_shootdowns
    188.73            +12.3%     211.95         perf-stat.i.MPKI
 2.738e+08            +10.4%  3.023e+08         perf-stat.i.branch-instructions
 1.707e+08            +26.5%  2.159e+08         perf-stat.i.cache-misses
 1.926e+08            +25.1%   2.41e+08         perf-stat.i.cache-references
      5106            +21.5%       6202         perf-stat.i.context-switches
     42.38            -13.1%      36.81         perf-stat.i.cpi
 4.347e+10             -3.3%  4.204e+10         perf-stat.i.cpu-cycles
    256.07            -23.3%     196.33         perf-stat.i.cycles-between-cache-misses
  3.19e+08 ±  2%       +9.8%  3.503e+08         perf-stat.i.dTLB-loads
 9.048e+08            +22.5%  1.108e+09         perf-stat.i.dTLB-stores
 1.231e+09             +9.6%  1.349e+09         perf-stat.i.instructions
      1255             +7.0%       1342 ±  2%   perf-stat.i.instructions-per-iTLB-miss
      0.03            +12.7%       0.03 ±  2%   perf-stat.i.ipc
      0.91             -3.3%       0.88         perf-stat.i.metric.GHz
      0.44            +17.6%       0.52         perf-stat.i.metric.K/sec
     38.15            +19.4%      45.56         perf-stat.i.metric.M/sec
      7976            +16.8%       9318         perf-stat.i.minor-faults
      5.36 ±  5%       -0.7        4.62 ±  3%   perf-stat.i.node-load-miss-rate%
    242591 ±  5%      +10.8%     268791 ±  4%   perf-stat.i.node-load-misses
   4250703            +30.2%    5534656         perf-stat.i.node-loads
      1.90 ±  3%       -0.2        1.71         perf-stat.i.node-store-miss-rate%
   2488338 ±  2%      +19.1%    2963150         perf-stat.i.node-store-misses
 1.339e+08            +31.6%  1.762e+08         perf-stat.i.node-stores
      7976            +16.8%       9318         perf-stat.i.page-faults
    156.77            +14.0%     178.74         perf-stat.overall.MPKI
     35.39            -11.9%      31.19         perf-stat.overall.cpi
    254.66            -23.5%     194.76         perf-stat.overall.cycles-between-cache-misses
      1259             +7.3%       1351 ±  2%   perf-stat.overall.instructions-per-iTLB-miss
      0.03            +13.5%       0.03         perf-stat.overall.ipc
      5.39 ±  4%       -0.8        4.63 ±  3%   perf-stat.overall.node-load-miss-rate%
      1.82 ±  2%       -0.2        1.65         perf-stat.overall.node-store-miss-rate%
    142317            -13.6%     122910         perf-stat.overall.path-length
 2.725e+08            +10.5%  3.011e+08         perf-stat.ps.branch-instructions
 1.701e+08            +26.4%  2.151e+08         perf-stat.ps.cache-misses
 1.919e+08            +25.1%  2.401e+08         perf-stat.ps.cache-references
      5089            +21.5%       6180         perf-stat.ps.context-switches
 4.332e+10             -3.3%   4.19e+10         perf-stat.ps.cpu-cycles
 3.175e+08 ±  2%       +9.9%  3.489e+08         perf-stat.ps.dTLB-loads
 9.016e+08            +22.5%  1.105e+09         perf-stat.ps.dTLB-stores
 1.224e+09             +9.7%  1.344e+09         perf-stat.ps.instructions
      7946            +16.8%       9284         perf-stat.ps.minor-faults
    241532 ±  4%      +10.9%     267854 ±  4%   perf-stat.ps.node-load-misses
   4235623            +30.2%    5515080         perf-stat.ps.node-loads
   2480187 ±  2%      +19.1%    2953222         perf-stat.ps.node-store-misses
 1.334e+08            +31.6%  1.756e+08         perf-stat.ps.node-stores
      7946            +16.8%       9284         perf-stat.ps.page-faults
 3.708e+11             +9.5%  4.061e+11         perf-stat.total.instructions

[ASCII chart of will-it-scale.per_thread_ops per sample, not reconstructible from the archive: bisect-good samples hold steady around 160000 ops; bisect-bad samples scatter between roughly 180000 and 215000 ops.]

 [*] bisect-good sample
 [O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Thanks,
Oliver Sang

View attachment "config-5.7.0-03884-ge44431498f5fbf4" of type "text/plain" (157793 bytes)
View attachment "job-script" of type "text/plain" (7863 bytes)
View attachment "job.yaml" of type "text/plain" (5466 bytes)
View attachment "reproduce" of type "text/plain" (342 bytes)
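One way to read the perf-stat numbers: LKP's "path-length" is, to a close approximation, total retired instructions divided by the workload's operation count, so a 9.5% rise in instructions spread over 26.8% more operations yields the reported -13.6% drop. A quick Python sanity check against the values in this mail (small rounding differences are expected):

```python
def path_length(total_instructions: float, ops: float) -> float:
    """Instructions retired per benchmark operation."""
    return total_instructions / ops

# perf-stat.total.instructions and will-it-scale.workload from the report above.
before = path_length(3.708e11, 2605430)   # report: 142317
after  = path_length(4.061e11, 3303797)   # report: 122910
change = (after - before) / before * 100  # report: -13.6%

print(f"{before:.0f} -> {after:.0f} ({change:+.1f}%)")
```

So the commit under test does not make each fault cheaper by 27%; rather, the per-operation instruction path shrinks modestly while throughput scales up, which matches the cache-miss and cycles-between-cache-misses shifts above.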