Message-ID: <20220312154321.GC1189@xsang-OptiPlex-9020>
Date:   Sat, 12 Mar 2022 23:43:21 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     0day robot <lkp@...el.com>, Eric Dumazet <edumazet@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        David Rientjes <rientjes@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...ux.intel.com, fengwei.yin@...el.com,
        Eric Dumazet <eric.dumazet@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>
Subject: [mm/page_alloc]  8212a964ee:  vm-scalability.throughput 30.5%
 improvement



Greetings,

FYI, we noticed a 30.5% improvement in vm-scalability.throughput due to this commit:


commit: 8212a964ee020471104e34dce7029dec33c218a9 ("Re: [PATCH v2] mm/page_alloc: call check_new_pages() while zone spinlock is not held")
url: https://github.com/0day-ci/linux/commits/Mel-Gorman/Re-PATCH-v2-mm-page_alloc-call-check_new_pages-while-zone-spinlock-is-not-held/20220309-203504
patch link: https://lore.kernel.org/lkml/20220309123245.GI15701@techsingularity.net

in testcase: vm-scalability
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with following parameters:

	runtime: 300s
	size: 512G
	test: anon-w-rand-hugetlb
	cpufreq_governor: performance
	ucode: 0xd000331
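For reference, the `cpufreq_governor: performance` parameter above maps to the standard Linux sysfs cpufreq interface; a minimal sketch of how it is typically applied (requires root, and assumes the machine exposes cpufreq):

```shell
# Minimal sketch: pin every CPU to the "performance" governor via the
# standard sysfs interface (paths assume cpufreq is available).
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
```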

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/512G/lkp-icl-2sp5/anon-w-rand-hugetlb/vm-scalability/0xd000331

commit: 
  v5.17-rc7
  8212a964ee ("mm/page_alloc: call check_new_pages() while zone spinlock is not held")

       v5.17-rc7 8212a964ee020471104e34dce70 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.00 ±  5%      -7.4%       0.00 ±  4%  vm-scalability.free_time
     47190 ±  2%     +25.5%      59208 ±  2%  vm-scalability.median
   6352467 ±  2%     +30.5%    8293110 ±  2%  vm-scalability.throughput
    218.97 ±  2%     -18.7%     177.98 ±  3%  vm-scalability.time.elapsed_time
    218.97 ±  2%     -18.7%     177.98 ±  3%  vm-scalability.time.elapsed_time.max
    121357 ±  7%     -24.9%      91162 ± 10%  vm-scalability.time.involuntary_context_switches
     11226            -5.2%      10641        vm-scalability.time.percent_of_cpu_this_job_got
      2311 ±  3%     -35.2%       1496 ±  6%  vm-scalability.time.system_time
     22275 ±  2%     -21.7%      17443 ±  3%  vm-scalability.time.user_time
      9358 ±  3%     -13.1%       8130        vm-scalability.time.voluntary_context_switches
    255.23           -16.1%     214.10 ±  2%  uptime.boot
      2593            +6.8%       2771 ±  5%  vmstat.system.cs
     11.51 ±  7%      +4.5       16.05 ±  8%  mpstat.cpu.all.idle%
      8.48 ±  2%      -1.6        6.84 ±  3%  mpstat.cpu.all.sys%
    727581 ± 12%     -17.2%     602238 ±  6%  numa-numastat.node1.local_node
    798037 ±  8%     -13.3%     691955 ±  6%  numa-numastat.node1.numa_hit
   5806206 ± 17%     +26.7%    7356010 ± 10%  turbostat.C1E
      9.55 ± 26%      +5.9       15.48 ±  9%  turbostat.C1E%
  59854751 ±  2%     -17.8%   49202950 ±  3%  turbostat.IRQ
     42804 ±  6%     -54.9%      19301 ± 21%  meminfo.Active
     41832 ±  7%     -56.2%      18325 ± 23%  meminfo.Active(anon)
     63386 ±  6%     -26.6%      46542 ±  3%  meminfo.Mapped
    137758           -25.5%     102591 ±  3%  meminfo.Shmem
     36980 ±  5%     -62.6%      13823 ± 29%  numa-meminfo.node1.Active
     36495 ±  5%     -63.9%      13173 ± 30%  numa-meminfo.node1.Active(anon)
     19454 ± 26%     -57.7%       8233 ± 33%  numa-meminfo.node1.Mapped
     65896 ± 38%     -67.8%      21189 ± 13%  numa-meminfo.node1.Shmem
      9185 ±  6%     -64.7%       3246 ± 31%  numa-vmstat.node1.nr_active_anon
      4769 ± 26%     -54.5%       2171 ± 32%  numa-vmstat.node1.nr_mapped
     16462 ± 37%     -68.1%       5258 ± 14%  numa-vmstat.node1.nr_shmem
      9185 ±  6%     -64.7%       3246 ± 31%  numa-vmstat.node1.nr_zone_active_anon
     10436 ±  5%     -56.2%       4570 ± 23%  proc-vmstat.nr_active_anon
     69290            +1.3%      70203        proc-vmstat.nr_anon_pages
   1717695            +4.5%    1794462        proc-vmstat.nr_dirty_background_threshold
   3439592            +4.5%    3593312        proc-vmstat.nr_dirty_threshold
    640952            -1.4%     632171        proc-vmstat.nr_file_pages
  17356030            +4.4%   18125242        proc-vmstat.nr_free_pages
     93258            -2.4%      91059        proc-vmstat.nr_inactive_anon
     16187 ±  5%     -26.4%      11911 ±  2%  proc-vmstat.nr_mapped
     34477 ±  2%     -25.6%      25663 ±  4%  proc-vmstat.nr_shmem
     10436 ±  5%     -56.2%       4570 ± 23%  proc-vmstat.nr_zone_active_anon
     93258            -2.4%      91059        proc-vmstat.nr_zone_inactive_anon
     32151 ± 16%     -61.0%      12542 ± 13%  proc-vmstat.numa_hint_faults
     21214 ± 22%     -86.0%       2964 ± 45%  proc-vmstat.numa_hint_faults_local
   1598135           -10.9%    1423466        proc-vmstat.numa_hit
   1481881           -11.8%    1307551        proc-vmstat.numa_local
    117279            -1.2%     115916        proc-vmstat.numa_other
    555445 ± 16%     -53.2%     260178 ± 53%  proc-vmstat.numa_pte_updates
     93889 ±  4%     -74.3%      24113 ±  7%  proc-vmstat.pgactivate
   1599893           -11.0%    1424527        proc-vmstat.pgalloc_normal
   1594626           -14.2%    1368920        proc-vmstat.pgfault
   1609987           -20.8%    1275284        proc-vmstat.pgfree
     49893           -14.8%      42496 ±  5%  proc-vmstat.pgreuse
     15.23 ±  2%      -7.8%      14.04        perf-stat.i.MPKI
 1.348e+10           +22.0%  1.645e+10 ±  3%  perf-stat.i.branch-instructions
 6.957e+08 ±  2%     +22.4%  8.517e+08 ±  3%  perf-stat.i.cache-misses
 7.117e+08 ±  2%     +22.4%   8.71e+08 ±  3%  perf-stat.i.cache-references
      7.86 ±  2%     -29.0%       5.58 ±  6%  perf-stat.i.cpi
 3.739e+11            -5.1%  3.549e+11        perf-stat.i.cpu-cycles
    550.18 ±  3%     -22.2%     427.87 ±  5%  perf-stat.i.cycles-between-cache-misses
 1.605e+10           +22.1%  1.959e+10 ±  3%  perf-stat.i.dTLB-loads
      0.02 ±  3%      -0.0        0.01 ±  4%  perf-stat.i.dTLB-store-miss-rate%
    921125 ±  2%      -4.6%     878569        perf-stat.i.dTLB-store-misses
 5.803e+09           +22.0%  7.078e+09 ±  3%  perf-stat.i.dTLB-stores
 5.665e+10           +22.0%  6.911e+10 ±  3%  perf-stat.i.instructions
      0.16 ±  3%     +26.1%       0.20 ±  3%  perf-stat.i.ipc
      2.92            -5.1%       2.77        perf-stat.i.metric.GHz
    123.32 ± 16%    +158.4%     318.61 ± 22%  perf-stat.i.metric.K/sec
    286.92           +21.8%     349.59 ±  3%  perf-stat.i.metric.M/sec
      6641            +4.8%       6957 ±  2%  perf-stat.i.minor-faults
    586608 ± 12%     +36.4%     800024 ±  7%  perf-stat.i.node-loads
     26.79 ±  4%     -10.5       16.31 ± 12%  perf-stat.i.node-store-miss-rate%
 1.785e+08 ±  2%     -27.7%  1.291e+08 ±  7%  perf-stat.i.node-store-misses
 5.131e+08 ±  3%     +39.8%  7.172e+08 ±  5%  perf-stat.i.node-stores
      6643            +4.8%       6959 ±  2%  perf-stat.i.page-faults
      0.02 ± 18%      -0.0        0.01 ±  4%  perf-stat.overall.branch-miss-rate%
      6.66 ±  2%     -22.5%       5.16 ±  3%  perf-stat.overall.cpi
    539.35 ±  2%     -22.7%     416.69 ±  3%  perf-stat.overall.cycles-between-cache-misses
      0.02 ±  3%      -0.0        0.01 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
      0.15 ±  2%     +29.1%       0.19 ±  3%  perf-stat.overall.ipc
     25.88 ±  4%     -10.6       15.28 ± 10%  perf-stat.overall.node-store-miss-rate%
 1.325e+10 ±  2%     +22.3%  1.622e+10 ±  3%  perf-stat.ps.branch-instructions
  6.88e+08 ±  2%     +22.7%  8.444e+08 ±  3%  perf-stat.ps.cache-misses
 7.043e+08 ±  2%     +22.7%  8.638e+08 ±  3%  perf-stat.ps.cache-references
 3.708e+11            -5.2%  3.515e+11        perf-stat.ps.cpu-cycles
 1.577e+10 ±  2%     +22.4%  1.931e+10 ±  3%  perf-stat.ps.dTLB-loads
    910623 ±  2%      -4.6%     868700        perf-stat.ps.dTLB-store-misses
 5.701e+09 ±  2%     +22.3%  6.975e+09 ±  3%  perf-stat.ps.dTLB-stores
 5.569e+10 ±  2%     +22.3%  6.813e+10 ±  3%  perf-stat.ps.instructions
      6716            +4.8%       7038        perf-stat.ps.minor-faults
    595302 ± 11%     +37.2%     816710 ±  8%  perf-stat.ps.node-loads
 1.769e+08 ±  2%     -27.8%  1.277e+08 ±  7%  perf-stat.ps.node-store-misses
 5.071e+08 ±  3%     +40.3%  7.113e+08 ±  5%  perf-stat.ps.node-stores
      6717            +4.8%       7039        perf-stat.ps.page-faults
      0.00            +0.8        0.80 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages
      0.00            +0.8        0.80 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page
      0.00            +0.8        0.83 ±  8%  perf-profile.calltrace.cycles-pp.rmqueue_bulk.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page
      0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory
      0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page
      0.00            +0.8        0.84 ±  8%  perf-profile.calltrace.cycles-pp.alloc_buddy_huge_page.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages
      0.00            +0.9        0.85 ±  8%  perf-profile.calltrace.cycles-pp.alloc_fresh_huge_page.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.alloc_surplus_huge_page.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.__mmap
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      0.00            +0.9        0.88 ±  8%  perf-profile.calltrace.cycles-pp.hugetlb_acct_memory.hugetlb_reserve_pages.hugetlbfs_file_mmap.mmap_region.do_mmap
     60.28 ±  5%      +4.7       64.98 ±  2%  perf-profile.calltrace.cycles-pp.do_rw_once
      0.09 ±  8%      +0.0        0.11 ±  9%  perf-profile.children.cycles-pp.task_tick_fair
      0.14 ±  7%      +0.0        0.17 ±  5%  perf-profile.children.cycles-pp.scheduler_tick
      0.20 ±  9%      +0.0        0.24 ±  3%  perf-profile.children.cycles-pp.tick_sched_timer
      0.19 ±  9%      +0.0        0.24 ±  4%  perf-profile.children.cycles-pp.tick_sched_handle
      0.19 ±  9%      +0.0        0.23 ±  4%  perf-profile.children.cycles-pp.update_process_times
      0.24 ±  8%      +0.0        0.29 ±  3%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.40 ±  8%      +0.1        0.45 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.39 ±  7%      +0.1        0.45 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      0.26 ± 71%      +0.6        0.86 ±  8%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.__mmap
      0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlbfs_file_mmap
      0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlb_reserve_pages
      0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.hugetlb_acct_memory
      0.27 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.alloc_surplus_huge_page
      0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.vm_mmap_pgoff
      0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.do_mmap
      0.28 ± 71%      +0.6        0.88 ±  8%  perf-profile.children.cycles-pp.mmap_region
      0.55 ± 44%      +0.6        1.16 ±  9%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.55 ± 44%      +0.6        1.16 ±  9%  perf-profile.children.cycles-pp.do_syscall_64
      0.12 ± 71%      +0.7        0.85 ±  8%  perf-profile.children.cycles-pp.alloc_fresh_huge_page
      0.03 ± 70%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.alloc_buddy_huge_page
      0.04 ± 71%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.04 ± 71%      +0.8        0.84 ±  8%  perf-profile.children.cycles-pp.__alloc_pages
      0.00            +0.8        0.82 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock
      0.00            +0.8        0.83 ±  8%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.26 ± 71%      +0.6        0.86 ±  8%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0-DAY CI Kernel Test Service
https://lists.01.org/hyperkitty/list/lkp@lists.01.org

Thanks,
Oliver Sang


View attachment "config-5.17.0-rc7-00001-g8212a964ee02" of type "text/plain" (162152 bytes)

View attachment "job-script" of type "text/plain" (8251 bytes)

View attachment "job.yaml" of type "text/plain" (5542 bytes)

View attachment "reproduce" of type "text/plain" (2052 bytes)
