Message-ID: <202211211037.2b2e5e1f-yujie.liu@intel.com>
Date:   Mon, 21 Nov 2022 11:03:08 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     David Hildenbrand <david@...hat.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jason Gunthorpe <jgg@...dia.com>,
        John Hubbard <jhubbard@...dia.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Hugh Dickins <hughd@...gle.com>, Peter Xu <peterx@...hat.com>,
        Alistair Popple <apopple@...dia.com>,
        Nadav Amit <namit@...are.com>, Yang Shi <shy828301@...il.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        "Christoph von Recklinghausen" <crecklin@...hat.com>,
        Don Dutile <ddutile@...hat.com>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: [linus:master] [mm] 088b8aa537: vm-scalability.throughput -6.5%
 regression

Greetings,

FYI, we noticed a -6.5% regression of vm-scalability.throughput due to commit:

commit: 088b8aa537c2c767765f1c19b555f21ffe555786 ("mm: fix PageAnonExclusive clearing racing with concurrent RCU GUP-fast")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
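
For convenience, the suspected commit can be inspected directly in a local clone of the tree above; a minimal sketch, assuming the master branch has already been cloned:

        git show --stat 088b8aa537c2c767765f1c19b555f21ffe555786   # files touched
        git show 088b8aa537c2c767765f1c19b555f21ffe555786          # full message and diff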

in testcase: vm-scalability
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory
with the following parameters (see the sysfs sketch below the test description):

	thp_enabled: never
	thp_defrag: never
	nr_task: 1
	nr_pmem: 2
	priority: 1
	test: swap-w-seq
	cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
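
For reference, the thp_enabled, thp_defrag and cpufreq_governor parameters above correspond to standard sysfs and cpufreq knobs; outside of lkp they can be applied by hand roughly as below (a sketch, assuming root and a kernel built with transparent hugepage support):

        echo never > /sys/kernel/mm/transparent_hugepage/enabled   # thp_enabled: never
        echo never > /sys/kernel/mm/transparent_hugepage/defrag    # thp_defrag: never
        for c in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
                echo performance > "$c"                            # cpufreq_governor: performance
        done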


Details are as follows:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_pmem/nr_task/priority/rootfs/tbox_group/test/testcase/thp_defrag/thp_enabled:
  gcc-11/performance/x86_64-rhel-8.3/2/1/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/swap-w-seq/vm-scalability/never/never

commit: 
  e7b72c48d6 ("mm/mremap_pages: save a few cycles in get_dev_pagemap()")
  088b8aa537 ("mm: fix PageAnonExclusive clearing racing with concurrent RCU GUP-fast")

e7b72c48d677c244 088b8aa537c2c767765f1c19b55 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   1061123            -6.5%     991719 ±  3%  vm-scalability.median
   1061123            -6.5%     991719 ±  3%  vm-scalability.throughput
    106.50            +6.7%     113.64 ±  3%  vm-scalability.time.elapsed_time
    106.50            +6.7%     113.64 ±  3%  vm-scalability.time.elapsed_time.max
     95.00            -3.9%      91.33        vm-scalability.time.percent_of_cpu_this_job_got
    464.83 ± 45%     +77.2%     823.50 ± 32%  numa-vmstat.node1.workingset_refault_anon
  41828947 ±  3%     +29.1%   54019629 ±  2%  turbostat.IRQ
     46348 ±  3%     +36.6%      63322 ± 41%  turbostat.POLL
    593.33            -3.5%     572.83 ±  2%  vmstat.swap.si
    626370            -5.7%     590394 ±  3%  vmstat.swap.so
    272382 ±  4%     +14.9%     313058 ±  2%  vmstat.system.in
   6487093 ± 55%     -69.9%    1954992 ± 36%  proc-vmstat.compact_migrate_scanned
      1184 ±  5%     -24.7%     892.17 ± 20%  proc-vmstat.kswapd_low_wmark_hit_quickly
     96594            -5.3%      91472 ±  2%  proc-vmstat.nr_dirty_background_threshold
    193425            -5.3%     183168 ±  2%  proc-vmstat.nr_dirty_threshold
   1004718            -5.1%     953385        proc-vmstat.nr_free_pages
    283.33 ± 19%     -28.5%     202.67 ± 24%  proc-vmstat.nr_inactive_file
    282.17 ± 19%     -28.4%     202.17 ± 24%  proc-vmstat.nr_zone_inactive_file
      1504 ±  2%     -16.9%       1250 ± 10%  proc-vmstat.pageoutrun
      3589 ± 13%    +115.7%       7743 ± 12%  proc-vmstat.pgactivate
  22448232            +3.3%   23184909        proc-vmstat.pgalloc_normal
      9440 ± 22%     +69.2%      15974        proc-vmstat.pgmajfault
     19906 ±  3%      +4.9%      20882 ±  2%  proc-vmstat.pgreuse
      9569 ± 22%     +69.7%      16238        proc-vmstat.pswpin
      1089 ±  5%     +38.4%       1507 ± 23%  proc-vmstat.workingset_refault_anon
  1.77e+09            -4.6%  1.689e+09 ±  2%  perf-stat.i.branch-instructions
    190.04            +1.4%     192.69        perf-stat.i.cpu-migrations
     25.17 ± 10%     +14.9       40.02 ±  9%  perf-stat.i.iTLB-load-miss-rate%
    690930 ±  4%     +57.4%    1087333 ±  4%  perf-stat.i.iTLB-load-misses
   2149984 ±  8%     -24.2%    1630299 ± 13%  perf-stat.i.iTLB-loads
 6.979e+09            -4.4%  6.675e+09 ±  2%  perf-stat.i.instructions
     10160 ±  4%     -36.6%       6443 ±  5%  perf-stat.i.instructions-per-iTLB-miss
     69.47 ± 31%     +78.3%     123.87 ± 16%  perf-stat.i.major-faults
    223174            -6.0%     209787 ±  3%  perf-stat.i.minor-faults
    223243            -6.0%     209911 ±  3%  perf-stat.i.page-faults
     24.44 ±  8%     +15.8       40.28 ±  9%  perf-stat.overall.iTLB-load-miss-rate%
     10119 ±  4%     -39.2%       6151 ±  4%  perf-stat.overall.instructions-per-iTLB-miss
      0.91 ±  5%      -9.1%       0.82 ±  4%  perf-stat.overall.ipc
     57.75 ±  6%      +8.2       65.97 ±  5%  perf-stat.overall.node-load-miss-rate%
 1.753e+09            -4.5%  1.674e+09 ±  2%  perf-stat.ps.branch-instructions
    188.25            +1.5%     191.00        perf-stat.ps.cpu-migrations
    684471 ±  4%     +57.5%    1077795 ±  4%  perf-stat.ps.iTLB-load-misses
   2129593 ±  8%     -24.1%    1616077 ± 13%  perf-stat.ps.iTLB-loads
 6.914e+09            -4.3%  6.617e+09 ±  2%  perf-stat.ps.instructions
     68.83 ± 31%     +78.5%     122.87 ± 16%  perf-stat.ps.major-faults
    221092            -5.9%     207957 ±  3%  perf-stat.ps.minor-faults
    221161            -5.9%     208080 ±  3%  perf-stat.ps.page-faults
      2.62 ± 37%      -2.0        0.62 ± 79%  perf-profile.calltrace.cycles-pp.try_to_unmap_one.rmap_walk_anon.try_to_unmap.shrink_page_list.shrink_inactive_list
      2.82 ± 36%      -1.9        0.88 ± 60%  perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_unmap.shrink_page_list.shrink_inactive_list.shrink_lruvec
      2.86 ± 35%      -1.9        0.92 ± 60%  perf-profile.calltrace.cycles-pp.try_to_unmap.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node_memcgs
      0.00            +2.4        2.36 ± 44%  perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_page_list
      0.00            +2.4        2.37 ± 44%  perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_page_list.shrink_inactive_list
      0.00            +2.4        2.41 ± 44%  perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush_dirty.shrink_page_list.shrink_inactive_list.shrink_lruvec
      0.00            +2.4        2.41 ± 44%  perf-profile.calltrace.cycles-pp.try_to_unmap_flush_dirty.shrink_page_list.shrink_inactive_list.shrink_lruvec.shrink_node_memcgs
      2.87 ± 35%      -1.9        0.97 ± 49%  perf-profile.children.cycles-pp.try_to_unmap
      2.63 ± 37%      -1.9        0.75 ± 47%  perf-profile.children.cycles-pp.try_to_unmap_one
      0.23 ± 29%      +0.1        0.35 ± 29%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      0.39 ± 18%      +0.1        0.52 ± 10%  perf-profile.children.cycles-pp.sync_regs
      0.07 ± 20%      +0.1        0.20 ± 23%  perf-profile.children.cycles-pp.llist_reverse_order
      0.53 ± 17%      +0.1        0.67 ± 12%  perf-profile.children.cycles-pp.error_entry
      0.20 ± 31%      +0.2        0.35 ± 13%  perf-profile.children.cycles-pp.flush_tlb_func
      0.00            +0.2        0.19 ± 17%  perf-profile.children.cycles-pp.native_flush_tlb_local
      0.35 ± 25%      +0.5        0.88 ± 19%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.35 ± 26%      +0.5        0.88 ± 19%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.43 ± 25%      +0.6        0.99 ± 17%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.73 ± 23%      +0.8        1.54 ± 14%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.00            +2.4        2.41 ± 44%  perf-profile.children.cycles-pp.arch_tlbbatch_flush
      0.00            +2.4        2.41 ± 44%  perf-profile.children.cycles-pp.try_to_unmap_flush_dirty
      0.95 ± 11%      -0.2        0.75 ± 10%  perf-profile.self.cycles-pp.shrink_page_list
      0.23 ± 13%      -0.0        0.18 ± 17%  perf-profile.self.cycles-pp.cpuidle_idle_call
      0.10 ± 29%      +0.1        0.16 ± 15%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.38 ± 18%      +0.1        0.51 ± 10%  perf-profile.self.cycles-pp.sync_regs
      0.07 ± 20%      +0.1        0.20 ± 23%  perf-profile.self.cycles-pp.llist_reverse_order
      0.00            +0.2        0.19 ± 18%  perf-profile.self.cycles-pp.native_flush_tlb_local
      0.09 ± 31%      +0.2        0.33 ± 36%  perf-profile.self.cycles-pp.__flush_smp_call_function_queue
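
The perf-stat and perf-profile numbers above are collected by lkp's monitors; roughly equivalent data can be gathered by hand with perf while the workload runs (a sketch, assuming perf is installed and these event names are supported on the test CPU):

        # counters similar to the perf-stat section (system-wide, 10 second window)
        perf stat -a -e instructions,branch-instructions,iTLB-loads,iTLB-load-misses,minor-faults,major-faults -- sleep 10
        # call-graph samples similar to the perf-profile section
        perf record -a -g -- sleep 10
        perf report --no-children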



If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202211211037.2b2e5e1f-yujie.liu@intel.com


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

View attachment "config-6.0.0-rc3-00139-g088b8aa537c2" of type "text/plain" (164374 bytes)

View attachment "job-script" of type "text/plain" (8453 bytes)

View attachment "job.yaml" of type "text/plain" (5883 bytes)

View attachment "reproduce" of type "text/plain" (999 bytes)
