Date:   Tue, 5 Jan 2021 11:23:13 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Daniel Vetter <daniel@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Matthew Auld <matthew.auld@...el.com>,
        Matthew Wilcox <willy@...radead.org>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        Souptick Joarder <jrdr.linux@...il.com>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: [mm/gup]  376a34efa4:  will-it-scale.per_process_ops -5.5% regression


Greetings,

FYI, we noticed a -5.5% regression of will-it-scale.per_process_ops due to commit:


commit: 376a34efa4eeb699d285c1a741b186d44b44c429 ("mm/gup: refactor and de-duplicate gup_fast() code")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 50%
	mode: process
	test: futex1
	cpufreq_governor: performance
	ucode: 0x5003003

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
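For orientation, nr_task: 50% on this 192-thread machine corresponds to 96 parallel copies, which matches the will-it-scale.96.processes metric name in the results. A minimal Python sketch of that arithmetic follows; it assumes the percentage is applied to the hardware thread count, and the helper name tasks_from_percent is hypothetical (it is not part of lkp-tests or will-it-scale).

```python
def tasks_from_percent(nr_threads: int, percent: float) -> int:
    """Return the number of parallel copies for a percentage of hardware threads.

    Assumption: the nr_task percentage is taken relative to the thread count
    of the test machine (192 here).
    """
    return int(nr_threads * percent / 100)


# 192 threads, nr_task: 50%
nr_task = tasks_from_percent(192, 50)
print(nr_task)  # 96
```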



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/futex1/will-it-scale/0x5003003

commit: 
  9e1f0580d3 ("mm/gup: move __get_user_pages_fast() down a few lines in gup.c")
  376a34efa4 ("mm/gup: refactor and de-duplicate gup_fast() code")

9e1f0580d37e0d3f 376a34efa4eeb699d285c1a741b 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
 6.067e+08            -5.5%  5.733e+08        will-it-scale.96.processes
   6319316            -5.5%    5972248        will-it-scale.per_process_ops
 6.067e+08            -5.5%  5.733e+08        will-it-scale.workload
     32.90 ±  4%      -4.6%      31.38        boot-time.boot
      5329 ±  4%      -8.1%       4897 ±  5%  boot-time.idle
   8256765 ± 70%     -68.3%    2614568 ± 43%  cpuidle.C1.time
    145939 ± 87%     -64.3%      52167 ± 20%  cpuidle.C1.usage
   7722907 ± 20%     -57.4%    3290523 ±103%  sched_debug.cfs_rq:/.spread0.avg
  12708331 ±  7%     -39.1%    7733820 ± 46%  sched_debug.cfs_rq:/.spread0.max
  -1658498          +253.9%   -5869460        sched_debug.cfs_rq:/.spread0.min
      5890 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      1.50 ± 33%    -100.0%       0.00        perf-sched.wait_and_delay.count.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      7559          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      5890 ± 29%    -100.0%       0.00        perf-sched.wait_time.avg.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      7559          -100.0%       0.00        perf-sched.wait_time.max.ms.preempt_schedule_common._cond_resched.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
    150049            +2.8%     154285        proc-vmstat.nr_active_anon
     89133            +2.0%      90906        proc-vmstat.nr_anon_pages
      2959            +3.0%       3048        proc-vmstat.nr_page_table_pages
    150049            +2.8%     154285        proc-vmstat.nr_zone_active_anon
   1412412            +1.5%    1432905        proc-vmstat.pgalloc_normal
   1300962            +1.1%    1314985        proc-vmstat.pgfault
    693.25 ± 23%     -78.9%     146.25 ± 43%  interrupts.CPU1.RES:Rescheduling_interrupts
      1.75 ± 24%  +10957.1%     193.50 ±159%  interrupts.CPU1.TLB:TLB_shootdowns
     17.75 ± 54%    +181.7%      50.00 ± 84%  interrupts.CPU104.RES:Rescheduling_interrupts
     76.00 ± 46%     -73.0%      20.50 ± 32%  interrupts.CPU105.RES:Rescheduling_interrupts
    121.00 ±115%     -84.3%      19.00 ± 24%  interrupts.CPU113.RES:Rescheduling_interrupts
      2.00 ± 61%   +4262.5%      87.25 ±156%  interrupts.CPU118.TLB:TLB_shootdowns
      3238 ± 20%     +56.0%       5051 ± 48%  interrupts.CPU162.NMI:Non-maskable_interrupts
      3238 ± 20%     +56.0%       5051 ± 48%  interrupts.CPU162.PMI:Performance_monitoring_interrupts
      4269 ± 62%     +75.4%       7490 ± 24%  interrupts.CPU171.NMI:Non-maskable_interrupts
      4269 ± 62%     +75.4%       7490 ± 24%  interrupts.CPU171.PMI:Performance_monitoring_interrupts
      2851 ± 35%    +100.3%       5711 ± 35%  interrupts.CPU174.NMI:Non-maskable_interrupts
      2851 ± 35%    +100.3%       5711 ± 35%  interrupts.CPU174.PMI:Performance_monitoring_interrupts
      5020 ± 42%     +51.1%       7587 ± 24%  interrupts.CPU26.NMI:Non-maskable_interrupts
      5020 ± 42%     +51.1%       7587 ± 24%  interrupts.CPU26.PMI:Performance_monitoring_interrupts
     75.75 ± 49%     -76.2%      18.00 ± 57%  interrupts.CPU28.RES:Rescheduling_interrupts
      1.50 ± 74%  +11966.7%     181.00 ±166%  interrupts.CPU3.TLB:TLB_shootdowns
      2883 ± 36%     +75.2%       5051 ± 48%  interrupts.CPU51.NMI:Non-maskable_interrupts
      2883 ± 36%     +75.2%       5051 ± 48%  interrupts.CPU51.PMI:Performance_monitoring_interrupts
     16.25 ± 44%    +192.3%      47.50 ± 87%  interrupts.CPU65.RES:Rescheduling_interrupts
      1.25 ± 34%  +12700.0%     160.00 ±163%  interrupts.CPU7.TLB:TLB_shootdowns
      2856           +50.5%       4298 ± 32%  interrupts.CPU71.NMI:Non-maskable_interrupts
      2856           +50.5%       4298 ± 32%  interrupts.CPU71.PMI:Performance_monitoring_interrupts
    110.75 ± 39%     -58.2%      46.25 ±100%  interrupts.CPU86.RES:Rescheduling_interrupts
    127.75 ± 84%     -60.5%      50.50 ± 83%  interrupts.CPU90.RES:Rescheduling_interrupts
      2.25 ± 57%   +1388.9%      33.50 ±137%  interrupts.CPU98.TLB:TLB_shootdowns
    230.25 ±  3%     +35.9%     313.00 ± 14%  interrupts.IWI:IRQ_work_interrupts
      1266 ± 55%    +494.7%       7533 ± 68%  interrupts.TLB:TLB_shootdowns
     20.74            -3.2       17.57 ± 11%  perf-profile.calltrace.cycles-pp.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wake.do_futex
     18.46            -2.7       15.78 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
      4.72            -0.7        4.06 ± 11%  perf-profile.calltrace.cycles-pp.try_grab_compound_head.gup_pgd_range.internal_get_user_pages_fast.get_futex_key.futex_wake
      1.89            -0.6        1.33 ± 11%  perf-profile.calltrace.cycles-pp.testcase
      2.99            -0.5        2.53 ± 11%  perf-profile.calltrace.cycles-pp.hash_futex.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
      1.65            -0.2        1.42 ± 11%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
     21.01            -3.2       17.83 ± 11%  perf-profile.children.cycles-pp.gup_pgd_range
     12.26            -1.7       10.54 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      7.83            -1.0        6.81 ± 11%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      4.72            -0.7        4.06 ± 11%  perf-profile.children.cycles-pp.try_grab_compound_head
      2.00            -0.6        1.37 ± 11%  perf-profile.children.cycles-pp.testcase
      2.99            -0.5        2.53 ± 11%  perf-profile.children.cycles-pp.hash_futex
      0.28            -0.1        0.17 ± 10%  perf-profile.children.cycles-pp.syscall@plt
      0.30            -0.1        0.24 ± 11%  perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.22            -0.1        0.16 ± 14%  perf-profile.children.cycles-pp.get_user_pages_fast
      0.35            -0.1        0.30 ±  9%  perf-profile.children.cycles-pp.pmd_huge
     15.88            -2.5       13.41 ± 11%  perf-profile.self.cycles-pp.gup_pgd_range
      9.06            -1.5        7.57 ± 11%  perf-profile.self.cycles-pp.syscall
      4.91            -1.2        3.73 ± 11%  perf-profile.self.cycles-pp.do_syscall_64
      7.83            -1.0        6.81 ± 11%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      4.30            -0.9        3.40 ± 11%  perf-profile.self.cycles-pp.futex_wake
      6.04            -0.8        5.28 ± 10%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.57            -0.6        3.93 ± 11%  perf-profile.self.cycles-pp.try_grab_compound_head
      1.66            -0.6        1.08 ± 11%  perf-profile.self.cycles-pp.testcase
      2.96            -0.4        2.51 ± 11%  perf-profile.self.cycles-pp.hash_futex
      3.02            -0.4        2.58 ± 11%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.73            -0.2        1.49 ± 11%  perf-profile.self.cycles-pp.__x64_sys_futex
      1.29            -0.2        1.12 ± 11%  perf-profile.self.cycles-pp.do_futex
      0.31            -0.0        0.26 ±  8%  perf-profile.self.cycles-pp.pmd_huge
      0.16            -0.0        0.13 ± 10%  perf-profile.self.cycles-pp.syscall@plt
      3.13            +2.5        5.59 ± 11%  perf-profile.self.cycles-pp.internal_get_user_pages_fast
 7.362e+10            -5.5%  6.959e+10        perf-stat.i.branch-instructions
      0.41            -0.2        0.19        perf-stat.i.branch-miss-rate%
 2.972e+08           -58.1%  1.245e+08        perf-stat.i.branch-misses
     13.95 ±  2%      +0.6       14.53        perf-stat.i.cache-miss-rate%
   1608471 ±  3%      +7.5%    1728582 ±  2%  perf-stat.i.cache-misses
  11874653            +2.1%   12120398        perf-stat.i.cache-references
      0.59            +5.9%       0.63        perf-stat.i.cpi
    373293 ±  8%     +28.3%     478954 ± 10%  perf-stat.i.dTLB-load-misses
 1.202e+11            -4.5%  1.148e+11        perf-stat.i.dTLB-loads
 7.829e+10            -4.0%  7.515e+10        perf-stat.i.dTLB-stores
     96.94            -5.4       91.52        perf-stat.i.iTLB-load-miss-rate%
 2.909e+08           -58.6%  1.204e+08        perf-stat.i.iTLB-load-misses
 5.025e+11            -4.9%  4.778e+11        perf-stat.i.instructions
      1730          +129.3%       3968        perf-stat.i.instructions-per-iTLB-miss
      1.70            -5.4%       1.61        perf-stat.i.ipc
      0.86 ±  6%      -9.6%       0.78        perf-stat.i.metric.K/sec
      1415            -4.7%       1348        perf-stat.i.metric.M/sec
     93.21            +1.4       94.63        perf-stat.i.node-load-miss-rate%
     36909 ±  4%     -12.3%      32363 ± 10%  perf-stat.i.node-loads
     92.16 ±  6%      +5.3       97.47        perf-stat.i.node-store-miss-rate%
      0.02            +7.3%       0.03        perf-stat.overall.MPKI
      0.40            -0.2        0.18        perf-stat.overall.branch-miss-rate%
     13.61 ±  3%      +0.7       14.30 ±  2%  perf-stat.overall.cache-miss-rate%
      0.59            +5.6%       0.62        perf-stat.overall.cpi
      0.00 ±  5%      +0.0        0.00 ±  5%  perf-stat.overall.dTLB-load-miss-rate%
     97.02            -5.4       91.63        perf-stat.overall.iTLB-load-miss-rate%
      1727          +129.8%       3970        perf-stat.overall.instructions-per-iTLB-miss
      1.70            -5.3%       1.61        perf-stat.overall.ipc
     89.86            +1.6       91.50        perf-stat.overall.node-load-miss-rate%
 7.326e+10            -5.5%   6.92e+10        perf-stat.ps.branch-instructions
 2.958e+08           -58.1%  1.239e+08        perf-stat.ps.branch-misses
   1612946 ±  4%      +7.1%    1727850 ±  2%  perf-stat.ps.cache-misses
    386183 ±  5%     +25.5%     484727 ±  5%  perf-stat.ps.dTLB-load-misses
 1.196e+11            -4.6%  1.142e+11        perf-stat.ps.dTLB-loads
     15923 ±  4%      -6.7%      14859        perf-stat.ps.dTLB-store-misses
 7.791e+10            -4.1%  7.473e+10        perf-stat.ps.dTLB-stores
 2.894e+08           -58.6%  1.197e+08        perf-stat.ps.iTLB-load-misses
     5e+11            -5.0%  4.751e+11        perf-stat.ps.instructions
     36225 ±  4%     -12.8%      31590 ± 10%  perf-stat.ps.node-loads
 1.512e+14            -4.8%  1.439e+14        perf-stat.total.instructions
     13958 ±  6%     -10.1%      12554 ±  4%  softirqs.CPU104.RCU
     13815 ±  5%     -10.9%      12315 ±  5%  softirqs.CPU106.RCU
     95620 ±  3%     +31.2%     125489 ± 21%  softirqs.CPU106.TIMER
     98504 ±  4%     +19.6%     117772 ± 18%  softirqs.CPU11.TIMER
     13971 ±  5%      -8.3%      12815 ±  2%  softirqs.CPU112.RCU
     96603 ±  4%     +19.4%     115377 ± 19%  softirqs.CPU112.TIMER
     14000 ±  5%      -7.7%      12917 ±  5%  softirqs.CPU120.RCU
     95224 ±  3%     +25.0%     119064 ± 17%  softirqs.CPU122.TIMER
     14002 ±  6%     -12.2%      12290 ±  3%  softirqs.CPU126.RCU
     14028 ±  6%      -9.6%      12674 ±  4%  softirqs.CPU128.RCU
     28439 ± 50%     +37.5%      39098 ±  5%  softirqs.CPU128.SCHED
     94723 ±  3%     +30.6%     123672 ± 20%  softirqs.CPU128.TIMER
     14626 ±  4%     -13.0%      12732        softirqs.CPU13.RCU
     19080 ± 61%    +105.1%      39142 ±  4%  softirqs.CPU13.SCHED
     93419 ±  2%     +32.9%     124191 ± 20%  softirqs.CPU13.TIMER
     14169 ±  5%      -8.8%      12926 ±  3%  softirqs.CPU140.RCU
     14593 ±  6%     -11.7%      12887 ±  6%  softirqs.CPU158.RCU
     20004 ± 81%     +95.9%      39183 ±  4%  softirqs.CPU158.SCHED
     20928 ± 75%     +86.5%      39023 ±  4%  softirqs.CPU166.SCHED
     92411 ±  2%     +32.4%     122325 ± 20%  softirqs.CPU166.TIMER
     13632 ±  8%     -12.7%      11905 ±  4%  softirqs.CPU170.RCU
     94847 ±  3%     +31.3%     124502 ± 20%  softirqs.CPU170.TIMER
     13449 ±  6%     -12.0%      11836 ±  5%  softirqs.CPU172.RCU
     95541 ±  3%     +30.8%     124994 ± 19%  softirqs.CPU172.TIMER
     13559 ±  4%     -10.9%      12079 ± 10%  softirqs.CPU176.RCU
     17177 ± 11%     -14.8%      14642 ±  4%  softirqs.CPU26.RCU
     14994 ±  7%      -7.3%      13895 ±  5%  softirqs.CPU27.RCU
     28432 ± 50%     +38.7%      39440 ±  4%  softirqs.CPU27.SCHED
     94128 ±  3%     +32.7%     124889 ± 20%  softirqs.CPU27.TIMER
     15261 ±  4%     -10.7%      13632 ±  4%  softirqs.CPU29.RCU
     20787 ± 82%     +89.0%      39293 ±  4%  softirqs.CPU29.SCHED
     93893 ±  3%     +32.4%     124332 ± 20%  softirqs.CPU29.TIMER
     28710 ± 50%     +36.7%      39256 ±  4%  softirqs.CPU31.SCHED
     20986 ± 75%     +86.9%      39216 ±  4%  softirqs.CPU63.SCHED
     93462 ±  3%     +31.1%     122523 ± 20%  softirqs.CPU63.TIMER
     14745 ±  6%      -9.2%      13385 ±  5%  softirqs.CPU69.RCU
     28484 ± 50%     +37.5%      39172 ±  4%  softirqs.CPU69.SCHED
     95765 ±  4%     +29.0%     123517 ± 20%  softirqs.CPU69.TIMER
     13848 ±  6%     -10.7%      12364 ±  4%  softirqs.CPU73.RCU
     96130           +28.6%     123605 ± 20%  softirqs.CPU73.TIMER
     17277 ± 89%     -79.6%       3520 ±  9%  softirqs.CPU74.SCHED
     94638 ±  4%     +27.8%     120924 ± 17%  softirqs.CPU75.TIMER
     14750 ±  6%      -9.5%      13345 ±  5%  softirqs.CPU96.RCU
     16482 ± 12%     -16.7%      13728 ±  3%  softirqs.CPU97.RCU
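
The %change column and the derived perf-stat ratios in the table above can be cross-checked from the raw values. The Python sketch below does so for a few rows; it assumes %change is (new - base) / base and that instructions-per-iTLB-miss is simply instructions divided by iTLB-load-misses. The inputs are copied from the table, where they are rounded to four significant digits, so the ratios agree with the reported values only approximately.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# (base commit 9e1f0580d3, tested commit 376a34efa4), values from the table
rows = {
    "will-it-scale.per_process_ops": (6319316, 5972248),
    "perf-stat.i.branch-misses": (2.972e8, 1.245e8),
    "perf-stat.i.iTLB-load-misses": (2.909e8, 1.204e8),
}

for name, (base, new) in rows.items():
    print(f"{pct_change(base, new):+7.1f}%  {name}")

# Derived ratio: instructions retired per iTLB load miss, before and after.
instructions = (5.025e11, 4.778e11)   # perf-stat.i.instructions
itlb_misses = (2.909e8, 1.204e8)      # perf-stat.i.iTLB-load-misses
instr_per_itlb_miss = tuple(i / m for i, m in zip(instructions, itlb_misses))
print([round(v) for v in instr_per_itlb_miss])
```

With these inputs the three rows come out at -5.5%, -58.1% and -58.6%, matching the table, and the ratio roughly doubles, in line with the reported perf-stat.i.instructions-per-iTLB-miss change.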


                                                                                
                              will-it-scale.96.processes                        
                                                                                
  6.1e+08 +-----------------------------------------------------------------+   
          |        .+.+.+.. .+.+.                                 +.+.+.+.+.|   
    6e+08 |.+.+.+.+        +     +.+.+.+.+.+.+.+.+.+.+.+.+..+.   +          |   
          |                                                   +.+           |   
  5.9e+08 |-+                                                               |   
          |                                                                 |   
  5.8e+08 |-+                                                               |   
          |                                          O O O  O O O           |   
  5.7e+08 |-+                O O     O O O O O O O O                        |   
          |             O  O     O O                                        |   
  5.6e+08 |-+                                                               |   
          |         O O                                                     |   
  5.5e+08 |-O O O O                                                         |   
          |                                                                 |   
  5.4e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                             will-it-scale.per_process_ops                      
                                                                                
  6.4e+06 +-----------------------------------------------------------------+   
          |                                                                 |   
  6.3e+06 |-+      .+.+.+.. .+.+.                                 +.+.+.+.+.|   
          |.+.+.+.+        +     +.+.+.+.+.+.+.+.+.+.+.+.+..+.   +          |   
  6.2e+06 |-+                                                 +.+           |   
          |                                                                 |   
  6.1e+06 |-+                                                               |   
          |                                                                 |   
    6e+06 |-+                                        O O      O             |   
          |                  O O     O O O O O O O O     O  O   O           |   
  5.9e+06 |-+              O     O O                                        |   
          |             O                                                   |   
  5.8e+06 |-+                                                               |   
          |         O O                                                     |   
  5.7e+06 +-----------------------------------------------------------------+   
                                                                                
                                                                                                                                                                
                                will-it-scale.workload                          
                                                                                
  6.1e+08 +-----------------------------------------------------------------+   
          |        .+.+.+.. .+.+.                                 +.+.+.+.+.|   
    6e+08 |.+.+.+.+        +     +.+.+.+.+.+.+.+.+.+.+.+.+..+.   +          |   
          |                                                   +.+           |   
  5.9e+08 |-+                                                               |   
          |                                                                 |   
  5.8e+08 |-+                                                               |   
          |                                          O O O  O O O           |   
  5.7e+08 |-+                O O     O O O O O O O O                        |   
          |             O  O     O O                                        |   
  5.6e+08 |-+                                                               |   
          |         O O                                                     |   
  5.5e+08 |-O O O O                                                         |   
          |                                                                 |   
  5.4e+08 +-----------------------------------------------------------------+   
                                                                                
                                                                                
[*] bisect-good sample
[O] bisect-bad  sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


View attachment "config-5.7.0-03831-g376a34efa4eeb" of type "text/plain" (157857 bytes)

View attachment "job-script" of type "text/plain" (7752 bytes)

View attachment "job.yaml" of type "text/plain" (5278 bytes)

View attachment "reproduce" of type "text/plain" (338 bytes)
