Message-ID: <ZrGSGIxpsEoBG2iu@xsang-OptiPlex-9020>
Date: Tue, 6 Aug 2024 11:01:44 +0800
From: Oliver Sang <oliver.sang@...el.com>
To: Yu Zhao <yuzhao@...gle.com>
CC: Muchun Song <muchun.song@...ux.dev>, Janosch Frank <frankja@...ux.ibm.com>,
	<oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand <david@...hat.com>,
	Frank van der Linden <fvdl@...gle.com>, Matthew Wilcox <willy@...radead.org>,
	Peter Xu <peterx@...hat.com>, Yang Shi <yang@...amperecomputing.com>,
	<linux-kernel@...r.kernel.org>, <ying.huang@...el.com>,
	<feng.tang@...el.com>, <fengwei.yin@...el.com>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>, Claudio Imbrenda <imbrenda@...ux.ibm.com>,
	Marc Hartmayer <mhartmay@...ux.ibm.com>, Heiko Carstens <hca@...ux.ibm.com>,
	Yosry Ahmed <yosryahmed@...gle.com>, <oliver.sang@...el.com>
Subject: Re: [linux-next:master] [mm/hugetlb_vmemmap] 875fa64577: vm-scalability.throughput -34.3% regression

hi, Yu Zhao,

On Sat, Aug 03, 2024 at 04:07:55PM -0600, Yu Zhao wrote:
> Hi Oliver,
> 
> On Fri, Jul 19, 2024 at 10:06 AM Yu Zhao <yuzhao@...gle.com> wrote:
> >
> > On Fri, Jul 19, 2024 at 2:44 AM Oliver Sang <oliver.sang@...el.com> wrote:
> > >
> > > hi, Yu Zhao,
> > >
> > > On Wed, Jul 17, 2024 at 09:44:33AM -0600, Yu Zhao wrote:
> > > > On Wed, Jul 17, 2024 at 2:36 AM Yu Zhao <yuzhao@...gle.com> wrote:
> > > > >
> > > > > Hi Janosch and Oliver,
> > > > >
> > > > > On Wed, Jul 17, 2024 at 1:57 AM Janosch Frank <frankja@...ux.ibm.com> wrote:
> > > > > >
> > > > > > On 7/9/24 07:11, kernel test robot wrote:
> > > > > > > Hello,
> > > > > > >
> > > > > > > kernel test robot noticed a -34.3% regression of vm-scalability.throughput on:
> > > > > > >
> > > > > > >
> > > > > > > commit: 875fa64577da9bc8e9963ee14fef8433f20653e7 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> > > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > > > >
> > > > > > > [still regression on linux-next/master 0b58e108042b0ed28a71cd7edf5175999955b233]
> > > > > > >
> > > > > > This has hit s390 huge page backed KVM guests as well.
> > > > > > Our simple start/stop test case went from ~5 to over 50 seconds of runtime.
> > > > >
> > > > > Could you try the attached patch please? Thank you.
> > > >
> > > > Thanks, Yosry, for spotting the following typo:
> > > >   flags &= VMEMMAP_SYNCHRONIZE_RCU;
> > > > It's supposed to be:
> > > >   flags &= ~VMEMMAP_SYNCHRONIZE_RCU;
> > > >
> > > > Reattaching v2 with the above typo fixed. Please let me know, Janosch & Oliver.
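
To spell out the one-character difference: "flags &= VMEMMAP_SYNCHRONIZE_RCU"
masks flags down to only that bit, silently dropping every other flag set by
the caller, while "flags &= ~VMEMMAP_SYNCHRONIZE_RCU" clears just that bit and
keeps the rest. A minimal userspace sketch, using illustrative bit values
rather than the kernel's actual flag definitions:

  #include <stdio.h>

  /* illustrative values only; the real definitions live in mm/hugetlb_vmemmap.c */
  #define VMEMMAP_SYNCHRONIZE_RCU  (1UL << 0)
  #define VMEMMAP_SOME_OTHER_FLAG  (1UL << 1)   /* hypothetical second flag */

  int main(void)
  {
          unsigned long flags = VMEMMAP_SYNCHRONIZE_RCU | VMEMMAP_SOME_OTHER_FLAG;

          /* typo: AND with the bit keeps only that bit, dropping all other flags */
          unsigned long typo  = flags & VMEMMAP_SYNCHRONIZE_RCU;

          /* intended: AND with the complement clears only that bit */
          unsigned long fixed = flags & ~VMEMMAP_SYNCHRONIZE_RCU;

          printf("typo=%#lx fixed=%#lx\n", typo, fixed);   /* typo=0x1 fixed=0x2 */
          return 0;
  }
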
> > >
> > > since the commit is in mainline now, I directly applied your v2 patch on top of
> > > bd225530a4c71 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
> > >
> > > in our tests, your v2 patch not only recovers the performance regression,
> >
> > Thanks for verifying the fix!
> >
> > > it even shows a +13.7% performance improvement over 5a4d8944d6b1e (the parent
> > > of bd225530a4c71)
> >
> > Glad to hear!
> >
> > (The original patch improved and regressed the performance at the same
> > time, but the regression was bigger. The fix removed the regression and
> > surfaced the improvement.)
> 
> Can you please run the benchmark again with the attached patch on top
> of the last fix?

last time, I applied your previous fix (1) directly on top of mainline commit (2):

9a5b87b521401 fix for 875fa64577 (then bd225530a4 in main)               <--- (1)
bd225530a4c71 mm/hugetlb_vmemmap: fix race with speculative PFN walkers  <--- (2)

but this time your patch failed to apply on top of (1).

then I found that (1) can be applied on top of mainline commit (3), giving (4) below.
your new patch applies cleanly on top of (4), giving (5):

e2b8dff50992a new hugetlb-20240805.patch                                       <--- (5)
b5af188232e56 v2 fix for bd225530a4 but apply on mainline tip 17712b7ea0756    <--- (4)
17712b7ea0756 Merge tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux  <--- (3)

I tested (3), (4), and (5) and compared them with bd225530a4c71 and its parent;
details are below [1].

you may notice that the data for bd225530a4c71 and its parent differ from the
previously reported numbers. this is because we found some problems with gcc-13
and have switched to gcc-12; our kernel config also changed.

we have the following observations:
* bd225530a4c71 still shows a similar -36.6% regression compared to its parent
* 17712b7ea0756 performs about the same as bd225530a4c71 (slightly worse: -39.2%
  compared to 5a4d8944d6b1e, the parent of bd225530a4c71)
* your last fix still recovers the regression, but it is not better than
  5a4d8944d6b1e
* your patch this time does not seem to affect the performance data much


[1]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/512G/lkp-icl-2sp2/anon-cow-rand-hugetlb/vm-scalability

commit:
  5a4d8944d6b1e ("cachestat: do not flush stats in recency check")
  bd225530a4c71 ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers")
  17712b7ea0756 ("Merge tag 'io_uring-6.11-20240802' of git://git.kernel.dk/linux")
  b5af188232e56  <--- apply your last fix upon 17712b7ea0756
  e2b8dff50992a  <--- then apply your patch this time upon b5af188232e56

5a4d8944d6b1e1aa bd225530a4c717714722c373144 17712b7ea0756799635ba159cc7 b5af188232e564d17fc3c1784f7 e2b8dff50992a56c67308f905bd
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \          |                \          |                \
 3.312e+09 ± 34%    +472.2%  1.895e+10 ±  3%    +487.2%  1.945e+10 ±  7%      -4.4%  3.167e+09 ± 29%     -15.6%  2.795e+09 ± 29%  cpuidle..time
    684985 ±  5%   +1112.3%    8304355 ±  2%   +1099.5%    8216278            -2.4%     668573 ±  5%      -5.2%     649406 ±  2%  cpuidle..usage
    231.53 ±  3%     +40.7%     325.70 ±  2%     +45.1%     335.98 ±  5%      +1.4%     234.78 ±  4%      +0.2%     231.94 ±  3%  uptime.boot
     10015 ± 10%    +156.8%      25723 ±  4%    +156.8%      25724 ±  7%      +4.3%      10447 ± 12%      -0.7%       9945 ± 13%  uptime.idle
    577860 ±  7%     +18.1%     682388 ±  8%     +12.1%     647808 ±  6%      +9.9%     635341 ±  6%      -0.3%     576189 ±  5%  numa-numastat.node0.local_node
    624764 ±  5%     +16.1%     725128 ±  4%     +18.0%     736975 ±  2%     +10.2%     688399 ±  2%      +3.0%     643587 ±  5%  numa-numastat.node0.numa_hit
    647823 ±  5%     +11.3%     721266 ±  9%     +15.7%     749411 ±  6%     -10.0%     583278 ±  5%      -1.0%     641117 ±  3%  numa-numastat.node1.local_node
    733550 ±  4%     +10.6%     811157 ±  4%      +8.4%     795091 ±  3%      -9.0%     667814 ±  3%      -3.4%     708807 ±  3%  numa-numastat.node1.numa_hit
      6.17 ±108%   +1521.6%     100.00 ± 38%  +26137.8%       1618 ±172%     -74.1%       1.60 ± 84%     +27.0%       7.83 ±114%  perf-c2c.DRAM.local
     46.17 ± 43%   +2759.6%       1320 ± 26%  +12099.6%       5632 ±112%     +18.3%      54.60 ± 56%     +48.7%      68.67 ± 42%  perf-c2c.DRAM.remote
     36.50 ± 52%   +1526.5%     593.67 ± 26%   +1305.5%     513.00 ± 53%      +2.5%      37.40 ± 46%     +62.6%      59.33 ± 66%  perf-c2c.HITM.local
     15.33 ± 74%   +2658.7%     423.00 ± 36%   +2275.0%     364.17 ± 67%     +48.7%      22.80 ± 75%    +122.8%      34.17 ± 58%  perf-c2c.HITM.remote
     15.34 ± 27%    +265.8%      56.12          +256.0%      54.63            -2.5%      14.96 ± 23%     -12.7%      13.39 ± 23%  vmstat.cpu.id
     73.93 ±  5%     -41.4%      43.30 ±  2%     -39.3%      44.85 ±  2%      +0.5%      74.27 ±  4%      +2.4%      75.72 ±  3%  vmstat.cpu.us
    110.76 ±  4%     -47.2%      58.47 ±  2%     -45.7%      60.14 ±  2%      +0.1%     110.90 ±  4%      +1.9%     112.84 ±  3%  vmstat.procs.r
      2729 ±  3%    +167.3%       7294 ±  2%    +155.7%       6979 ±  4%      +0.2%       2734            -1.3%       2692 ±  5%  vmstat.system.cs
    150274 ±  5%     -23.2%     115398 ±  6%     -27.2%     109377 ± 13%      +0.6%     151130 ±  4%      +0.9%     151666 ±  3%  vmstat.system.in
     14.31 ± 29%     +41.4       55.74           +40.0       54.31            -0.5       13.85 ± 25%      -1.9       12.42 ± 24%  mpstat.cpu.all.idle%
      0.34 ±  5%      -0.1        0.21 ±  2%      -0.1        0.21 ±  2%      -0.0        0.34 ±  4%      +0.0        0.35 ±  4%  mpstat.cpu.all.irq%
      0.02 ±  4%      +0.0        0.03            +0.0        0.03 ±  4%      -0.0        0.02 ±  2%      -0.0        0.02 ±  2%  mpstat.cpu.all.soft%
     10.63 ±  4%     -10.2        0.43 ±  4%     -10.3        0.35 ± 29%      +0.1       10.71 ±  2%      +0.2       10.79 ±  4%  mpstat.cpu.all.sys%
     74.69 ±  5%     -31.1       43.59 ±  2%     -29.6       45.10 ±  2%      +0.4       75.08 ±  4%      +1.7       76.42 ±  3%  mpstat.cpu.all.usr%
      6.83 ± 15%    +380.5%      32.83 ± 45%    +217.1%      21.67 ±  5%     +40.5%       9.60 ± 41%      -7.3%       6.33 ±  7%  mpstat.max_utilization.seconds
      0.71 ± 55%      +0.4        1.14 ±  3%      +0.2        0.96 ± 44%      +0.4        1.09 ±  4%      +0.2        0.91 ± 30%  perf-profile.calltrace.cycles-pp.lrand48_r
     65.57 ± 10%      +3.5       69.09            -7.3       58.23 ± 45%      +2.4       67.94            +7.2       72.76        perf-profile.calltrace.cycles-pp.do_rw_once
      0.06 ±  7%      -0.0        0.05 ± 46%      +0.0        0.11 ± 48%      +0.0        0.07 ± 16%      +0.0        0.08 ± 16%  perf-profile.children.cycles-pp.get_jiffies_update
      0.28 ± 10%      +0.0        0.29 ±  8%      +0.3        0.58 ± 74%      +0.0        0.30 ± 13%      +0.0        0.32 ± 12%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.24 ± 10%      +0.0        0.25 ± 10%      +0.2        0.46 ± 66%      +0.0        0.26 ± 13%      +0.0        0.28 ± 12%  perf-profile.children.cycles-pp.update_process_times
      0.06 ±  7%      -0.0        0.05 ± 46%      +0.0        0.11 ± 48%      +0.0        0.07 ± 16%      +0.0        0.08 ± 16%  perf-profile.self.cycles-pp.get_jiffies_update
      0.50 ±  7%      +0.1        0.56 ±  3%      -0.0        0.46 ± 44%      +0.0        0.53 ±  3%      +0.0        0.52 ±  6%  perf-profile.self.cycles-pp.lrand48_r@plt
     26722 ±  4%     -33.8%      17690 ± 15%     -40.0%      16038 ± 29%      +6.9%      28560 ±  3%      +9.0%      29116 ±  3%  numa-meminfo.node0.HugePages_Surp
     26722 ±  4%     -33.8%      17690 ± 15%     -40.0%      16038 ± 29%      +6.9%      28560 ±  3%      +9.0%      29116 ±  3%  numa-meminfo.node0.HugePages_Total
  74013758 ±  3%     +24.7%   92302659 ±  5%     +30.0%   96190384 ±  9%      -5.6%   69852204 ±  2%      -6.0%   69592735 ±  3%  numa-meminfo.node0.MemFree
  57671194 ±  4%     -31.7%   39382292 ± 12%     -38.5%   35494567 ± 26%      +7.2%   61832747 ±  3%      +7.7%   62092216 ±  3%  numa-meminfo.node0.MemUsed
     84822 ± 19%     +57.1%     133225 ± 17%     +13.5%      96280 ± 39%      -4.4%      81114 ±  9%      -6.0%      79743 ± 11%  numa-meminfo.node1.Active
     84781 ± 19%     +57.1%     133211 ± 17%     +13.5%      96254 ± 39%      -4.4%      81091 ±  9%      -6.0%      79729 ± 11%  numa-meminfo.node1.Active(anon)
  78416592 ±  7%     +13.3%   88860070 ±  5%      +6.7%   83660976 ± 11%      +4.5%   81951764 ±  4%      +2.4%   80309519 ±  3%  numa-meminfo.node1.MemFree
  53641607 ± 11%     -19.5%   43198129 ± 11%      -9.8%   48397199 ± 19%      -6.6%   50106411 ±  7%      -3.5%   51748656 ±  4%  numa-meminfo.node1.MemUsed
  18516537 ±  3%     +24.7%   23084190 ±  5%     +29.9%   24053750 ±  9%      -5.6%   17484374 ±  3%      -6.1%   17387753 ±  2%  numa-vmstat.node0.nr_free_pages
    624065 ±  5%     +16.0%     724171 ±  4%     +18.0%     736399 ±  2%     +10.1%     687335 ±  2%      +3.0%     642802 ±  5%  numa-vmstat.node0.numa_hit
    577161 ±  8%     +18.1%     681431 ±  8%     +12.1%     647232 ±  6%      +9.9%     634277 ±  6%      -0.3%     575404 ±  5%  numa-vmstat.node0.numa_local
     21141 ± 19%     +57.4%      33269 ± 17%     +13.7%      24027 ± 39%      -4.2%      20242 ±  9%      -5.6%      19967 ± 11%  numa-vmstat.node1.nr_active_anon
  19586357 ±  7%     +13.5%   22224344 ±  5%      +6.8%   20914089 ± 11%      +4.6%   20487157 ±  4%      +2.6%   20087311 ±  3%  numa-vmstat.node1.nr_free_pages
     21141 ± 19%     +57.4%      33269 ± 17%     +13.7%      24027 ± 39%      -4.2%      20242 ±  9%      -5.6%      19967 ± 11%  numa-vmstat.node1.nr_zone_active_anon
    732629 ±  4%     +10.5%     809596 ±  4%      +8.4%     793911 ±  3%      -9.0%     666417 ±  3%      -3.5%     707191 ±  3%  numa-vmstat.node1.numa_hit
    646902 ±  5%     +11.3%     719705 ±  9%     +15.7%     748231 ±  6%     -10.1%     581882 ±  5%      -1.1%     639501 ±  3%  numa-vmstat.node1.numa_local
    167.87 ±  2%     +56.6%     262.93           +65.3%     277.49 ±  3%      -1.9%     164.74            -1.8%     164.84        time.elapsed_time
    167.87 ±  2%     +56.6%     262.93           +65.3%     277.49 ±  3%      -1.9%     164.74            -1.8%     164.84        time.elapsed_time.max
    140035 ±  6%     -50.5%      69271 ±  5%     -50.8%      68889 ±  4%      -5.9%     131759 ±  3%      -1.2%     138362 ±  8%  time.involuntary_context_switches
    163.67 ± 10%     +63.7%     268.00 ±  5%     +76.5%     288.83 ±  5%     +13.0%     185.00 ±  8%     +22.4%     200.33 ±  4%  time.major_page_faults
     11308 ±  2%     -48.4%       5830           -47.0%       5995            +1.8%      11514            +2.5%      11591        time.percent_of_cpu_this_job_got
      2347           -94.0%     139.98 ±  3%     -95.0%     117.34 ± 21%      +0.3%       2354            -0.0%       2347        time.system_time
     16627            -8.6%      15191            -0.6%      16529 ±  5%      -0.1%      16616            +0.8%      16759 ±  2%  time.user_time
     12158 ±  2%   +5329.5%     660155         +5325.1%     659615            -1.0%      12037 ±  3%      -1.4%      11985 ±  3%  time.voluntary_context_switches
     59662           -37.0%      37607           -40.5%      35489 ±  4%      +0.5%      59969            -0.1%      59610        vm-scalability.median
      2.19 ± 20%      +1.7        3.91 ± 30%      +3.3        5.51 ± 30%      +0.6        2.82 ± 23%      +1.5        3.72 ± 25%  vm-scalability.median_stddev%
      2.92 ± 22%      +0.6        3.49 ± 32%      +1.5        4.45 ± 19%      +0.4        3.35 ± 17%      +1.5        4.39 ± 16%  vm-scalability.stddev%
   7821791           -36.6%    4961402           -39.2%    4758850 ±  2%      -0.2%    7809010            -0.7%    7769662        vm-scalability.throughput
    167.87 ±  2%     +56.6%     262.93           +65.3%     277.49 ±  3%      -1.9%     164.74            -1.8%     164.84        vm-scalability.time.elapsed_time
    167.87 ±  2%     +56.6%     262.93           +65.3%     277.49 ±  3%      -1.9%     164.74            -1.8%     164.84        vm-scalability.time.elapsed_time.max
    140035 ±  6%     -50.5%      69271 ±  5%     -50.8%      68889 ±  4%      -5.9%     131759 ±  3%      -1.2%     138362 ±  8%  vm-scalability.time.involuntary_context_switches
     11308 ±  2%     -48.4%       5830           -47.0%       5995            +1.8%      11514            +2.5%      11591        vm-scalability.time.percent_of_cpu_this_job_got
      2347           -94.0%     139.98 ±  3%     -95.0%     117.34 ± 21%      +0.3%       2354            -0.0%       2347        vm-scalability.time.system_time
     16627            -8.6%      15191            -0.6%      16529 ±  5%      -0.1%      16616            +0.8%      16759 ±  2%  vm-scalability.time.user_time
     12158 ±  2%   +5329.5%     660155         +5325.1%     659615            -1.0%      12037 ±  3%      -1.4%      11985 ±  3%  vm-scalability.time.voluntary_context_switches
     88841 ± 18%     +56.6%     139142 ± 16%     +18.6%     105352 ± 34%      -3.1%      86098 ±  9%      -6.8%      82770 ± 11%  meminfo.Active
     88726 ± 18%     +56.7%     139024 ± 16%     +18.6%     105233 ± 34%      -3.1%      85984 ±  9%      -6.8%      82654 ± 11%  meminfo.Active(anon)
  79226777 ±  3%     +18.1%   93562456           +17.2%   92853282            -0.3%   78961619 ±  2%      -1.5%   78023229 ±  2%  meminfo.CommitLimit
     51410 ±  5%     -27.2%      37411 ±  2%     -25.9%      38103 ±  2%      +0.5%      51669 ±  4%      +2.3%      52586 ±  3%  meminfo.HugePages_Surp
     51410 ±  5%     -27.2%      37411 ±  2%     -25.9%      38103 ±  2%      +0.5%      51669 ±  4%      +2.3%      52586 ±  3%  meminfo.HugePages_Total
 1.053e+08 ±  5%     -27.2%   76618243 ±  2%     -25.9%   78036556 ±  2%      +0.5%  1.058e+08 ±  4%      +2.3%  1.077e+08 ±  3%  meminfo.Hugetlb
     59378 ±  9%     -27.2%      43256 ±  9%     -29.4%      41897 ± 15%      -3.0%      57584 ±  9%      -1.5%      58465 ±  8%  meminfo.Mapped
 1.513e+08 ±  3%     +19.0%  1.801e+08           +18.1%  1.787e+08            -0.3%  1.508e+08 ±  3%      -1.6%  1.489e+08 ±  2%  meminfo.MemAvailable
 1.523e+08 ±  3%     +18.9%  1.811e+08           +18.0%  1.798e+08            -0.3%  1.518e+08 ±  3%      -1.6%  1.499e+08 ±  2%  meminfo.MemFree
 1.114e+08 ±  4%     -25.8%   82607720 ±  2%     -24.6%   83956777 ±  2%      +0.5%  1.119e+08 ±  4%      +2.2%  1.138e+08 ±  3%  meminfo.Memused
     10914 ±  2%      -9.3%       9894            -9.0%       9935            +0.8%      10999 ±  2%      +1.3%      11059        meminfo.PageTables
    235415 ±  4%     +17.2%     275883 ±  9%      +2.4%     241001 ± 17%      -1.9%     230929 ±  2%      -2.2%     230261 ±  4%  meminfo.Shmem
     22170 ± 18%     +57.0%      34801 ± 17%     +18.9%      26361 ± 34%      -2.6%      21594 ±  9%      -6.6%      20698 ± 11%  proc-vmstat.nr_active_anon
   3774988 ±  3%     +19.0%    4493004           +18.2%    4461258            -0.3%    3762775 ±  2%      -1.7%    3712537 ±  2%  proc-vmstat.nr_dirty_background_threshold
   7559208 ±  3%     +19.0%    8996995           +18.2%    8933426            -0.3%    7534750 ±  2%      -1.7%    7434153 ±  2%  proc-vmstat.nr_dirty_threshold
    824427            +1.2%     834568            +0.3%     826777            -0.0%     824269            -0.0%     824023        proc-vmstat.nr_file_pages
  38091344 ±  3%     +18.9%   45280310           +18.0%   44962412            -0.3%   37969040 ±  2%      -1.6%   37466065 ±  2%  proc-vmstat.nr_free_pages
     25681            -1.7%      25241            -1.6%      25268            +0.9%      25908            -0.1%      25665        proc-vmstat.nr_kernel_stack
     15161 ±  9%     -28.5%      10841 ±  9%     -30.3%      10565 ± 14%      -3.4%      14641 ±  9%      -2.1%      14849 ±  7%  proc-vmstat.nr_mapped
      2729 ±  2%      -9.4%       2473            -9.1%       2480            +0.7%       2748 ±  2%      +1.2%       2762        proc-vmstat.nr_page_table_pages
     58775 ±  4%     +17.3%      68926 ±  9%      +2.5%      60274 ± 18%      -1.8%      57736 ±  2%      -2.1%      57526 ±  4%  proc-vmstat.nr_shmem
     22170 ± 18%     +57.0%      34801 ± 17%     +18.9%      26361 ± 34%      -2.6%      21594 ±  9%      -6.6%      20698 ± 11%  proc-vmstat.nr_zone_active_anon
   1360860           +13.0%    1537181           +12.7%    1533834            -0.2%    1357949            -0.5%    1354233        proc-vmstat.numa_hit
   1228230           +14.4%    1404550           +13.9%    1398987            -0.6%    1220355            -0.7%    1219146        proc-vmstat.numa_local
    132626            +0.0%     132681            +1.7%     134822            +3.7%     137582 ±  4%      +1.9%     135086        proc-vmstat.numa_other
   1186558           +18.1%    1400807           +19.5%    1417837            -0.3%    1182763            -0.5%    1180560        proc-vmstat.pgfault
     31861 ±  3%     +28.2%      40847           +31.7%      41945 ±  5%      -3.1%      30881 ±  3%      -1.7%      31316 ±  4%  proc-vmstat.pgreuse
     17.18 ±  3%    +337.2%      75.11 ±  2%    +318.3%      71.87 ±  5%      -1.3%      16.96 ±  4%      -0.0%      17.18 ±  3%  perf-stat.i.MPKI
 1.727e+10 ±  5%     -37.8%  1.073e+10 ±  2%     -41.2%  1.015e+10 ±  6%      +0.7%  1.738e+10 ±  3%      +1.7%  1.757e+10 ±  4%  perf-stat.i.branch-instructions
      0.12 ± 36%      +0.6        0.73 ±  5%      +0.7        0.79 ±  6%      +0.0        0.12 ± 27%      -0.0        0.11 ± 32%  perf-stat.i.branch-miss-rate%
  10351997 ± 16%     -28.0%    7451909 ± 13%     -29.7%    7276965 ± 16%     -10.0%    9315546 ± 22%      -7.3%    9592438 ± 25%  perf-stat.i.branch-misses
     94.27 ±  3%     -20.3       73.99 ±  2%     -19.2       75.03            -0.8       93.49 ±  3%      +0.3       94.60 ±  3%  perf-stat.i.cache-miss-rate%
   9.7e+08 ±  5%     -39.6%  5.859e+08 ±  2%     -42.8%  5.552e+08 ±  5%      +0.6%  9.759e+08 ±  3%      +1.6%  9.854e+08 ±  4%  perf-stat.i.cache-misses
 9.936e+08 ±  5%     -35.3%  6.431e+08 ±  2%     -38.8%  6.084e+08 ±  5%      +0.5%   9.99e+08 ±  3%      +1.5%  1.008e+09 ±  4%  perf-stat.i.cache-references
      2640 ±  3%    +180.7%       7410 ±  2%    +168.8%       7097 ±  4%      -0.0%       2640            -1.5%       2601 ±  5%  perf-stat.i.context-switches
      4.60 ±  2%     +22.2%       5.62           +18.1%       5.44 ±  5%      -1.0%       4.56 ±  2%      +0.5%       4.62        perf-stat.i.cpi
 2.888e+11 ±  5%     -47.9%  1.503e+11 ±  2%     -46.8%  1.538e+11 ±  2%      +0.6%  2.907e+11 ±  4%      +2.4%  2.956e+11 ±  3%  perf-stat.i.cpu-cycles
    214.97 ±  3%     +48.6%     319.40 ±  2%     +50.3%     323.15            +0.3%     215.56            +0.9%     216.91        perf-stat.i.cpu-migrations
   7.4e+10 ±  5%     -37.6%  4.618e+10 ±  2%     -41.0%  4.369e+10 ±  6%      +0.7%  7.449e+10 ±  3%      +1.7%  7.529e+10 ±  4%  perf-stat.i.instructions
      0.28 ±  7%     +33.6%       0.38 ±  3%     +31.5%       0.37 ±  2%      +0.0%       0.28 ±  6%      -2.7%       0.27 ±  5%  perf-stat.i.ipc
      6413 ±  4%     -21.5%       5037           -24.5%       4839 ±  5%      -0.2%       6397 ±  4%      +0.8%       6464 ±  2%  perf-stat.i.minor-faults
      6414 ±  4%     -21.5%       5038           -24.5%       4840 ±  5%      -0.3%       6398 ±  4%      +0.8%       6465 ±  2%  perf-stat.i.page-faults
     13.16            -4.0%      12.64            -3.9%      12.64            +0.0%      13.17            +0.1%      13.17        perf-stat.overall.MPKI
     97.57            -6.3       91.24            -6.1       91.44            +0.1       97.64            +0.1       97.67        perf-stat.overall.cache-miss-rate%
      3.91           -16.9%       3.25            -9.8%       3.53 ±  5%      -0.0%       3.91            +0.7%       3.94        perf-stat.overall.cpi
    296.89           -13.4%     257.07            -6.1%     278.90 ±  5%      -0.1%     296.69            +0.7%     298.84        perf-stat.overall.cycles-between-cache-misses
      0.26           +20.3%       0.31           +11.1%       0.28 ±  5%      +0.0%       0.26            -0.7%       0.25        perf-stat.overall.ipc
     10770            -2.2%      10537            -2.3%      10523            +0.2%      10788            +0.1%      10784        perf-stat.overall.path-length
   1.7e+10 ±  4%     -36.8%  1.074e+10 ±  2%     -39.8%  1.023e+10 ±  5%      +0.6%  1.711e+10 ±  3%      +1.6%  1.727e+10 ±  4%  perf-stat.ps.branch-instructions
  10207074 ± 15%     -27.2%    7428222 ± 13%     -29.6%    7182646 ± 16%      -9.7%    9221719 ± 22%      -6.6%    9530095 ± 25%  perf-stat.ps.branch-misses
 9.588e+08 ±  4%     -39.1%  5.838e+08           -42.0%  5.566e+08 ±  5%      +0.7%  9.651e+08 ±  3%      +1.6%  9.744e+08 ±  4%  perf-stat.ps.cache-misses
 9.826e+08 ±  4%     -34.9%  6.398e+08           -38.1%  6.087e+08 ±  5%      +0.6%  9.884e+08 ±  3%      +1.5%  9.975e+08 ±  4%  perf-stat.ps.cache-references
      2628 ±  3%    +176.7%       7271 ±  2%    +164.7%       6956 ±  4%      +0.3%       2635            -1.0%       2600 ±  5%  perf-stat.ps.context-switches
 2.847e+11 ±  4%     -47.3%  1.501e+11 ±  2%     -45.6%  1.548e+11 ±  2%      +0.6%  2.864e+11 ±  4%      +2.3%  2.911e+11 ±  3%  perf-stat.ps.cpu-cycles
    213.42 ±  3%     +47.5%     314.87 ±  2%     +49.2%     318.34            +0.5%     214.42            +1.3%     216.10        perf-stat.ps.cpu-migrations
 7.284e+10 ±  4%     -36.6%   4.62e+10 ±  2%     -39.6%  4.402e+10 ±  5%      +0.6%   7.33e+10 ±  3%      +1.6%  7.398e+10 ±  4%  perf-stat.ps.instructions
      6416 ±  3%     -22.4%       4976           -25.6%       4772 ±  5%      +0.2%       6426 ±  3%      +1.6%       6516 ±  2%  perf-stat.ps.minor-faults
      6417 ±  3%     -22.4%       4977           -25.6%       4774 ±  5%      +0.2%       6428 ±  3%      +1.6%       6517 ±  2%  perf-stat.ps.page-faults
 1.268e+13            -2.2%  1.241e+13            -2.3%  1.239e+13            +0.2%   1.27e+13            +0.1%   1.27e+13        perf-stat.total.instructions
   7783325 ± 13%     -22.8%    6008522 ± 10%     -20.8%    6163644 ± 20%     -13.8%    6708575 ± 22%      -4.5%    7429947 ± 26%  sched_debug.cfs_rq:/.avg_vruntime.avg
   8109328 ± 13%     -18.8%    6584206 ± 10%     -15.3%    6872509 ± 19%     -14.2%    6957983 ± 22%      -5.4%    7673718 ± 26%  sched_debug.cfs_rq:/.avg_vruntime.max
    244161 ± 30%     +28.2%     313090 ± 22%     +76.6%     431126 ± 21%     -23.5%     186903 ± 26%     -28.7%     173977 ± 29%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.66 ± 11%     -22.0%       0.52 ± 21%     -41.3%       0.39 ± 29%      -0.1%       0.66 ±  8%      -3.5%       0.64 ± 16%  sched_debug.cfs_rq:/.h_nr_running.avg
    495.88 ± 33%     -44.7%     274.12 ±  3%     -11.5%     438.85 ± 32%     -11.2%     440.30 ± 18%     -12.2%     435.24 ± 27%  sched_debug.cfs_rq:/.load_avg.max
     81.79 ± 28%     -33.2%      54.62 ± 16%     -15.5%      69.10 ± 26%      +7.2%      87.66 ± 23%      -8.4%      74.91 ± 38%  sched_debug.cfs_rq:/.load_avg.stddev
   7783325 ± 13%     -22.8%    6008522 ± 10%     -20.8%    6163644 ± 20%     -13.8%    6708575 ± 22%      -4.5%    7429947 ± 26%  sched_debug.cfs_rq:/.min_vruntime.avg
   8109328 ± 13%     -18.8%    6584206 ± 10%     -15.3%    6872509 ± 19%     -14.2%    6957983 ± 22%      -5.4%    7673718 ± 26%  sched_debug.cfs_rq:/.min_vruntime.max
    244161 ± 30%     +28.2%     313090 ± 22%     +76.6%     431126 ± 21%     -23.5%     186902 ± 26%     -28.7%     173977 ± 29%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.66 ± 11%     -22.3%       0.51 ± 21%     -41.5%       0.38 ± 29%      -0.4%       0.66 ±  8%      -3.8%       0.63 ± 16%  sched_debug.cfs_rq:/.nr_running.avg
    382.00 ± 36%     -44.2%     213.33 ±  8%     -23.3%     292.98 ± 42%      -2.3%     373.40 ± 18%      +4.0%     397.33 ± 20%  sched_debug.cfs_rq:/.removed.load_avg.max
    194.86 ± 36%     -44.3%     108.59 ±  8%     -24.0%     148.18 ± 40%      -2.4%     190.23 ± 18%      +3.7%     202.10 ± 20%  sched_debug.cfs_rq:/.removed.runnable_avg.max
    194.86 ± 36%     -44.3%     108.59 ±  8%     -24.0%     148.18 ± 40%      -2.4%     190.23 ± 18%      +3.7%     202.10 ± 20%  sched_debug.cfs_rq:/.removed.util_avg.max
    713.50 ± 11%     -22.6%     552.00 ± 20%     -39.7%     430.54 ± 26%      -0.2%     712.27 ±  7%      -3.0%     691.86 ± 14%  sched_debug.cfs_rq:/.runnable_avg.avg
      1348 ± 10%     -15.9%       1133 ± 15%     -20.9%       1067 ± 12%      +2.4%       1380 ±  8%      +5.1%       1417 ± 18%  sched_debug.cfs_rq:/.runnable_avg.max
    708.60 ± 11%     -22.6%     548.41 ± 20%     -39.6%     427.82 ± 26%      -0.1%     707.59 ±  7%      -3.1%     686.34 ± 14%  sched_debug.cfs_rq:/.util_avg.avg
      1119 ±  5%     -16.3%     937.08 ± 11%     -18.6%     910.83 ± 11%      +2.0%       1141 ±  6%      -0.1%       1117 ±  8%  sched_debug.cfs_rq:/.util_avg.max
    633.71 ± 11%     -95.7%      27.38 ± 17%     -96.7%      21.00 ± 19%      -0.6%     630.15 ± 10%      -3.6%     610.78 ± 17%  sched_debug.cfs_rq:/.util_est.avg
      1102 ± 18%     -63.9%     397.88 ± 15%     -67.5%     358.19 ±  8%      +6.1%       1169 ± 14%      +6.6%       1174 ± 24%  sched_debug.cfs_rq:/.util_est.max
    119.77 ± 55%     -64.5%      42.46 ± 12%     -67.8%      38.59 ± 12%      -3.2%     115.93 ± 51%     -12.3%     105.01 ± 70%  sched_debug.cfs_rq:/.util_est.stddev
    145182 ± 12%     -37.6%      90551 ± 11%     -29.5%     102317 ± 18%      -7.3%     134528 ± 10%     -17.2%     120251 ± 18%  sched_debug.cpu.avg_idle.stddev
    122256 ±  8%     +41.4%     172906 ±  7%     +38.2%     168929 ± 14%      -5.4%     115642 ±  6%      -1.3%     120639 ± 14%  sched_debug.cpu.clock.avg
    122268 ±  8%     +41.4%     172920 ±  7%     +38.2%     168942 ± 14%      -5.4%     115657 ±  6%      -1.3%     120655 ± 14%  sched_debug.cpu.clock.max
    122242 ±  8%     +41.4%     172892 ±  7%     +38.2%     168914 ± 14%      -5.4%     115627 ±  6%      -1.3%     120621 ± 14%  sched_debug.cpu.clock.min
    121865 ±  8%     +41.5%     172490 ±  7%     +38.3%     168517 ± 14%      -5.4%     115298 ±  6%      -1.3%     120268 ± 14%  sched_debug.cpu.clock_task.avg
    122030 ±  8%     +41.5%     172681 ±  7%     +38.3%     168714 ± 14%      -5.4%     115451 ±  6%      -1.3%     120421 ± 14%  sched_debug.cpu.clock_task.max
    112808 ±  8%     +44.2%     162675 ±  7%     +41.0%     159006 ± 15%      -5.5%     106630 ±  7%      -1.1%     111604 ± 15%  sched_debug.cpu.clock_task.min
      5671 ±  6%     +24.6%       7069 ±  4%     +24.0%       7034 ±  8%      -7.2%       5261 ±  7%      -3.5%       5471 ± 10%  sched_debug.cpu.curr->pid.max
      0.00 ± 12%     +22.5%       0.00 ± 50%     +17.7%       0.00 ± 42%     +71.0%       0.00 ± 35%     +59.0%       0.00 ± 43%  sched_debug.cpu.next_balance.stddev
      0.66 ± 11%     -22.0%       0.51 ± 21%     -41.4%       0.39 ± 29%      -0.3%       0.66 ±  8%      -3.6%       0.64 ± 16%  sched_debug.cpu.nr_running.avg
      2659 ± 12%    +208.6%       8204 ±  7%    +192.0%       7763 ± 14%     -10.1%       2391 ± 11%      -6.2%       2493 ± 15%  sched_debug.cpu.nr_switches.avg
    679.31 ± 10%    +516.8%       4189 ± 14%    +401.6%       3407 ± 24%     -14.7%     579.50 ± 19%      -6.8%     633.18 ± 25%  sched_debug.cpu.nr_switches.min
      0.00 ±  9%  +12202.6%       0.31 ± 42%  +12627.8%       0.32 ± 37%     +67.0%       0.00 ± 50%     -34.8%       0.00 ± 72%  sched_debug.cpu.nr_uninterruptible.avg
    122243 ±  8%     +41.4%     172893 ±  7%     +38.2%     168916 ± 14%      -5.4%     115628 ±  6%      -1.3%     120623 ± 14%  sched_debug.cpu_clk
    120996 ±  8%     +41.9%     171660 ±  7%     +38.6%     167751 ± 15%      -5.4%     114462 ±  6%      -1.3%     119457 ± 14%  sched_debug.ktime
    123137 ±  8%     +41.1%     173805 ±  7%     +37.9%     169767 ± 14%      -5.4%     116479 ±  6%      -1.4%     121452 ± 13%  sched_debug.sched_clk


> 
> I spotted something else worth optimizing last time, and with the
> patch attached, I was able to measure some significant improvements in
> 1GB hugeTLB allocation and free time, e.g., when allocating and freeing
> 700 1GB hugeTLB pages:
> 
> Before:
>   # time echo 700 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   real  0m13.500s
>   user  0m0.000s
>   sys   0m13.311s
> 
>   # time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   real  0m11.269s
>   user  0m0.000s
>   sys   0m11.187s
> 
> 
> After:
>   # time echo 700 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   real  0m10.643s
>   user  0m0.001s
>   sys   0m10.487s
> 
>   # time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   real  0m1.541s
>   user  0m0.000s
>   sys   0m1.528s
> 
> Thanks!


