Message-ID: <1418804307.5745.252.camel@intel.com>
Date:	Wed, 17 Dec 2014 16:18:27 +0800
From:	Huang Ying <ying.huang@...el.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, LKP ML <lkp@...org>
Subject: [LKP] [mm] 1306a85aed3: +5.8% will-it-scale.per_thread_ops

FYI, we noticed the following changes on

commit 1306a85aed3ec3db98945aafb7dfbe5648a1203c ("mm: embed the memcg pointer directly into struct page")


testbox/testcase/testparams: lkp-snb01/will-it-scale/performance-page_fault2

22811c6bc3c764d8  1306a85aed3ec3db98945aafb7  
----------------  --------------------------  
         %stddev     %change         %stddev
             \          |                \  
    185591 ±  0%      +5.8%     196339 ±  0%  will-it-scale.per_thread_ops
    268066 ±  0%      +4.2%     279258 ±  0%  will-it-scale.per_process_ops
     66204 ± 47%     -79.9%      13282 ±  6%  sched_debug.cpu#14.sched_count
       726 ± 12%    -100.0%          0 ±  0%  slabinfo.blkdev_requests.num_objs
       726 ± 12%    -100.0%          0 ±  0%  slabinfo.blkdev_requests.active_objs
       282 ± 11%     -86.2%         39 ±  0%  slabinfo.bdev_cache.num_objs
       282 ± 11%     -86.2%         39 ±  0%  slabinfo.bdev_cache.active_objs
       536 ± 10%     -92.7%         39 ±  0%  slabinfo.blkdev_ioc.num_objs
       536 ± 10%     -92.7%         39 ±  0%  slabinfo.blkdev_ioc.active_objs
       745 ± 13%     -93.0%         52 ± 34%  slabinfo.xfs_buf.num_objs
      1.35 ±  2%     -97.0%       0.04 ± 17%  perf-profile.cpu-cycles.mem_cgroup_page_lruvec.release_pages.free_pages_and_swap_cache.tlb_flush_mmu_free.unmap_page_range
     70832 ±  7%     -84.6%      10928 ±  0%  meminfo.DirectMap4k
       745 ± 13%     -93.0%         52 ± 34%  slabinfo.xfs_buf.active_objs
        20 ± 34%    +173.8%         54 ± 38%  sched_debug.cfs_rq[25]:/.runnable_load_avg
        21 ± 32%    +163.5%         56 ± 37%  sched_debug.cfs_rq[25]:/.load
        21 ± 32%    +163.5%         56 ± 37%  sched_debug.cpu#25.load
      6.68 ±  2%     -69.0%       2.07 ±  4%  perf-profile.cpu-cycles.lru_cache_add_active_or_unevictable.do_cow_fault.handle_mm_fault.__do_page_fault.do_page_fault
     11481 ± 40%     -60.4%       4550 ± 24%  sched_debug.cpu#31.sched_count
     35880 ± 29%     -54.4%      16355 ± 20%  sched_debug.cpu#8.sched_count
        30 ± 44%     +90.8%         57 ± 34%  sched_debug.cpu#25.cpu_load[0]
       258 ± 42%     -58.4%        107 ± 21%  sched_debug.cfs_rq[20]:/.blocked_load_avg
       615 ± 47%     -55.8%        271 ± 18%  sched_debug.cpu#22.ttwu_local
        24 ± 36%     +81.6%         44 ± 26%  sched_debug.cpu#25.cpu_load[1]
     31132 ± 41%     -47.8%      16259 ± 47%  sched_debug.cpu#13.sched_count
       287 ± 37%     -53.0%        135 ± 18%  sched_debug.cfs_rq[20]:/.tg_load_contrib
      2755 ± 22%     +79.7%       4950 ± 36%  sched_debug.cpu#8.ttwu_local
         9 ± 22%     +69.2%         16 ± 31%  sched_debug.cpu#14.cpu_load[0]
      8626 ± 14%     -46.4%       4621 ± 32%  sched_debug.cpu#0.ttwu_local
        37 ± 44%     -43.6%         21 ± 22%  sched_debug.cpu#31.cpu_load[1]
       390 ± 13%     -45.3%        213 ± 16%  sched_debug.cfs_rq[25]:/.blocked_load_avg
        14 ± 24%     -40.4%          8 ± 25%  sched_debug.cpu#13.cpu_load[0]
    309688 ± 24%     -44.8%     170966 ± 34%  sched_debug.cfs_rq[18]:/.spread0
       410 ± 13%     -34.6%        268 ±  7%  sched_debug.cfs_rq[25]:/.tg_load_contrib
        20 ± 30%     +64.6%         33 ± 17%  sched_debug.cpu#25.cpu_load[2]
    370117 ±  6%     -43.0%     210857 ± 45%  sched_debug.cfs_rq[17]:/.spread0
        28 ± 29%     -34.2%         18 ± 10%  sched_debug.cpu#31.cpu_load[2]
     16558 ± 28%     -40.9%       9784 ± 11%  sched_debug.cfs_rq[8]:/.exec_clock
      8517 ± 15%     -32.9%       5715 ±  9%  sched_debug.cpu#20.sched_count
      2301 ± 29%     +68.2%       3871 ± 17%  sched_debug.cpu#29.ttwu_count
        13 ± 17%     -35.8%          8 ± 26%  sched_debug.cfs_rq[13]:/.runnable_load_avg
      2317 ±  6%     -26.5%       1703 ± 18%  sched_debug.cpu#13.curr->pid
      2470 ± 12%     -23.3%       1893 ± 12%  sched_debug.cpu#15.curr->pid
        12 ± 14%     -28.0%          9 ±  7%  sched_debug.cpu#13.cpu_load[3]
    330696 ± 22%     -35.6%     212829 ±  5%  sched_debug.cfs_rq[8]:/.min_vruntime
        42 ± 38%     -43.8%         23 ± 15%  sched_debug.cpu#24.cpu_load[0]
      2556 ±  6%     +42.8%       3649 ±  9%  sched_debug.cpu#25.curr->pid
        33 ± 33%     -34.6%         21 ±  3%  sched_debug.cfs_rq[5]:/.load
        33 ± 33%     -33.1%         22 ±  7%  sched_debug.cpu#5.load
      3595 ± 17%     -25.0%       2697 ±  5%  sched_debug.cpu#17.ttwu_count
     24718 ± 15%     -27.3%      17972 ± 13%  sched_debug.cpu#0.nr_switches
        18 ± 25%     +45.2%         26 ± 10%  sched_debug.cpu#25.cpu_load[3]
      7788 ± 16%     -24.8%       5857 ±  5%  sched_debug.cpu#17.nr_switches
        17 ± 12%     +31.4%         23 ± 17%  sched_debug.cpu#1.cpu_load[3]
        18 ± 10%     +33.3%         24 ± 16%  sched_debug.cpu#1.cpu_load[2]
      6091 ±  5%     -26.8%       4460 ± 25%  sched_debug.cpu#31.nr_switches
      3956 ± 15%     -28.8%       2816 ± 16%  sched_debug.cpu#31.ttwu_count
      4.82 ±  1%     -24.3%       3.65 ±  1%  perf-profile.cpu-cycles.release_pages.free_pages_and_swap_cache.tlb_flush_mmu_free.unmap_page_range.unmap_single_vma
        13 ±  9%     -26.9%          9 ± 11%  sched_debug.cpu#13.cpu_load[2]
      3327 ± 11%     -20.2%       2655 ± 11%  sched_debug.cpu#4.curr->pid
      4.91 ±  1%     -23.8%       3.74 ±  1%  perf-profile.cpu-cycles.tlb_flush_mmu_free.unmap_page_range.unmap_single_vma.unmap_vmas.unmap_region
      4.91 ±  1%     -23.7%       3.74 ±  1%  perf-profile.cpu-cycles.free_pages_and_swap_cache.tlb_flush_mmu_free.unmap_page_range.unmap_single_vma.unmap_vmas
        36 ±  8%     -22.9%         27 ±  7%  sched_debug.cpu#17.cpu_load[0]
      1.74 ±  2%     -22.8%       1.34 ±  2%  perf-profile.cpu-cycles.unlock_page.do_cow_fault.handle_mm_fault.__do_page_fault.do_page_fault
        17 ± 21%     +33.8%         22 ±  7%  sched_debug.cpu#25.cpu_load[4]
    347045 ±  0%     -20.8%     274703 ±  0%  meminfo.Inactive(file)
     86761 ±  0%     -20.8%      68676 ±  0%  proc-vmstat.nr_inactive_file
     42941 ±  0%     -20.7%      34065 ±  1%  numa-vmstat.node0.nr_inactive_file
    171765 ±  0%     -20.7%     136260 ±  1%  numa-meminfo.node0.Inactive(file)
    175280 ±  0%     -21.0%     138443 ±  1%  numa-meminfo.node1.Inactive(file)
     43819 ±  0%     -21.0%      34611 ±  1%  numa-vmstat.node1.nr_inactive_file
     14245 ± 13%     -28.8%      10144 ± 18%  sched_debug.cpu#0.ttwu_count
     34770 ± 14%     +29.3%      44960 ± 18%  sched_debug.cfs_rq[1]:/.exec_clock
      1.23 ±  1%     +23.8%       1.52 ±  2%  perf-profile.cpu-cycles._raw_spin_lock.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_cow_fault
        17 ± 21%     +23.5%         21 ±  7%  sched_debug.cpu#29.cpu_load[3]
        32 ±  5%     -12.2%         28 ±  8%  sched_debug.cpu#21.cpu_load[1]
        35 ±  9%     -19.1%         28 ±  8%  sched_debug.cpu#17.cpu_load[1]
     10608 ±  2%     -17.2%       8783 ±  4%  numa-vmstat.node0.nr_active_file
     42435 ±  2%     -17.2%      35136 ±  4%  numa-meminfo.node0.Active(file)
     63836 ±  0%     -16.9%      53045 ±  0%  numa-vmstat.node1.numa_interleave
     53212 ±  0%     -16.3%      44533 ±  0%  numa-vmstat.node0.numa_interleave
     84165 ±  0%     -16.2%      70563 ±  0%  meminfo.Active(file)
     21040 ±  0%     -16.2%      17640 ±  0%  proc-vmstat.nr_active_file
      6709 ±  0%     +18.4%       7944 ±  3%  sched_debug.cfs_rq[20]:/.tg_load_avg
      6711 ±  0%     +18.5%       7950 ±  3%  sched_debug.cfs_rq[21]:/.tg_load_avg
     35768 ±  9%     -15.0%      30418 ±  8%  sched_debug.cpu#8.nr_load_updates
      6714 ±  0%     +18.4%       7946 ±  3%  sched_debug.cfs_rq[22]:/.tg_load_avg
      6717 ±  0%     +18.0%       7924 ±  3%  sched_debug.cfs_rq[18]:/.tg_load_avg
      6712 ±  0%     +17.9%       7910 ±  3%  sched_debug.cfs_rq[19]:/.tg_load_avg
      6688 ±  1%     +17.9%       7883 ±  2%  sched_debug.cfs_rq[23]:/.tg_load_avg
        33 ±  5%     -16.5%         27 ±  2%  sched_debug.cpu#21.cpu_load[0]
      6893 ±  0%     +17.4%       8092 ±  3%  sched_debug.cfs_rq[7]:/.tg_load_avg
      6988 ±  1%     +15.6%       8078 ±  4%  sched_debug.cfs_rq[0]:/.tg_load_avg
      6577 ±  1%     +18.0%       7760 ±  3%  sched_debug.cfs_rq[30]:/.tg_load_avg
      6982 ±  1%     +16.1%       8105 ±  3%  sched_debug.cfs_rq[3]:/.tg_load_avg
      6875 ±  0%     +17.6%       8085 ±  3%  sched_debug.cfs_rq[8]:/.tg_load_avg
      6579 ±  1%     +17.8%       7748 ±  3%  sched_debug.cfs_rq[29]:/.tg_load_avg
      7016 ±  1%     +15.2%       8083 ±  4%  sched_debug.cfs_rq[1]:/.tg_load_avg
      6873 ±  0%     +17.0%       8042 ±  3%  sched_debug.cfs_rq[9]:/.tg_load_avg
      7005 ±  1%     +15.4%       8084 ±  3%  sched_debug.cfs_rq[2]:/.tg_load_avg
        34 ±  5%     -13.9%         29 ±  6%  sched_debug.cpu#20.cpu_load[0]
      6737 ±  1%     +17.6%       7922 ±  3%  sched_debug.cfs_rq[17]:/.tg_load_avg
      6742 ±  1%     +17.4%       7912 ±  3%  sched_debug.cfs_rq[16]:/.tg_load_avg
      6575 ±  1%     +17.4%       7720 ±  3%  sched_debug.cfs_rq[31]:/.tg_load_avg
      8.09 ±  1%     -13.8%       6.97 ±  0%  perf-profile.cpu-cycles.munmap
      8.08 ±  1%     -13.7%       6.97 ±  0%  perf-profile.cpu-cycles.system_call_fastpath.munmap
        27 ±  6%      -9.0%         25 ±  4%  sched_debug.cfs_rq[23]:/.runnable_load_avg
      8.07 ±  1%     -13.8%       6.96 ±  0%  perf-profile.cpu-cycles.do_munmap.vm_munmap.sys_munmap.system_call_fastpath.munmap
      8.07 ±  1%     -13.8%       6.95 ±  0%  perf-profile.cpu-cycles.unmap_region.do_munmap.vm_munmap.sys_munmap.system_call_fastpath
      8.08 ±  1%     -13.8%       6.97 ±  0%  perf-profile.cpu-cycles.vm_munmap.sys_munmap.system_call_fastpath.munmap
      8.08 ±  1%     -13.8%       6.97 ±  0%  perf-profile.cpu-cycles.sys_munmap.system_call_fastpath.munmap
      6939 ±  1%     +16.4%       8080 ±  3%  sched_debug.cfs_rq[6]:/.tg_load_avg
      6710 ±  1%     +16.4%       7812 ±  3%  sched_debug.cfs_rq[24]:/.tg_load_avg
      6653 ±  1%     +17.0%       7783 ±  3%  sched_debug.cfs_rq[26]:/.tg_load_avg
    622401 ±  4%     +15.2%     717037 ± 11%  sched_debug.cfs_rq[1]:/.min_vruntime
      1504 ±  1%     -13.6%       1300 ±  7%  slabinfo.sock_inode_cache.active_objs
        30 ±  8%     -15.4%         26 ±  5%  sched_debug.cpu#23.load
      1504 ±  1%     -13.6%       1300 ±  7%  slabinfo.sock_inode_cache.num_objs
        30 ±  8%     -15.4%         26 ±  5%  sched_debug.cfs_rq[23]:/.load
      7.46 ±  0%     -13.3%       6.47 ±  0%  perf-profile.cpu-cycles.unmap_vmas.unmap_region.do_munmap.vm_munmap.sys_munmap
      7.46 ±  0%     -13.3%       6.47 ±  0%  perf-profile.cpu-cycles.unmap_single_vma.unmap_vmas.unmap_region.do_munmap.vm_munmap
      5.11 ±  1%     +15.5%       5.90 ±  0%  perf-profile.cpu-cycles.__list_del_entry.list_del.__rmqueue.get_page_from_freelist.__alloc_pages_nodemask
      6887 ±  0%     +16.0%       7986 ±  3%  sched_debug.cfs_rq[10]:/.tg_load_avg
      6645 ±  2%     +17.1%       7783 ±  3%  sched_debug.cfs_rq[25]:/.tg_load_avg
      7.40 ±  0%     -13.4%       6.41 ±  0%  perf-profile.cpu-cycles.unmap_page_range.unmap_single_vma.unmap_vmas.unmap_region.do_munmap
      5.16 ±  1%     +15.7%       5.96 ±  0%  perf-profile.cpu-cycles.list_del.__rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma
    687523 ±  2%     +11.9%     769461 ±  4%  sched_debug.cfs_rq[0]:/.min_vruntime
      6834 ±  0%     +16.8%       7979 ±  3%  sched_debug.cfs_rq[14]:/.tg_load_avg
      6885 ±  0%     +16.1%       7996 ±  3%  sched_debug.cfs_rq[12]:/.tg_load_avg
      6894 ±  0%     +16.1%       8005 ±  3%  sched_debug.cfs_rq[11]:/.tg_load_avg
      6803 ±  1%     +16.2%       7901 ±  3%  sched_debug.cfs_rq[15]:/.tg_load_avg
      6963 ±  1%     +16.1%       8087 ±  3%  sched_debug.cfs_rq[5]:/.tg_load_avg
      6841 ±  0%     +16.8%       7991 ±  3%  sched_debug.cfs_rq[13]:/.tg_load_avg
      5.64 ±  1%     +14.8%       6.48 ±  0%  perf-profile.cpu-cycles.__rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_cow_fault
       403 ±  7%     +13.6%        458 ±  6%  sched_debug.cfs_rq[1]:/.tg_runnable_contrib
      6967 ±  1%     +15.9%       8078 ±  3%  sched_debug.cfs_rq[4]:/.tg_load_avg
     18553 ±  7%     +13.6%      21084 ±  6%  sched_debug.cfs_rq[1]:/.avg->runnable_avg_sum
      6645 ±  1%     +16.7%       7755 ±  3%  sched_debug.cfs_rq[27]:/.tg_load_avg
      8.77 ±  0%     +14.1%      10.00 ±  0%  perf-profile.cpu-cycles.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_cow_fault.handle_mm_fault
     37777 ± 12%     +20.6%      45541 ±  2%  sched_debug.cfs_rq[24]:/.exec_clock
     67160 ±  8%     -12.5%      58785 ±  8%  sched_debug.cfs_rq[18]:/.exec_clock
      6641 ±  2%     +16.6%       7742 ±  3%  sched_debug.cfs_rq[28]:/.tg_load_avg
        35 ±  9%     -17.0%         29 ± 10%  sched_debug.cpu#17.cpu_load[2]
        34 ±  9%     -13.7%         30 ±  9%  sched_debug.cpu#17.cpu_load[3]
      9.53 ±  0%     +12.7%      10.74 ±  0%  perf-profile.cpu-cycles.__alloc_pages_nodemask.alloc_pages_vma.do_cow_fault.handle_mm_fault.__do_page_fault
     10.08 ±  0%     +12.5%      11.34 ±  0%  perf-profile.cpu-cycles.alloc_pages_vma.do_cow_fault.handle_mm_fault.__do_page_fault.do_page_fault
     41728 ±  2%     -15.1%      35425 ±  4%  numa-meminfo.node1.Active(file)
     10431 ±  2%     -15.1%       8856 ±  4%  numa-vmstat.node1.nr_active_file
     19883 ±  0%     -10.0%      17893 ±  1%  slabinfo.radix_tree_node.num_objs
      7.52 ±  1%     +11.3%       8.37 ±  1%  perf-profile.cpu-cycles._raw_spin_lock.do_cow_fault.handle_mm_fault.__do_page_fault.do_page_fault
     14873 ±  5%     -11.0%      13243 ±  6%  sched_debug.cpu#14.nr_switches
        56 ±  3%      -7.1%         52 ±  6%  sched_debug.cpu#16.cpu_load[2]
     19817 ±  0%      -9.9%      17856 ±  0%  slabinfo.radix_tree_node.active_objs
     49459 ± 10%     +14.7%      56743 ±  2%  sched_debug.cpu#25.nr_load_updates
    741856 ± 10%     +16.5%     864387 ±  2%  sched_debug.cfs_rq[24]:/.min_vruntime
     31.79 ±  0%      -9.3%      28.84 ±  0%  perf-profile.cpu-cycles.do_cow_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     47.90 ±  1%     +16.9%      55.99 ±  2%  time.user_time
    238256 ±  0%      +8.4%     258184 ±  0%  time.voluntary_context_switches
 2.015e+08 ±  0%      +8.4%  2.184e+08 ±  0%  time.minor_page_faults
       476 ±  0%      +5.9%        504 ±  0%  time.percent_of_cpu_this_job_got
      1441 ±  0%      +5.5%       1520 ±  0%  time.system_time
     40.26 ±  0%      +2.0%      41.04 ±  0%  turbostat.%c0

lkp-snb01: Sandy Bridge-EP
Memory: 32G
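
For readers who want to check the %change column against the two value columns, here is a minimal sketch in Python. It is not part of lkp-tests: the helper names (parse_row, percent_change) and the regular expression are assumptions made up for illustration, and the row layout is inferred only from the table above. It treats the left column as the base commit (22811c6bc3c764d8) and the right column as the commit under test.

```python
import re

# Hypothetical row pattern, inferred from the table layout in this report:
#   <base> ± <stddev>%   <change>%   <new> ± <stddev>%   <metric name>
ROW_RE = re.compile(
    r"^\s*(?P<base>[\d.e+]+)\s+±\s*(?P<base_sd>\d+)%"
    r"\s+(?P<change>[+-][\d.]+)%"
    r"\s+(?P<new>[\d.e+]+)\s+±\s*(?P<new_sd>\d+)%"
    r"\s+(?P<metric>\S.*)$"
)


def parse_row(line):
    """Parse one comparison row; return None if the line does not match."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return {
        "metric": m.group("metric").strip(),
        "base": float(m.group("base")),
        "new": float(m.group("new")),
        "reported_change": float(m.group("change")),
    }


def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# Three rows copied verbatim from the table above.
rows = [
    "    185591 ±  0%      +5.8%     196339 ±  0%  will-it-scale.per_thread_ops",
    "    268066 ±  0%      +4.2%     279258 ±  0%  will-it-scale.per_process_ops",
    " 2.015e+08 ±  0%      +8.4%  2.184e+08 ±  0%  time.minor_page_faults",
]

for line in rows:
    row = parse_row(line)
    computed = percent_change(row["base"], row["new"])
    print(
        f'{row["metric"]}: reported {row["reported_change"]:+.1f}%, '
        f"computed {computed:+.1f}%"
    )
```

For the headline metric this gives (196339 - 185591) / 185591, which rounds to the reported +5.8%. Rows whose base value is 0 would need a guard before dividing; none of the rows in this report have that.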




                                time.minor_page_faults

  2.5e+08 ++----------------------------------------------------------------+
          |                                                                 |
          O O O O O O   O  O O O O O O O O O O O O O O O O  O O O O O O O O O
    2e+08 *+*.*.*.*.*.*.*..*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*..*.*.*.*.*.*.*.* |
          |                                                                 |
          |                                                                 |
  1.5e+08 ++                                                                |
          |                                                                 |
    1e+08 ++                                                                |
          |                                                                 |
          |                                                                 |
    5e+07 ++                                                                |
          |                                                                 |
          |                                                                 |
        0 ++----------O-----------------------------------------------------+

	[*] bisect-good sample
	[O] bisect-bad  sample

To reproduce:

	apt-get install ruby ruby-oj
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/setup-local job.yaml # the job file attached in this email
	bin/run-local   job.yaml


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Huang, Ying


View attachment "job.yaml" of type "text/plain" (1549 bytes)

View attachment "reproduce" of type "text/plain" (2399 bytes)

_______________________________________________
LKP mailing list
LKP@...ux.intel.com
