Message-ID: <20180109071628.GA24741@js1304-P5Q-DELUXE>
Date:   Tue, 9 Jan 2018 16:16:28 +0900
From:   Joonsoo Kim <iamjoonsoo.kim@....com>
To:     Ye Xiaolong <xiaolong.ye@...el.com>
Cc:     Stephen Rothwell <sfr@...b.auug.org.au>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Tony Lindgren <tony@...mide.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Johannes Weiner <hannes@...xchg.org>,
        Laura Abbott <lauraa@...eaurora.org>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Michal Hocko <mhocko@...e.com>,
        Michal Nazarewicz <mina86@...a86.com>,
        Minchan Kim <minchan@...nel.org>,
        Rik van Riel <riel@...hat.com>,
        Russell King <linux@...linux.org.uk>,
        Will Deacon <will.deacon@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp-robot] [mm/cma]  2b0f904a5a:  fio.read_bw_MBps -16.1%
 regression

On Sat, Jan 06, 2018 at 05:26:31PM +0800, Ye Xiaolong wrote:
> Hi,
> 
> On 01/03, Joonsoo Kim wrote:
> >Hello!
> >
> >On Tue, Jan 02, 2018 at 02:35:28PM +0800, kernel test robot wrote:
> >> 
> >> Greeting,
> >> 
> >> FYI, we noticed a -16.1% regression of fio.read_bw_MBps due to commit:
> >> 
> >> 
> >> commit: 2b0f904a5a8781498417d67226fd12c5e56053ae ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >> 
> >> in testcase: fio-basic
> >> on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
> >> with following parameters:
> >> 
> >> 	disk: 2pmem
> >> 	fs: ext4
> >> 	runtime: 200s
> >> 	nr_task: 50%
> >> 	time_based: tb
> >> 	rw: randread
> >> 	bs: 2M
> >> 	ioengine: mmap
> >> 	test_size: 200G
> >> 	cpufreq_governor: performance
> >> 
> >> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> >> test-url: https://github.com/axboe/fio
> >> 
> >> 
> >> 
> >> Details are as below:
> >> -------------------------------------------------------------------------------------------------->
> >> 
> >> 
> >> To reproduce:
> >> 
> >>         git clone https://github.com/intel/lkp-tests.git
> >>         cd lkp-tests
> >>         bin/lkp install job.yaml  # job file is attached in this email
> >>         bin/lkp run     job.yaml
> >> 
> >> =========================================================================================
> >> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
> >>   2M/gcc-7/performance/2pmem/ext4/mmap/x86_64-rhel-7.2/50%/debian-x86_64-2016-08-31.cgz/200s/randread/lkp-hsw-ep6/200G/fio-basic/tb
> >> 
> >> commit: 
> >>   f6572f9cd2 ("mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request")
> >>   2b0f904a5a ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
> >> 
> >> f6572f9cd248df2c 2b0f904a5a8781498417d67226 
> >> ---------------- -------------------------- 
> >>          %stddev     %change         %stddev
> >>              \          |                \  
> >>      11451           -16.1%       9605        fio.read_bw_MBps
> >>       0.29 ±  5%      +0.1        0.40 ±  3%  fio.latency_1000us%
> >>      19.35 ±  5%      -4.7       14.69 ±  3%  fio.latency_10ms%
> >>       7.92 ±  3%     +12.2       20.15        fio.latency_20ms%
> >>       0.05 ± 11%      +0.0        0.09 ±  8%  fio.latency_2ms%
> >>      70.22            -8.9       61.36        fio.latency_4ms%
> >>       0.29 ± 13%      +0.0        0.33 ±  3%  fio.latency_500us%
> >>       0.45 ± 29%      +1.0        1.45 ±  4%  fio.latency_50ms%
> >>       1.37            +0.1        1.44        fio.latency_750us%
> >>       9792           +31.7%      12896        fio.read_clat_90%_us
> >>      10560           +33.0%      14048        fio.read_clat_95%_us
> >>      15376 ± 10%     +46.9%      22592        fio.read_clat_99%_us
> >>       4885           +19.2%       5825        fio.read_clat_mean_us
> >>       5725           -16.1%       4802        fio.read_iops
> >>  4.598e+09           -16.4%  3.845e+09        fio.time.file_system_inputs
> >>     453153            -8.4%     415215        fio.time.involuntary_context_switches
> >>  5.748e+08           -16.4%  4.806e+08        fio.time.major_page_faults
> >>    1822257           +23.7%    2254706        fio.time.maximum_resident_set_size
> >>       5089            +1.6%       5172        fio.time.system_time
> >>     514.50           -16.3%     430.48        fio.time.user_time
> >
> >System time is increased and user time is decreased. There is a clue below.
> >
> >>      24569 ±  2%      +9.6%      26917 ±  2%  fio.time.voluntary_context_switches
> >>   54443725           -14.9%   46353339        interrupts.CAL:Function_call_interrupts
> >>       0.00 ± 79%      -0.0        0.00 ± 17%  mpstat.cpu.iowait%
> >>       4.45            -0.7        3.71        mpstat.cpu.usr%
> >>    1467516           +21.3%    1779543 ±  3%  meminfo.Active
> >>    1276031           +23.7%    1578443 ±  4%  meminfo.Active(file)
> >>      25789 ±  3%     -76.7%       6013 ±  4%  meminfo.CmaFree
> >>  1.296e+08           -12.6%  1.133e+08        turbostat.IRQ
> >>      41.89            -3.4%      40.47        turbostat.RAMWatt
> >>      17444 ±  2%     -13.5%      15092 ±  3%  turbostat.SMI
> >>   10896428           -16.4%    9111830        vmstat.io.bi
> >>       6010            -6.2%       5637        vmstat.system.cs
> >>     317438           -12.1%     278980        vmstat.system.in
> >>    1072892 ±  3%     +21.5%    1303487        numa-meminfo.node0.Active
> >>     978318           +21.6%    1189809 ±  2%  numa-meminfo.node0.Active(file)
> >>     222968           -25.2%     166818        numa-meminfo.node0.PageTables
> >>      47374 ±  2%     +10.6%      52402 ±  7%  numa-meminfo.node0.SUnreclaim
> >>     165213           +31.9%     217870        numa-meminfo.node1.PageTables
> >>     222405           +10.4%     245633 ±  2%  numa-meminfo.node1.SReclaimable
> >>     102992 ± 46%     -80.8%      19812 ± 38%  numa-meminfo.node1.Shmem
> >>  2.475e+08 ±  2%     -24.0%  1.881e+08        numa-numastat.node0.local_node
> >>   39371795 ± 14%    +167.1%  1.052e+08 ±  2%  numa-numastat.node0.numa_foreign
> >>  2.475e+08 ±  2%     -24.0%  1.881e+08        numa-numastat.node0.numa_hit
> >>   31890417 ± 17%     +40.2%   44705135 ±  8%  numa-numastat.node0.numa_miss
> >>   31899482 ± 17%     +40.2%   44713255 ±  8%  numa-numastat.node0.other_node
> >>  2.566e+08 ±  2%     -44.2%  1.433e+08        numa-numastat.node1.local_node
> >>   31890417 ± 17%     +40.2%   44705135 ±  8%  numa-numastat.node1.numa_foreign
> >>  2.566e+08 ±  2%     -44.2%  1.433e+08        numa-numastat.node1.numa_hit
> >>   39371795 ± 14%    +167.1%  1.052e+08 ±  2%  numa-numastat.node1.numa_miss
> >>   39373660 ± 14%    +167.1%  1.052e+08 ±  2%  numa-numastat.node1.other_node
> >>       6047 ± 39%     -66.5%       2028 ± 63%  sched_debug.cfs_rq:/.exec_clock.min
> >>     461.37 ±  8%     +64.9%     760.74 ± 20%  sched_debug.cfs_rq:/.load_avg.avg
> >>       1105 ± 13%   +1389.3%      16467 ± 56%  sched_debug.cfs_rq:/.load_avg.max
> >>     408.99 ±  3%    +495.0%       2433 ± 49%  sched_debug.cfs_rq:/.load_avg.stddev
> >>      28746 ± 12%     -18.7%      23366 ± 14%  sched_debug.cfs_rq:/.min_vruntime.min
> >>     752426 ±  3%     -12.7%     656636 ±  4%  sched_debug.cpu.avg_idle.avg
> >>     144956 ± 61%     -85.4%      21174 ± 26%  sched_debug.cpu.avg_idle.min
> >>     245684 ± 11%     +44.6%     355257 ±  2%  sched_debug.cpu.avg_idle.stddev
> >>     236035 ± 15%     +51.8%     358264 ± 16%  sched_debug.cpu.nr_switches.max
> >>      42039 ± 22%     +34.7%      56616 ±  8%  sched_debug.cpu.nr_switches.stddev
> >>       3204 ± 24%     -48.1%       1663 ± 30%  sched_debug.cpu.sched_count.min
> >>       2132 ± 25%     +38.7%       2957 ± 11%  sched_debug.cpu.sched_count.stddev
> >>      90.67 ± 32%     -71.8%      25.58 ± 26%  sched_debug.cpu.sched_goidle.min
> >>       6467 ± 15%     +22.3%       7912 ± 15%  sched_debug.cpu.ttwu_count.max
> >>       1513 ± 27%     -55.7%     670.92 ± 22%  sched_debug.cpu.ttwu_count.min
> >>       1025 ± 20%     +68.4%       1727 ±  9%  sched_debug.cpu.ttwu_count.stddev
> >>       1057 ± 16%     -62.9%     391.85 ± 31%  sched_debug.cpu.ttwu_local.min
> >>     244876           +21.6%     297770 ±  2%  numa-vmstat.node0.nr_active_file
> >>      88.00 ±  5%     +19.3%     105.00 ±  5%  numa-vmstat.node0.nr_isolated_file
> >>      55778           -25.1%      41765        numa-vmstat.node0.nr_page_table_pages
> >>      11843 ±  2%     +10.6%      13100 ±  7%  numa-vmstat.node0.nr_slab_unreclaimable
> >>     159.25 ± 42%     -74.9%      40.00 ± 52%  numa-vmstat.node0.nr_vmscan_immediate_reclaim
> >>     244862           +21.6%     297739 ±  2%  numa-vmstat.node0.nr_zone_active_file
> >>   19364320 ± 19%    +187.2%   55617595 ±  2%  numa-vmstat.node0.numa_foreign
> >>     268155 ±  3%     +49.6%     401089 ±  4%  numa-vmstat.node0.workingset_activate
> >>  1.229e+08           -19.0%   99590617        numa-vmstat.node0.workingset_refault
> >>       6345 ±  3%     -76.5%       1489 ±  3%  numa-vmstat.node1.nr_free_cma
> >>      41335           +32.0%      54552        numa-vmstat.node1.nr_page_table_pages
> >>      25770 ± 46%     -80.8%       4956 ± 38%  numa-vmstat.node1.nr_shmem
> >>      55684           +10.4%      61475 ±  2%  numa-vmstat.node1.nr_slab_reclaimable
> >>  1.618e+08 ±  8%     -47.6%   84846798 ± 17%  numa-vmstat.node1.numa_hit
> >>  1.617e+08 ±  8%     -47.6%   84676284 ± 17%  numa-vmstat.node1.numa_local
> >>   19365342 ± 19%    +187.2%   55620100 ±  2%  numa-vmstat.node1.numa_miss
> >>   19534837 ± 19%    +185.6%   55790654 ±  2%  numa-vmstat.node1.numa_other
> >>  1.296e+08           -21.0%  1.024e+08        numa-vmstat.node1.workingset_refault
> >>  1.832e+12            -7.5%  1.694e+12        perf-stat.branch-instructions
> >>       0.25            -0.0        0.23        perf-stat.branch-miss-rate%
> >>  4.666e+09           -16.0%  3.918e+09        perf-stat.branch-misses
> >>      39.88            +1.1       40.98        perf-stat.cache-miss-rate%
> >>  2.812e+10           -11.6%  2.485e+10        perf-stat.cache-misses
> >>  7.051e+10           -14.0%  6.064e+10        perf-stat.cache-references
> >>    1260521            -6.1%    1183071        perf-stat.context-switches
> >>       1.87            +9.6%       2.05        perf-stat.cpi
> >>       6707 ±  2%      -5.2%       6359        perf-stat.cpu-migrations
> >>       1.04 ± 11%      -0.3        0.77 ±  4%  perf-stat.dTLB-load-miss-rate%
> >>  2.365e+10 ±  7%     -25.9%  1.751e+10 ±  9%  perf-stat.dTLB-load-misses
> >>   1.05e+12 ±  4%      -9.5%  9.497e+11 ±  2%  perf-stat.dTLB-stores
> >>      28.16            +2.2       30.35 ±  2%  perf-stat.iTLB-load-miss-rate%
> >>   2.56e+08           -10.4%  2.295e+08        perf-stat.iTLB-loads
> >>  8.974e+12            -9.2%  8.151e+12        perf-stat.instructions
> >>      89411            -8.8%      81529        perf-stat.instructions-per-iTLB-miss
> >>       0.54            -8.8%       0.49        perf-stat.ipc
> >>  5.748e+08           -16.4%  4.806e+08        perf-stat.major-faults
> >>      52.82            +5.8       58.61 ±  2%  perf-stat.node-load-miss-rate%
> >>  7.206e+09 ±  2%     -18.6%  5.867e+09 ±  3%  perf-stat.node-loads
> >>      17.96 ±  8%     +15.7       33.69 ±  2%  perf-stat.node-store-miss-rate%
> >>  2.055e+09 ±  8%     +65.1%  3.393e+09 ±  4%  perf-stat.node-store-misses
> >>  9.391e+09 ±  2%     -28.9%  6.675e+09        perf-stat.node-stores
> >>  5.753e+08           -16.4%  4.811e+08        perf-stat.page-faults
> >>     305865           -16.3%     256108        proc-vmstat.allocstall_movable
> >>       1923 ± 14%     -72.1%     537.00 ± 12%  proc-vmstat.allocstall_normal
> >>       0.00            +Inf%       1577 ± 67%  proc-vmstat.compact_isolated
> >>       1005 ±  4%     -65.8%     344.00 ±  7%  proc-vmstat.kswapd_low_wmark_hit_quickly
> >>     320062           +23.2%     394374 ±  4%  proc-vmstat.nr_active_file
> >>       6411 ±  2%     -76.4%       1511 ±  4%  proc-vmstat.nr_free_cma
> >>     277.00 ± 12%     -51.4%     134.75 ± 52%  proc-vmstat.nr_vmscan_immediate_reclaim
> >>     320049           +23.2%     394353 ±  4%  proc-vmstat.nr_zone_active_file
> >>   71262212 ± 15%    +110.3%  1.499e+08 ±  3%  proc-vmstat.numa_foreign
> >>  5.042e+08 ±  2%     -34.3%  3.314e+08        proc-vmstat.numa_hit
> >>  5.041e+08 ±  2%     -34.3%  3.314e+08        proc-vmstat.numa_local
> >>   71262212 ± 15%    +110.3%  1.499e+08 ±  3%  proc-vmstat.numa_miss
> >>   71273176 ± 15%    +110.3%  1.499e+08 ±  3%  proc-vmstat.numa_other
> >>       1007 ±  4%     -65.6%     346.25 ±  7%  proc-vmstat.pageoutrun
> >>   23070268           -16.0%   19386190        proc-vmstat.pgalloc_dma32
> >>  5.525e+08           -16.7%  4.603e+08        proc-vmstat.pgalloc_normal
> >>  5.753e+08           -16.4%  4.812e+08        proc-vmstat.pgfault
> >>  5.751e+08           -16.3%  4.813e+08        proc-vmstat.pgfree
> >>  5.748e+08           -16.4%  4.806e+08        proc-vmstat.pgmajfault
> >>  2.299e+09           -16.4%  1.923e+09        proc-vmstat.pgpgin
> >>  8.396e+08           -17.8%  6.901e+08        proc-vmstat.pgscan_direct
> >>  3.018e+08 ±  2%     -13.0%  2.627e+08        proc-vmstat.pgscan_kswapd
> >>    4.1e+08           -15.1%   3.48e+08        proc-vmstat.pgsteal_direct
> >>  1.542e+08 ±  3%     -20.9%   1.22e+08 ±  3%  proc-vmstat.pgsteal_kswapd
> >>      23514 ±  4%     -23.1%      18076 ± 16%  proc-vmstat.slabs_scanned
> >>     343040 ±  2%     +40.3%     481253 ±  2%  proc-vmstat.workingset_activate
> >>  2.525e+08           -20.1%  2.018e+08        proc-vmstat.workingset_refault
> >>      13.64 ±  3%      -1.7       11.96 ±  2%  perf-profile.calltrace.cycles-pp.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
> >>      11.67 ±  3%      -1.4       10.29 ±  2%  perf-profile.calltrace.cycles-pp.submit_bio.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault.__do_fault
> >>      11.64 ±  3%      -1.4       10.25 ±  2%  perf-profile.calltrace.cycles-pp.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault
> >>      11.10 ±  3%      -1.3        9.82 ±  2%  perf-profile.calltrace.cycles-pp.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault
> >>       9.21 ±  3%      -1.2        8.04 ±  3%  perf-profile.calltrace.cycles-pp.pmem_do_bvec.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages
> >>      27.33 ±  4%      -1.0       26.35 ±  5%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> >>      27.33 ±  4%      -1.0       26.35 ±  5%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >>      27.33 ±  4%      -1.0       26.35 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >>      27.33 ±  4%      -1.0       26.35 ±  5%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> >>      26.79 ±  4%      -0.8       25.98 ±  5%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> >>      27.98 ±  3%      -0.8       27.22 ±  4%  perf-profile.calltrace.cycles-pp.secondary_startup_64
> >>       5.36 ± 12%      -0.6        4.76 ±  7%  perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork
> >>       5.36 ± 12%      -0.6        4.76 ±  7%  perf-profile.calltrace.cycles-pp.shrink_node.kswapd.kthread.ret_from_fork
> >>       5.30 ± 12%      -0.6        4.71 ±  7%  perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.kswapd.kthread
> >>       5.35 ± 12%      -0.6        4.76 ±  7%  perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.kswapd.kthread.ret_from_fork
> >>       5.43 ± 12%      -0.5        4.88 ±  7%  perf-profile.calltrace.cycles-pp.ret_from_fork
> >>       5.43 ± 12%      -0.5        4.88 ±  7%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
> >>      11.04 ±  2%      -0.2       10.82 ±  2%  perf-profile.calltrace.cycles-pp.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
> >>      62.44 ±  2%      +1.9       64.38        perf-profile.calltrace.cycles-pp.page_fault
> >>      62.38 ±  2%      +2.0       64.33        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
> >>      62.38 ±  2%      +2.0       64.34        perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
> >>      61.52 ±  2%      +2.1       63.58        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> >>      61.34 ±  2%      +2.1       63.44        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> >>      30.18 ±  3%      +2.3       32.45 ±  2%  perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages
> >>       7.98 ±  3%      +2.3       10.33 ±  2%  perf-profile.calltrace.cycles-pp.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
> >>      30.48 ±  3%      +2.4       32.83 ±  2%  perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault
> >>      30.46 ±  3%      +2.4       32.81 ±  2%  perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault
> >>      30.46 ±  3%      +2.4       32.81 ±  2%  perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask
> >>      30.37 ±  3%      +2.4       32.75 ±  2%  perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
> >>       5.58 ±  4%      +2.5        8.08 ±  2%  perf-profile.calltrace.cycles-pp.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault
> >>      32.88 ±  3%      +2.5       35.38 ±  2%  perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
> >>       5.51 ±  4%      +2.5        8.02 ±  2%  perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault
> >>       4.24 ±  4%      +2.5        6.76 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault
> >>       4.18 ±  4%      +2.5        6.70 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru
> >>      18.64 ±  3%      +2.5       21.16 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node
> >>      31.65 ±  3%      +2.7       34.31 ±  2%  perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault
> >>      17.21 ±  3%      +2.7       19.93 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
> >
> >It looks like there is more lru lock contention. It would be caused by
> >this patch using a movable zone for the CMA memory. In this case,
> >reclaim for normal memory skips the lru pages on the movable zone, so it
> >needs more time to find enough reclaim target pages. That would increase
> >lru lock holding time and then cause contention.
> >
> >Could you give me another stat 'pgskip_XXX' in /proc/vmstat to confirm
> >my theory?
> 
> Attached is the /proc/vmstat sample file captured during the test; the sample interval is 1s.

Thanks!

pgskip_XXX is low, so my theory would be wrong. The other theory is
that numa misses are the reason for the regression. Could you run the
same test on a system without numa? I cannot test it myself since I
don't have pmem.
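
As a side note, here is a minimal sketch in plain C that dumps the
counters relevant to both theories (pgskip_* for the zone-skip theory,
numa_* for the numa-miss theory) from /proc/vmstat. The program name
and the prefix filter are only illustrative; it assumes nothing beyond
the standard /proc/vmstat text format.

/*
 * Minimal sketch: print the /proc/vmstat counters discussed above
 * (pgskip_* for the zone-skip theory, numa_* for the numa-miss theory).
 * Build with: gcc -O2 -o vmstat-dump vmstat-dump.c
 * The name "vmstat-dump" and the prefix filter are only illustrative.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* keep only the pgskip_* and numa_* lines */
		if (!strncmp(line, "pgskip_", 7) ||
		    !strncmp(line, "numa_", 5))
			fputs(line, stdout);
	}

	fclose(f);
	return 0;
}

Running it once per second alongside the job would give the same kind of
time series as the attached vmstat samples, limited to these counters.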

Thanks.
