Message-ID: <874m6ryz0u.fsf@yhuang-mobile.sh.intel.com>
Date: Thu, 11 Aug 2016 10:51:45 -0700
From: "Huang\, Ying" <ying.huang@...el.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christoph Hellwig <hch@....de>,
"Huang\, Ying" <ying.huang@...el.com>,
Dave Chinner <david@...morbit.com>,
LKML <linux-kernel@...r.kernel.org>,
Bob Peterson <rpeterso@...hat.com>,
"Wu Fengguang" <fengguang.wu@...el.com>, LKP <lkp@...org>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression

Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Thu, Aug 11, 2016 at 8:57 AM, Christoph Hellwig <hch@....de> wrote:
>>
>> The one liner below (not tested yet) to simply remove it should fix that
>> up. I also noticed we have a spurious pagefault_disable/enable, I
>> need to dig into the history of that first, though.
>
> Hopefully the pagefault_disable/enable doesn't matter for this case.
>
> Can we get this one-liner tested with the kernel robot for comparison?
> I really think a messed-up LRU list could cause bad IO patterns, and
> end up keeping dirty pages around that should be streaming out to disk
> and re-used, so causing memory pressure etc for no good reason.
>
> I think the mapping->tree_lock issue that Dave sees is interesting
> too, but the kswapd activity (and the extra locking it causes) could
> also be a symptom of the same thing - memory pressure due to just
> putting pages in the active file that simply shouldn't be there.
>
> So the trivial oneliner _might_ just explain things. It would be
> really nice if the regression turns out to be due to something so
> easily fixed.
>

Here is the test result for the debug patch.  It appears that the aim7
score is a little better, but the regression is not fully recovered.

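(Judging from the aim7.jobs-per-min numbers below, the regression costs
about 486586 - 420523 = 66063 jobs/min, and the debug patch gets back
about 432165 - 420523 = 11642 of that, i.e. roughly 18% of the loss.)
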
commit 5c70fdfdf82723e47ac180e36a7638ca06ea19d8
Author: Christoph Hellwig <hch@....de>
Date:   Thu Aug 11 17:57:21 2016 +0200

    dbg fix 68a9f5e700: aim7.jobs-per-min -13.6% regression

    On Wed, Aug 10, 2016 at 06:00:24PM -0700, Linus Torvalds wrote:
    > The biggest difference is that we have "mark_page_accessed()" show up
    > after, and not before. There was also a lot of LRU noise in the
    > non-profile data. I wonder if that is the reason here: the old model
    > of using generic_perform_write/block_page_mkwrite didn't mark the
    > pages accessed, and now with iomap_file_buffered_write() they get
    > marked as active and that screws up the LRU list, and makes us not
    > flush out the dirty pages well (because they are seen as active and
    > not good for writeback), and then you get bad memory use.

    And that's actually a "bug" in the new code - mostly because I failed
    to pick up changes to the core code happening after we 'forked' it,
    in this case commit 2457ae ("mm: non-atomically mark page accessed during page
    cache allocation where possible").

    The one liner below (not tested yet) to simply remove it should fix that
    up. I also noticed we have a spurious pagefault_disable/enable, I
    need to dig into the history of that first, though.

diff --git a/fs/iomap.c b/fs/iomap.c
index 48141b8..f39c318 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -199,7 +199,6 @@ again:
 		pagefault_enable();
 
 		flush_dcache_page(page);
-		mark_page_accessed(page);
 
 		status = iomap_write_end(inode, pos, bytes, copied, page);
 		if (unlikely(status < 0))
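
As background for the LRU behavior Linus describes above:
mark_page_accessed() does a two-step promotion, so a page that is
referenced twice while it sits on the inactive list gets moved to the
active list.  Below is a minimal userspace model of that behavior (an
illustration only, not kernel code; page_model and
mark_page_accessed_model are made-up names for this sketch):

/*
 * Toy model of the two-step LRU promotion done by the kernel's
 * mark_page_accessed() (see mm/swap.c).  Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct page_model {
	bool referenced;	/* models PG_referenced */
	bool active;		/* models "page is on the active LRU list" */
};

/*
 * The first call records a reference; a second call on a still-inactive
 * page promotes it to the active list and clears the flag again.
 */
static void mark_page_accessed_model(struct page_model *p)
{
	if (!p->referenced) {
		p->referenced = true;
	} else if (!p->active) {
		p->active = true;
		p->referenced = false;
	}
}

int main(void)
{
	struct page_model p = { .referenced = false, .active = false };

	mark_page_accessed_model(&p);	/* reference at lookup/allocation */
	mark_page_accessed_model(&p);	/* extra call in the old write actor */

	printf("referenced=%d active=%d\n", p.referenced, p.active);
	/* prints "referenced=0 active=1" */
	return 0;
}

With the explicit call in the iomap write actor on top of the reference
the page cache code may already record at lookup/allocation time (that
is what commit 2457ae moved into pagecache_get_page() via FGP_ACCESSED),
freshly written pages can end up on the active list.  That would match
the proc-vmstat.nr_active_file and pgactivate increases for 68a9f5e7 in
the numbers below, which drop back down with the mark_page_accessed()
removal.
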
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
gcc-6/performance/1BRD_48G/xfs/x86_64-rhel/3000/debian-x86_64-2015-02-07.cgz/ivb44/disk_wrt/aim7
commit:
f0c6bcba74ac51cb77aadb33ad35cb2dc1ad1506
68a9f5e7007c1afa2cf6830b690a90d0187c0684
5c70fdfdf82723e47ac180e36a7638ca06ea19d8
f0c6bcba74ac51cb 68a9f5e7007c1afa2cf6830b69 5c70fdfdf82723e47ac180e36a
---------------- -------------------------- --------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
486586 ± 0% -13.6% 420523 ± 0% -11.2% 432165 ± 0% aim7.jobs-per-min
37.23 ± 0% +15.5% 43.02 ± 0% +12.4% 41.87 ± 0% aim7.time.elapsed_time
37.23 ± 0% +15.5% 43.02 ± 0% +12.4% 41.87 ± 0% aim7.time.elapsed_time.max
6424 ± 1% +31.3% 8437 ± 1% +24.4% 7992 ± 3% aim7.time.involuntary_context_switches
2489 ± 3% -2.5% 2425 ± 1% -1.0% 2465 ± 3% aim7.time.maximum_resident_set_size
151288 ± 0% +3.0% 155764 ± 0% +2.0% 154252 ± 1% aim7.time.minor_page_faults
376.31 ± 0% +28.4% 483.23 ± 0% +22.5% 460.96 ± 1% aim7.time.system_time
429058 ± 0% -20.1% 343013 ± 0% -16.7% 357231 ± 0% aim7.time.voluntary_context_switches
16014 ± 0% +28.3% 20545 ± 1% -4.0% 15369 ± 0% meminfo.Active(file)
127154 ± 9% -15.5% 107424 ± 10% -4.4% 121505 ± 11% softirqs.SCHED
24561 ± 0% -27.1% 17895 ± 1% -22.5% 19033 ± 1% vmstat.system.cs
47289 ± 0% +1.4% 47938 ± 0% +1.1% 47807 ± 0% vmstat.system.in
4003 ± 0% +27.8% 5117 ± 1% -4.0% 3842 ± 0% proc-vmstat.nr_active_file
979.25 ± 0% +59.1% 1558 ± 5% -11.8% 864.00 ± 1% proc-vmstat.pgactivate
4699 ± 3% +132.6% 10932 ± 78% +77.1% 8321 ± 36% proc-vmstat.pgpgout
7868 ± 1% +28.9% 10140 ± 6% -1.8% 7724 ± 1% numa-meminfo.node0.Active(file)
161402 ± 1% -21.0% 127504 ± 25% -1.7% 158708 ± 5% numa-meminfo.node0.Slab
8148 ± 1% +27.8% 10416 ± 7% -6.1% 7648 ± 1% numa-meminfo.node1.Active(file)
81041 ± 3% +43.8% 116540 ± 27% +5.7% 85631 ± 8% numa-meminfo.node1.Slab
13903 ± 18% +22.8% 17076 ± 9% -27.8% 10039 ± 12% numa-numastat.node0.numa_foreign
12525 ± 20% +27.1% 15922 ± 9% -26.8% 9170 ± 17% numa-numastat.node0.numa_miss
14084 ± 18% -35.4% 9102 ± 17% +16.5% 16401 ± 8% numa-numastat.node1.numa_foreign
15461 ± 17% -33.7% 10257 ± 14% +11.7% 17270 ± 9% numa-numastat.node1.numa_miss
1966 ± 1% +30.4% 2565 ± 4% -1.8% 1930 ± 1% numa-vmstat.node0.nr_active_file
4204 ± 3% +17.7% 4947 ± 7% +6.1% 4461 ± 6% numa-vmstat.node0.nr_alloc_batch
521.75 ± 4% +5.4% 549.80 ± 4% -12.0% 459.00 ± 12% numa-vmstat.node0.nr_dirty
2037 ± 1% +24.9% 2543 ± 5% -6.1% 1912 ± 1% numa-vmstat.node1.nr_active_file
24.26 ± 0% +8.7% 26.36 ± 0% +7.1% 25.99 ± 0% turbostat.%Busy
686.75 ± 0% +9.2% 749.80 ± 0% +6.5% 731.67 ± 0% turbostat.Avg_MHz
0.29 ± 1% -24.9% 0.22 ± 2% -18.8% 0.23 ± 2% turbostat.CPU%c3
91.39 ± 2% +3.5% 94.60 ± 0% +4.4% 95.45 ± 1% turbostat.CorWatt
121.88 ± 1% +2.6% 124.99 ± 0% +3.9% 126.57 ± 1% turbostat.PkgWatt
37.23 ± 0% +15.5% 43.02 ± 0% +12.4% 41.87 ± 0% time.elapsed_time
37.23 ± 0% +15.5% 43.02 ± 0% +12.4% 41.87 ± 0% time.elapsed_time.max
6424 ± 1% +31.3% 8437 ± 1% +24.4% 7992 ± 3% time.involuntary_context_switches
1038 ± 0% +10.5% 1148 ± 0% +8.4% 1126 ± 0% time.percent_of_cpu_this_job_got
376.31 ± 0% +28.4% 483.23 ± 0% +22.5% 460.96 ± 1% time.system_time
429058 ± 0% -20.1% 343013 ± 0% -16.7% 357231 ± 0% time.voluntary_context_switches
53643508 ± 0% -19.5% 43181295 ± 2% -16.4% 44822518 ± 0% cpuidle.C1-IVT.time
318952 ± 0% -25.7% 236993 ± 0% -21.3% 251084 ± 0% cpuidle.C1-IVT.usage
3471235 ± 2% -18.1% 2843694 ± 3% -12.9% 3022770 ± 1% cpuidle.C1E-IVT.time
46642 ± 1% -22.5% 36144 ± 0% -17.4% 38545 ± 0% cpuidle.C1E-IVT.usage
12601665 ± 1% -21.9% 9837996 ± 1% -18.9% 10218444 ± 0% cpuidle.C3-IVT.time
79872 ± 1% -19.7% 64163 ± 0% -16.8% 66477 ± 0% cpuidle.C3-IVT.usage
1.292e+09 ± 0% +13.7% 1.469e+09 ± 0% +10.9% 1.434e+09 ± 0% cpuidle.C6-IVT.time
5131 ±121% -98.5% 75.60 ±200% -93.3% 344.33 ± 98% latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
2966 ± 80% -27.5% 2151 ± 16% +359.2% 13624 ± 77% latency_stats.max.down.xfs_buf_lock._xfs_buf_find.xfs_buf_get_map.xfs_buf_read_map.xfs_trans_read_buf_map.xfs_read_agi.xfs_iunlink.xfs_droplink.xfs_remove.xfs_vn_unlink.vfs_unlink
5131 ±121% -98.5% 75.60 ±200% -93.3% 344.33 ± 98% latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
9739 ± 99% -99.0% 95.40 ± 9% -99.1% 91.33 ± 5% latency_stats.max.submit_bio_wait.blkdev_issue_flush.ext4_sync_fs.sync_fs_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
7739 ± 81% -71.5% 2202 ± 46% -77.4% 1752 ± 23% latency_stats.max.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
5131 ±121% -98.5% 75.60 ±200% -93.3% 344.33 ± 98% latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
10459 ± 97% -97.5% 263.80 ± 5% -97.6% 254.67 ± 1% latency_stats.sum.submit_bio_wait.blkdev_issue_flush.ext4_sync_fs.sync_fs_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
9097 ± 81% -70.2% 2708 ± 40% -74.3% 2335 ± 21% latency_stats.sum.wait_on_page_bit.__filemap_fdatawait_range.filemap_fdatawait_keep_errors.sync_inodes_sb.sync_inodes_one_sb.iterate_supers.sys_sync.entry_SYSCALL_64_fastpath
229.50 ± 11% -6.7% 214.20 ± 46% -77.8% 51.00 ± 0% slabinfo.dio.active_objs
229.50 ± 11% -6.7% 214.20 ± 46% -77.8% 51.00 ± 0% slabinfo.dio.num_objs
369.75 ± 17% -6.2% 346.80 ± 11% -21.8% 289.00 ± 16% slabinfo.kmem_cache.active_objs
369.75 ± 17% -6.2% 346.80 ± 11% -21.8% 289.00 ± 16% slabinfo.kmem_cache.num_objs
1022 ± 5% -2.9% 992.80 ± 11% -11.9% 900.33 ± 3% slabinfo.nsproxy.active_objs
1022 ± 5% -2.9% 992.80 ± 11% -11.9% 900.33 ± 3% slabinfo.nsproxy.num_objs
1836 ± 10% +60.6% 2949 ± 9% +45.1% 2665 ± 6% slabinfo.scsi_data_buffer.active_objs
1836 ± 10% +60.6% 2949 ± 9% +45.1% 2665 ± 6% slabinfo.scsi_data_buffer.num_objs
431.75 ± 10% +60.7% 693.80 ± 9% +45.1% 626.67 ± 6% slabinfo.xfs_efd_item.active_objs
431.75 ± 10% +60.7% 693.80 ± 9% +45.1% 626.67 ± 6% slabinfo.xfs_efd_item.num_objs
2.59e+11 ± 6% +22.4% 3.169e+11 ± 5% +22.9% 3.182e+11 ± 7% perf-stat.branch-instructions
0.41 ± 2% -9.2% 0.38 ± 1% -25.5% 0.31 ± 3% perf-stat.branch-miss-rate
1.072e+09 ± 4% +11.3% 1.193e+09 ± 3% -8.5% 9.812e+08 ± 4% perf-stat.branch-misses
972882 ± 0% -17.4% 803235 ± 0% -14.3% 833753 ± 0% perf-stat.context-switches
1.472e+12 ± 6% +20.5% 1.774e+12 ± 5% +16.7% 1.717e+12 ± 7% perf-stat.cpu-cycles
100350 ± 1% -5.2% 95091 ± 1% -3.8% 96553 ± 0% perf-stat.cpu-migrations
7.315e+08 ± 24% +54.9% 1.133e+09 ± 35% +107.1% 1.515e+09 ± 64% perf-stat.dTLB-load-misses
3.225e+11 ± 5% +38.1% 4.454e+11 ± 3% +32.2% 4.263e+11 ± 6% perf-stat.dTLB-loads
2.176e+11 ± 9% +45.2% 3.16e+11 ± 5% +50.1% 3.267e+11 ± 7% perf-stat.dTLB-stores
1.452e+12 ± 6% +27.7% 1.853e+12 ± 5% +28.0% 1.858e+12 ± 7% perf-stat.instructions
42168 ± 16% +24.9% 52676 ± 7% +33.4% 56234 ± 6% perf-stat.instructions-per-iTLB-miss
0.99 ± 0% +5.9% 1.05 ± 0% +9.6% 1.08 ± 0% perf-stat.ipc
252401 ± 0% +6.7% 269219 ± 0% +5.2% 265524 ± 0% perf-stat.minor-faults
10.16 ± 3% +14.5% 11.63 ± 4% +12.3% 11.41 ± 3% perf-stat.node-store-miss-rate
24842185 ± 2% +11.0% 27573897 ± 2% +7.6% 26720271 ± 3% perf-stat.node-store-misses
2.198e+08 ± 2% -4.5% 2.1e+08 ± 5% -5.6% 2.076e+08 ± 2% perf-stat.node-stores
252321 ± 0% +6.6% 269092 ± 0% +5.2% 265415 ± 0% perf-stat.page-faults
4.08 ± 64% +63.1% 6.65 ± 77% +609.5% 28.93 ± 92% sched_debug.cfs_rq:/.MIN_vruntime.avg
157.17 ± 61% +30.0% 204.25 ± 47% +686.9% 1236 ±112% sched_debug.cfs_rq:/.MIN_vruntime.max
4.08 ± 64% +63.1% 6.65 ± 77% +609.5% 28.93 ± 92% sched_debug.cfs_rq:/.max_vruntime.avg
157.17 ± 61% +30.0% 204.25 ± 47% +686.9% 1236 ±112% sched_debug.cfs_rq:/.max_vruntime.max
191.00 ± 35% -23.4% 146.40 ± 15% -28.4% 136.67 ± 5% sched_debug.cfs_rq:/.runnable_load_avg.max
44.43 ± 16% -20.2% 35.47 ± 9% -25.2% 33.25 ± 1% sched_debug.cfs_rq:/.runnable_load_avg.stddev
50.23 ± 19% -27.1% 36.61 ± 15% -30.3% 35.03 ± 15% sched_debug.cpu.cpu_load[1].avg
466.50 ± 29% -55.6% 207.20 ± 73% -47.5% 245.00 ± 67% sched_debug.cpu.cpu_load[1].max
77.78 ± 33% -52.4% 37.05 ± 53% -48.8% 39.86 ± 51% sched_debug.cpu.cpu_load[1].stddev
38.82 ± 19% -24.2% 29.42 ± 15% -29.6% 27.31 ± 10% sched_debug.cpu.cpu_load[2].avg
300.50 ± 33% -55.1% 135.00 ± 46% -50.2% 149.67 ± 46% sched_debug.cpu.cpu_load[2].max
51.71 ± 41% -49.3% 26.20 ± 46% -50.9% 25.39 ± 33% sched_debug.cpu.cpu_load[2].stddev
27.08 ± 19% -22.4% 21.02 ± 15% -27.8% 19.56 ± 11% sched_debug.cpu.cpu_load[3].avg
185.00 ± 43% -49.6% 93.20 ± 44% -48.3% 95.67 ± 30% sched_debug.cpu.cpu_load[3].max
32.91 ± 50% -46.1% 17.75 ± 45% -49.8% 16.51 ± 25% sched_debug.cpu.cpu_load[3].stddev
17.78 ± 20% -20.9% 14.06 ± 13% -25.3% 13.28 ± 13% sched_debug.cpu.cpu_load[4].avg
20.29 ± 55% -44.5% 11.26 ± 39% -47.7% 10.61 ± 27% sched_debug.cpu.cpu_load[4].stddev
4929 ± 18% -24.8% 3704 ± 23% -4.5% 4708 ± 21% sched_debug.cpu.nr_load_updates.avg
276.50 ± 10% -4.4% 264.20 ± 7% -14.3% 237.00 ± 19% sched_debug.cpu.nr_switches.min
Best Regards,
Huang, Ying