Message-ID: <8737wafge7.fsf@yhuang-dev.intel.com>
Date: Fri, 13 Nov 2015 16:33:04 +0800
From: "Huang\, Ying" <ying.huang@...ux.intel.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Josef Bacik <jbacik@...com>, Yu Zhao <yuzhao@...gle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] tmpfs: avoid a little creat and stat slowdown
Hugh Dickins <hughd@...gle.com> writes:
> On Wed, 4 Nov 2015, Huang, Ying wrote:
>> Hugh Dickins <hughd@...gle.com> writes:
>>
>> > LKP reports that v4.2 commit afa2db2fb6f1 ("tmpfs: truncate prealloc
>> > blocks past i_size") causes a 14.5% slowdown in the AIM9 creat-clo
>> > benchmark.
>> >
>> > creat-clo does just what you'd expect from the name, and creat's O_TRUNC
>> > on 0-length file does indeed get into more overhead now shmem_setattr()
>> > tests "0 <= 0" instead of "0 < 0".
>> >
>> > I'm not sure how much we care, but I think it would not be too VW-like
>> > to add in a check for whether any pages (or swap) are allocated: if none
>> > are allocated, there's none to remove from the radix_tree. At first I
>> > thought that check would be good enough for the unmaps too, but no: we
>> > should not skip the unlikely case of unmapping pages beyond the new EOF,
>> > which were COWed from holes which have now been reclaimed, leaving none.
>> >
>> > This gives me an 8.5% speedup: on Haswell instead of LKP's Westmere,
>> > and running a debug config before and after: I hope those account for
>> > the lesser speedup.
>> >
>> > And probably someone has a benchmark where a thousand threads keep on
>> > stat'ing the same file repeatedly: forestall that report by adjusting
>> > v4.3 commit 44a30220bc0a ("shmem: recalculate file inode when fstat")
>> > not to take the spinlock in shmem_getattr() when there's no work to do.
>> >
>> > Reported-by: Ying Huang <ying.huang@...ux.intel.com>
>> > Signed-off-by: Hugh Dickins <hughd@...gle.com>
>>
>> Hi, Hugh,
>>
>> Thanks a lot for your support! The test on LKP shows that this patch
>> recovers a big part of the regression! In the following list,
>>
>> c435a390574d012f8d30074135d8fcc6f480b484: is the parent commit
>> afa2db2fb6f15f860069de94a1257db57589fe95: is the first bad commit, which
>> has the performance regression.
>> 43819159da2b77fedcf7562134d6003dccd6a068: is the fixing patch
>
> Hi Ying,
>
> Thank you, for reporting, and for trying out the patch (which is now
> in Linus's tree as commit d0424c429f8e0555a337d71e0a13f2289c636ec9).
>
> But I'm disappointed by the result: do I understand correctly,
> that afa2db2fb6f1 made a -12.5% change, but the fix still -5.6%
> from your parent comparison point?
Yes.
> If we value that microbenchmark
> at all (debatable), I'd say that's not good enough.
I think that is a good improvement.
> It does match with my own rough measurement, but I'd been hoping
> for better when done in a more controlled environment; and I cannot
> explain why "truncate prealloc blocks past i_size" creat-clo performance
> would not be fully corrected by "avoid a little creat and stat slowdown"
> (unless either patch adds subtle icache or dcache displacements).
>
> I'm not certain of how you performed the comparison. Was the
> c435a390574d tree measured, then patch afa2db2fb6f1 applied on top
> of that and measured, then patch 43819159da2b applied on top of that
> and measured? Or were there other intervening changes, which could
> easily add their own interference?
c435a390574d is the direct parent of afa2db2fb6f1 in the original git
history. 43819159da2b is your patch applied on top of v4.3-rc7. The
comparison of 43819159da2b with v4.3-rc7 (32b88194f71d) is as follows:
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-wsx02/creat-clo/aim9/300s

commit:
  32b88194f71d6ae7768a29f87fbba454728273ee
  43819159da2b77fedcf7562134d6003dccd6a068

32b88194f71d6ae7 43819159da2b77fedcf7562134
---------------- --------------------------
          %stddev   %change            %stddev
             \         |                  \
    475224 ±   1%    +11.9%      531968 ±   1%  aim9.creat-clo.ops_per_sec
  10469094 ± 201%    -52.3%     4998529 ± 130%  latency_stats.avg.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
  18852332 ± 223%    -73.5%     4998529 ± 130%  latency_stats.max.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
  21758590 ± 199%    -77.0%     4998529 ± 130%  latency_stats.sum.nfs_wait_on_request.nfs_updatepage.nfs_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.nfs_file_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
   4817724 ±   0%     +9.6%     5280303 ±   1%  proc-vmstat.numa_hit
   4812582 ±   0%     +9.7%     5280287 ±   1%  proc-vmstat.numa_local
   8499767 ±   4%    +14.2%     9707953 ±   4%  proc-vmstat.pgalloc_normal
   8984075 ±   0%    +10.4%     9919044 ±   1%  proc-vmstat.pgfree
      9.22 ±   8%    +27.4%       11.75 ±   9%  sched_debug.cfs_rq[0]:/.nr_spread_over
      2667 ±  63%    +90.0%        5068 ±  37%  sched_debug.cfs_rq[20]:/.min_vruntime
    152513 ± 272%    -98.5%        2306 ±  48%  sched_debug.cfs_rq[21]:/.min_vruntime
    477.36 ±  60%   +128.6%        1091 ±  60%  sched_debug.cfs_rq[27]:/.exec_clock
      4.00 ± 112%   +418.8%       20.75 ±  67%  sched_debug.cfs_rq[28]:/.util_avg
      1212 ±  80%   +195.0%        3577 ±  48%  sched_debug.cfs_rq[29]:/.exec_clock
      8119 ±  53%    -60.4%        3217 ±  26%  sched_debug.cfs_rq[2]:/.min_vruntime
    584.80 ±  65%    -60.0%      234.06 ±  13%  sched_debug.cfs_rq[30]:/.exec_clock
      4245 ±  27%    -42.8%        2429 ±  24%  sched_debug.cfs_rq[30]:/.min_vruntime
      0.00 ±   0%     +Inf%        2.25 ±  72%  sched_debug.cfs_rq[44]:/.util_avg
      1967 ±  39%    +72.0%        3384 ±  15%  sched_debug.cfs_rq[61]:/.min_vruntime
      1863 ±  43%    +99.2%        3710 ±  33%  sched_debug.cfs_rq[72]:/.min_vruntime
      0.78 ± 336%   -678.6%       -4.50 ± -33%  sched_debug.cpu#12.nr_uninterruptible
     10686 ±  49%    +77.8%       19002 ±  34%  sched_debug.cpu#15.nr_switches
      5256 ±  50%    +79.0%        9410 ±  34%  sched_debug.cpu#15.sched_goidle
     -2.00 ±-139%   -225.0%        2.50 ±  44%  sched_debug.cpu#21.nr_uninterruptible
     -1.78 ±-105%   -156.2%        1.00 ± 141%  sched_debug.cpu#23.nr_uninterruptible
     45017 ± 132%    -76.1%       10741 ±  30%  sched_debug.cpu#24.nr_load_updates
      2216 ±  14%    +73.3%        3839 ±  63%  sched_debug.cpu#35.nr_switches
      2223 ±  14%    +73.0%        3845 ±  63%  sched_debug.cpu#35.sched_count
      1030 ±  13%    +79.1%        1845 ±  66%  sched_debug.cpu#35.sched_goidle
      2.00 ±  40%    +37.5%        2.75 ±  82%  sched_debug.cpu#46.nr_uninterruptible
    907.11 ±  67%   +403.7%        4569 ±  75%  sched_debug.cpu#59.ttwu_count
     -4.56 ± -41%    -94.5%       -0.25 ±-714%  sched_debug.cpu#64.nr_uninterruptible
So your patch improves creat-clo by 11.9% over its base, v4.3-rc7. I think
the other differences are caused by other changes. Sorry for the confusion.
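
For reference, the three percentages discussed in this thread can be
reproduced from the aim9.creat-clo.ops_per_sec values in the two tables.
The Python sketch below is illustrative only: the helper name pct_change
and the variable names are assumptions made for this example, not part of
LKP. The numbers are copied from the tables as reported.

```python
def pct_change(base: float, new: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return (new - base) / base * 100.0


# aim9.creat-clo.ops_per_sec, copied from the two LKP tables in this thread.
PARENT = 563556     # c435a390574d, direct parent of the first bad commit
REGRESSED = 493033  # afa2db2fb6f1, first bad commit
FIXED = 531968      # 43819159da2b, the fix applied on top of v4.3-rc7
RC7 = 475224        # 32b88194f71d, the base the fix was applied to

print(f"bad commit vs parent: {pct_change(PARENT, REGRESSED):+.1f}%")  # -12.5%
print(f"fix vs parent:        {pct_change(PARENT, FIXED):+.1f}%")      # -5.6%
print(f"fix vs v4.3-rc7:      {pct_change(RC7, FIXED):+.1f}%")         # +11.9%
```

The -5.6% figure compares the fix against a parent from a different base,
so it includes whatever else changed in between; the +11.9% figure compares
the fix against the tree it was actually applied to.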
Best Regards,
Huang, Ying
> Hugh
>
>>
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>>   gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-wsx02/creat-clo/aim9/300s
>>
>> commit:
>>   c435a390574d012f8d30074135d8fcc6f480b484
>>   afa2db2fb6f15f860069de94a1257db57589fe95
>>   43819159da2b77fedcf7562134d6003dccd6a068
>>
>> c435a390574d012f afa2db2fb6f15f860069de94a1 43819159da2b77fedcf7562134
>> ---------------- -------------------------- --------------------------
>>           %stddev   %change            %stddev   %change            %stddev
>>              \         |                  \         |                  \
>>     563556 ±   1%    -12.5%      493033 ±   5%     -5.6%      531968 ±   1%  aim9.creat-clo.ops_per_sec
>>      11836 ±   7%    +11.4%       13184 ±   7%    +15.0%       13608 ±   5%  numa-meminfo.node1.SReclaimable
>>   10121526 ±   3%    -12.1%     8897097 ±   5%     -4.1%     9707953 ±   4%  proc-vmstat.pgalloc_normal
>>       9.34 ±   4%    -11.4%        8.28 ±   3%     -4.8%        8.88 ±   2%  time.user_time
>>       3480 ±   3%     -2.5%        3395 ±   1%    -28.5%        2488 ±   3%  vmstat.system.cs
>>     203275 ±  17%     -6.8%      189453 ±   5%    -34.4%      133352 ±  11%  cpuidle.C1-NHM.usage
>>    8081280 ± 129%    -93.3%      538377 ±  97%    +31.5%    10625496 ± 106%  cpuidle.C1E-NHM.time
>>       3144 ±  58%   +619.0%       22606 ±  56%   +903.9%       31563 ±   0%  numa-vmstat.node0.numa_other
>>       2958 ±   7%    +11.4%        3295 ±   7%    +15.0%        3401 ±   5%  numa-vmstat.node1.nr_slab_reclaimable
>>      45074 ±   5%    -43.4%       25494 ±  57%    -68.7%       14105 ±   2%  numa-vmstat.node2.numa_other
>>      56140 ±   0%     +0.0%       56158 ±   0%    -94.4%        3120 ±   0%  slabinfo.Acpi-ParseExt.active_objs
>>       1002 ±   0%     +0.0%        1002 ±   0%    -92.0%       80.00 ±   0%  slabinfo.Acpi-ParseExt.active_slabs
>>      56140 ±   0%     +0.0%       56158 ±   0%    -94.4%        3120 ±   0%  slabinfo.Acpi-ParseExt.num_objs
>>       1002 ±   0%     +0.0%        1002 ±   0%    -92.0%       80.00 ±   0%  slabinfo.Acpi-ParseExt.num_slabs
>>       1079 ±   5%    -10.8%      962.00 ±  10%   -100.0%        0.00 ±  -1%  slabinfo.blkdev_ioc.active_objs
>>       1079 ±   5%    -10.8%      962.00 ±  10%   -100.0%        0.00 ±  -1%  slabinfo.blkdev_ioc.num_objs
>>     110.67 ±  39%    +74.4%      193.00 ±  46%   +317.5%      462.00 ±   8%  slabinfo.blkdev_queue.active_objs
>>     189.33 ±  23%    +43.7%      272.00 ±  33%   +151.4%      476.00 ±  10%  slabinfo.blkdev_queue.num_objs
>>       1129 ±  10%     -1.9%        1107 ±   7%    +20.8%        1364 ±   6%  slabinfo.blkdev_requests.active_objs
>>       1129 ±  10%     -1.9%        1107 ±   7%    +20.8%        1364 ±   6%  slabinfo.blkdev_requests.num_objs
>>       1058 ±   3%    -10.3%      949.00 ±   9%   -100.0%        0.00 ±  -1%  slabinfo.file_lock_ctx.active_objs
>>       1058 ±   3%    -10.3%      949.00 ±   9%   -100.0%        0.00 ±  -1%  slabinfo.file_lock_ctx.num_objs
>>       4060 ±   1%     -2.1%        3973 ±   1%    -10.5%        3632 ±   1%  slabinfo.files_cache.active_objs
>>       4060 ±   1%     -2.1%        3973 ±   1%    -10.5%        3632 ±   1%  slabinfo.files_cache.num_objs
>>      10001 ±   0%     -0.3%        9973 ±   0%    -61.1%        3888 ±   0%  slabinfo.ftrace_event_field.active_objs
>>      10001 ±   0%     -0.3%        9973 ±   0%    -61.1%        3888 ±   0%  slabinfo.ftrace_event_field.num_objs
>>       1832 ±   0%     +0.4%        1840 ±   0%   -100.0%        0.00 ±  -1%  slabinfo.ftrace_event_file.active_objs
>>       1832 ±   0%     +0.4%        1840 ±   0%   -100.0%        0.00 ±  -1%  slabinfo.ftrace_event_file.num_objs
>>       1491 ±   5%     -2.3%        1456 ±   6%    +12.0%        1669 ±   4%  slabinfo.mnt_cache.active_objs
>>       1491 ±   5%     -2.3%        1456 ±   6%    +12.0%        1669 ±   4%  slabinfo.mnt_cache.num_objs
>>     126.33 ±  19%    +10.2%      139.17 ±   9%   -100.0%        0.00 ±  -1%  slabinfo.nfs_commit_data.active_objs
>>     126.33 ±  19%    +10.2%      139.17 ±   9%   -100.0%        0.00 ±  -1%  slabinfo.nfs_commit_data.num_objs
>>      97.17 ±  20%     -9.1%       88.33 ±  28%   -100.0%        0.00 ±  -1%  slabinfo.user_namespace.active_objs
>>      97.17 ±  20%     -9.1%       88.33 ±  28%   -100.0%        0.00 ±  -1%  slabinfo.user_namespace.num_objs
>>
>> Best Regards,
>> Huang, Ying