[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc7456d6-616d-78c5-0ac6-c5ffaf721e41@hisilicon.com>
Date: Tue, 5 Nov 2019 11:26:32 +0800
From: Shaokun Zhang <zhangshaokun@...ilicon.com>
To: Dave Chinner <david@...morbit.com>
CC: <linux-xfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
Yang Guo <guoyang2@...wei.com>,
"Darrick J. Wong" <darrick.wong@...cle.com>
Subject: Re: [PATCH] xfs: optimise xfs_mod_icount/ifree when delta < 0
Hi Dave,
On 2019/11/5 4:49, Dave Chinner wrote:
> On Mon, Nov 04, 2019 at 07:29:40PM +0800, Shaokun Zhang wrote:
>> From: Yang Guo <guoyang2@...wei.com>
>>
>> percpu_counter_compare will be called by xfs_mod_icount/ifree to check
>> whether the counter less than 0 and it is a expensive function.
>> let's check it only when delta < 0, it will be good for xfs's performance.
>
> Hmmm. I don't recall this as being expensive.
>
Sorry about the misunderstanding information in commit message.
> How did you find this? Can you please always document how you found
If user creates million of files and the delete them, We found that the
__percpu_counter_compare costed 5.78% CPU usage, you are right that itself
is not expensive, but it calls __percpu_counter_sum which will use
spin_lock and read other cpu's count. perf record -g is used to profile it:
- 5.88% 0.02% rm [kernel.vmlinux] [k] xfs_mod_ifree
- 5.86% xfs_mod_ifree
- 5.78% __percpu_counter_compare
5.61% __percpu_counter_sum
> the problem being addressed in the commit message so that we don't
> then have to ask how the problem being fixed is reproduced.
>
>> Cc: "Darrick J. Wong" <darrick.wong@...cle.com>
>> Signed-off-by: Yang Guo <guoyang2@...wei.com>
>> Signed-off-by: Shaokun Zhang <zhangshaokun@...ilicon.com>
>> ---
>> fs/xfs/xfs_mount.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
>> index ba5b6f3b2b88..5e8314e6565e 100644
>> --- a/fs/xfs/xfs_mount.c
>> +++ b/fs/xfs/xfs_mount.c
>> @@ -1174,6 +1174,9 @@ xfs_mod_icount(
>> int64_t delta)
>> {
>> percpu_counter_add_batch(&mp->m_icount, delta, XFS_ICOUNT_BATCH);
>> + if (delta > 0)
>> + return 0;
>> +
>> if (__percpu_counter_compare(&mp->m_icount, 0, XFS_ICOUNT_BATCH) < 0) {
>> ASSERT(0);
>> percpu_counter_add(&mp->m_icount, -delta);
>
> I struggle to see how this is expensive when you have more than
> num_online_cpus() * XFS_ICOUNT_BATCH inodes allocated.
> __percpu_counter_compare() will always take the fast path so ends up
> being very little code at all.
>
>> @@ -1188,6 +1191,9 @@ xfs_mod_ifree(
>> int64_t delta)
>> {
>> percpu_counter_add(&mp->m_ifree, delta);
>> + if (delta > 0)
>> + return 0;
>> +
>> if (percpu_counter_compare(&mp->m_ifree, 0) < 0) {
>> ASSERT(0);
>> percpu_counter_add(&mp->m_ifree, -delta);
>
> This one might have some overhead because the count is often at or
> around zero, but I haven't noticed it being expensive in kernel
> profiles when creating/freeing hundreds of thousands of inodes every
> second.
>
> IOWs, we typically measure the overhead of such functions by kernel
> profile. Creating ~200,000 inodes a second, so hammering the icount
> and ifree counters, I see:
>
> 0.16% [kernel] [k] percpu_counter_add_batch
> 0.03% [kernel] [k] __percpu_counter_compare
>
0.03% is just __percpu_counter_compare's usage.
> Almost nothing - it's way down the long tail of noise in the
> profile.
>
> IOWs, the CPU consumed by percpu_counter_compare() is low that
> optimisation isn't going to produce any measurable performance
> improvement. Hence it's not really something we've concerned
> ourselves about. The profile is pretty much identical for removing
> hundreds of thousands of files a second, too, so there really isn't
> any performance gain to be had here.
>
> If you want to optimise code to make it faster and show a noticable
> performance improvement, start by running kernel profiles while your
> performance critical workload is running. Then look at what the
> functions and call chains that consume the most CPU and work out how
> to do them better. Those are the places that optimisation will
> result in measurable performance gains....
Hmm, I have done it and I didn't describe this problem clearly, with this
patch, 5.78%(__percpu_counter_compare) will disappear. I will follow
your method and reduce unnecessary noise.
Thanks,
Shaokun
>
> Cheers,
>
> Dave.
>
Powered by blists - more mailing lists