Message-ID: <561579EA.60507@hpe.com>
Date: Wed, 07 Oct 2015 16:00:42 -0400
From: Waiman Long <waiman.long@....com>
To: Dave Chinner <david@...morbit.com>
CC: Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux-foundation.org>,
linux-kernel@...r.kernel.org, xfs@....sgi.com,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>
Subject: Re: [PATCH] percpu_counter: return precise count from __percpu_counter_compare()
On 10/06/2015 05:30 PM, Dave Chinner wrote:
> On Tue, Oct 06, 2015 at 01:33:21PM -0400, Waiman Long wrote:
> Yes, it may be, but that does not mean we should optimise for it.
> If you are doing filesystem scalability testing on small filesystems
> near capacity, then your testing methodology needs fixing. Not
> the code.
>
>>>>> XFS trades off low overhead for fast path allocation with slowdowns
>>>>> as we near ENOSPC in allocation routines. It gets harder to find
>>>>> contiguous free space, files get more fragmented, IO takes longer
>>>>> because we seek more, etc. Hence we accept that performance slows
>>>>> down as the need for precision increases as we near ENOSPC.
>>>>>
>>>>> I'd suggest you retry your benchmark with larger filesystems, and
>>>>> see what happens...
>>>> I don't think I am going to see the slowdown that I observed on
>>>> larger filesystems with more free space.
>>> So there is no problem that needs fixing.... ;)
>> Well, I am still worried about corner cases where the slowpath is
>> triggered. I would like to make it perform better in those cases.
> It's a pretty damn small slowdown in your somewhat extreme,
> artificial test. Show me a real world production system that runs
> small filesystems permanently at >99% filesystem capacity, and then
> maybe we've got something that needs changing.
>
>>>> gauge how far it is from ENOSPC. So we don't really need to get
>>>> the precise count as long as the number of CPUs is taken into
>>>> consideration in the comparison.
>>> I think you are looking in the wrong place. There is nothing
>>> wrong with XFS doing two compares here. If we are hitting the
>>> __percpu_counter_compare() slow path too much, then we should be
>>> understanding exactly why that slow path is being hit so hard so
>>> often. I don't see any analysis of the actual per-cpu counter
>>> behaviour and why the slow path is being taken so often....
>> I am thinking of making the following changes:
> No. Please test the change to the per-cpu counters that I suggested:
>
>>> /*
>>> * Aggregate the per-cpu counter magazines back into the global
>>> * counter. This avoids the need for repeated compare operations to
>>> * run the slow path when the majority of the counter value is held
>>> * in the per-cpu magazines. Folding them back into the global
>>> * counter means we will continue to hit the fast
>>> * percpu_counter_read() path until the counter value falls
>>> * completely within the comparison limit passed to
>>> * __percpu_counter_compare().
>>> */
>>> static s64 percpu_counter_aggregate(struct percpu_counter *fbc)
>>> {
>>> 	s64 ret;
>>> 	int cpu;
>>> 	unsigned long flags;
>>>
>>> 	raw_spin_lock_irqsave(&fbc->lock, flags);
>>> 	ret = fbc->count;
>>> 	for_each_online_cpu(cpu) {
>>> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
>>>
>>> 		ret += *pcount;
>>> 		*pcount = 0;
>>> 	}
>>> 	fbc->count = ret;
>>> 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
>>> 	return ret;
>>> }
>> I don't think that will work as some other CPUs may change the
>> per-cpu counter values between percpu_counter_aggregate() and
>> __percpu_counter_compare(). To be safe, the precise count has to
>> be computed whenever the comparison value difference is less than
>> nr_cpus * batch size.
> Well, yes. Why do you think I wrote the above function to do the
> same thing as percpu_counter_sum()? So that the percpu_counter_sum()
> call *inside* __percpu_counter_compare() can be replaced by this
> call, i.e.:
>
> 		return -1;
> 	}
> 	/* Need to use precise count */
> -	count = percpu_counter_sum(fbc);
> +	count = percpu_counter_aggregate(fbc);
> 	if (count > rhs)
> 		return 1;
> 	else if (count < rhs)
>
> Please think about what I'm saying rather than dismissing it without
> first understanding my suggestions.
I understood what you were saying. However, the per-cpu counters aren't
protected by the spinlock. Reading them is OK, but writing may cause a
race if a counter is modified by a CPU other than its owning CPU.
The slow performance of percpu_counter_sum() is due to its need to
access n different (likely cold) cachelines, where n is the number of
CPUs in the system. So the larger the system, the more problematic it
will be. My main concern about xfs_mod_fdblocks() is that it can
potentially call percpu_counter_sum() twice, which is what I want to
prevent. It is OK if you don't think that change is necessary; however,
I will come back if I find more evidence that this can be an issue.
Cheers,
Longman