Message-ID: <561579EA.60507@hpe.com>
Date: Wed, 07 Oct 2015 16:00:42 -0400
From: Waiman Long <waiman.long@....com>
To: Dave Chinner <david@...morbit.com>
CC: Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, xfs@....sgi.com,
        Scott J Norton <scott.norton@....com>, Douglas Hatch <doug.hatch@....com>
Subject: Re: [PATCH] percpu_counter: return precise count from
 __percpu_counter_compare()

On 10/06/2015 05:30 PM, Dave Chinner wrote:
> On Tue, Oct 06, 2015 at 01:33:21PM -0400, Waiman Long wrote:
> Yes, it may be, but that does not mean we should optimise for it.
> If you are doing filesystem scalability testing on small filesystems
> near capacity, then your testing methodology needs fixing. Not
> the code.
>
>>>>> XFS trades off low overhead for fast path allocation with slowdowns
>>>>> as we near ENOSPC in allocation routines. It gets harder to find
>>>>> contiguous free space, files get more fragmented, IO takes longer
>>>>> because we seek more, etc. Hence we accept that performance slows
>>>>> down as the need for precision increases as we near ENOSPC.
>>>>>
>>>>> I'd suggest you retry your benchmark with larger filesystems, and
>>>>> see what happens...
>>>> I don't think I am going to see the slowdown that I observed on
>>>> larger filesystems with more free space.
>>> So there is no problem that needs fixing.... ;)
>> Well, I am still worried about the corner cases where the slow path
>> is triggered. I would like to make it perform better in those cases.
> It's a pretty damn small slowdown in your somewhat extreme,
> artificial test. Show me a real world production system that runs
> small filesystems permanently at >99% filesystem capacity, and then
> maybe we've got something that needs changing.
>
>>>> gauge how far it is from ENOSPC. So we don't really need to get
>>>> the precise count as long as the number of CPUs is taken into
>>>> consideration in the comparison.
>>> I think you are looking in the wrong place. There is nothing
>>> wrong with XFS doing two compares here. If we are hitting the
>>> __percpu_counter_compare() slow path too much, then we should be
>>> understanding exactly why that slow path is being hit so hard so
>>> often. I don't see any analysis of the actual per-cpu counter
>>> behaviour and why the slow path is being taken so often....
>> I am thinking of making the following changes:
> No. Please test the change to the per-cpu counters that I suggested:
>
>>> /*
>>>  * Aggregate the per-cpu counter magazines back into the global
>>>  * counter. This avoids the need for repeated compare operations to
>>>  * run the slow path when the majority of the counter value is held
>>>  * in the per-cpu magazines. Folding them back into the global
>>>  * counter means we will continue to hit the fast
>>>  * percpu_counter_read() path until the counter value falls
>>>  * completely within the comparison limit passed to
>>>  * __percpu_counter_compare().
>>>  */
>>> static s64 percpu_counter_aggregate(struct percpu_counter *fbc)
>>> {
>>>         s64 ret;
>>>         int cpu;
>>>         unsigned long flags;
>>>
>>>         raw_spin_lock_irqsave(&fbc->lock, flags);
>>>         ret = fbc->count;
>>>         for_each_online_cpu(cpu) {
>>>                 s32 count = __this_cpu_read(*fbc->counters);
>>>                 ret += count;
>>>                 __this_cpu_sub(*fbc->counters, count);
>>>         }
>>>         fbc->count = ret;
>>>         raw_spin_unlock_irqrestore(&fbc->lock, flags);
>>>         return ret;
>>> }
>> I don't think that will work, as some other CPUs may change the
>> per-cpu counter values between percpu_counter_aggregate() and
>> __percpu_counter_compare(). To be safe, the precise count has to
>> be computed whenever the comparison value difference is less than
>> nr_cpus * batch size.
> Well, yes. Why do you think the above function does the same thing
> as percpu_counter_sum()? It is so that the percpu_counter_sum()
> call *inside* __percpu_counter_compare() can be replaced by this
> call, i.e.
>
>                 return -1;
>         }
>         /* Need to use precise count */
> -       count = percpu_counter_sum(fbc);
> +       count = percpu_counter_aggregate(fbc);
>         if (count > rhs)
>                 return 1;
>         else if (count < rhs)
>
> Please think about what I'm saying rather than dismissing it without
> first understanding my suggestions.

I understood what you were saying. However, the per-cpu counters aren't
protected by the spinlock. Reading them is OK, but writing may cause a
race if a counter is modified by a CPU other than its owning CPU.
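For reference, percpu_counter_sum() itself only ever reads the per-cpu
counters. A simplified sketch of its logic (renamed _sketch here; the
real code is in lib/percpu_counter.c and may differ in detail):

        static s64 percpu_counter_sum_sketch(struct percpu_counter *fbc)
        {
                s64 ret;
                int cpu;
                unsigned long flags;

                raw_spin_lock_irqsave(&fbc->lock, flags);
                ret = fbc->count;
                for_each_online_cpu(cpu) {
                        /* one (likely cold) cacheline touched per CPU */
                        ret += *per_cpu_ptr(fbc->counters, cpu);
                }
                raw_spin_unlock_irqrestore(&fbc->lock, flags);
                return ret;
        }

Because it never stores to the per-cpu slots, a concurrent local update
can at worst make the sum slightly stale; the aggregate version also
writes the per-cpu slots back, which is where my race concern comes
from.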
The slow performance of percpu_counter_sum() is due to its need to
access n different (likely cold) cachelines, where n is the number of
CPUs in the system. So the larger the system, the more problematic it
will be.

My main concern about xfs_mod_fdblocks() is that it can potentially
call percpu_counter_sum() twice, which is what I want to prevent. It is
OK if you don't think that change is necessary. However, I will come
back if I find more evidence that this can be an issue.
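For context, the fast/slow split we keep referring to is the one in
__percpu_counter_compare(); roughly (again only a sketch of the logic,
not the exact lib/percpu_counter.c code):

        static int __percpu_counter_compare_sketch(struct percpu_counter *fbc,
                                                   s64 rhs, s32 batch)
        {
                s64 count = percpu_counter_read(fbc);   /* cheap, imprecise */

                /* the rough count is good enough if we are far from rhs */
                if (abs(count - rhs) > batch * num_online_cpus())
                        return count > rhs ? 1 : -1;

                /* need the precise count: full per-cpu walk */
                count = percpu_counter_sum(fbc);
                if (count > rhs)
                        return 1;
                if (count < rhs)
                        return -1;
                return 0;
        }

Near ENOSPC the first test keeps failing, so every comparison falls
through to the full percpu_counter_sum() walk; doing that more than
once per xfs_mod_fdblocks() call is the cost I want to avoid.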
Cheers,
Longman