Message-ID: <493E2884.6010600@cosmosbay.com>
Date: Tue, 09 Dec 2008 09:12:52 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: Theodore Tso <tytso@....edu>,
Andrew Morton <akpm@...ux-foundation.org>,
linux kernel <linux-kernel@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Mingming Cao <cmm@...ibm.com>, linux-ext4@...r.kernel.org
Subject: Re: [PATCH] percpu_counter: Fix __percpu_counter_sum()
Peter Zijlstra wrote:
> On Mon, 2008-12-08 at 18:00 -0500, Theodore Tso wrote:
>> On Mon, Dec 08, 2008 at 11:20:35PM +0100, Peter Zijlstra wrote:
>>> atomic_t is pretty good on all archs, but you get to keep the cacheline
>>> ping-pong.
>>>
>> Stupid question --- if you're worried about cacheline ping-pongs, why
>> aren't each cpu's delta counter cacheline aligned? With a 64-byte
>> cache-line, and a 32-bit counters entry, with less than 16 CPU's we're
>> going to be getting cache ping-pong effects with percpu_counter's,
>> right? Or am I missing something?
>
> sorta - a new per-cpu allocator is in the works, but we do cacheline
> align the per-cpu allocations (or used to), also, the allocations are
> node affine.
>
I did work on a 'lightweight percpu counter', aka percpu_lcounter, for
all metrics that don't need to be 64 bits wide and can use a plain 'long'
(network, nr_files, nr_dentry, nr_inodes, ...):
struct percpu_lcounter {
	atomic_long_t count;
#ifdef CONFIG_SMP
#ifdef CONFIG_HOTPLUG_CPU
	struct list_head list;	/* All percpu_counters are on a list */
#endif
	long *counters;
#endif
};
(No more spinlock)
Then I tried using atomic_t (or atomic_long_t) for 'counters', but got a
10% slowdown of __percpu_lcounter_add(), even when never hitting the 'slow path'.
atomic_long_add_return() is really expensive, even on a non-contended cache
line.
struct percpu_lcounter {
	atomic_long_t count;
#ifdef CONFIG_SMP
#ifdef CONFIG_HOTPLUG_CPU
	struct list_head list;	/* All percpu_counters are on a list */
#endif
	atomic_long_t *counters;
#endif
};
So I believe a percpu_lcounter_sum() that tries to reset all cpu-local
counts to 0 would be really too expensive, if it slows down _add() so much.
long percpu_lcounter_sum(struct percpu_lcounter *fblc)
{
	long acc = 0;
	int cpu;

	for_each_online_cpu(cpu)
		acc += atomic_long_xchg(per_cpu_ptr(fblc->counters, cpu), 0);
	return atomic_long_add_return(acc, &fblc->count);
}

void __percpu_lcounter_add(struct percpu_lcounter *flbc, long amount, s32 batch)
{
	long count;
	atomic_long_t *pcount;

	pcount = per_cpu_ptr(flbc->counters, get_cpu());
	count = atomic_long_add_return(amount, pcount); /* way too expensive !!! */
	if (unlikely(count >= batch || count <= -batch)) {
		atomic_long_add(count, &flbc->count);
		atomic_long_sub(count, pcount);
	}
	put_cpu();
}
Just forget about it: let percpu_lcounter_sum() only read the values, and
let percpu_lcounter_add() avoid atomic ops in the fast path.
void __percpu_lcounter_add(struct percpu_lcounter *flbc, long amount, s32 batch)
{
	long count;
	long *pcount;

	pcount = per_cpu_ptr(flbc->counters, get_cpu());
	count = *pcount + amount;
	if (unlikely(count >= batch || count <= -batch)) {
		atomic_long_add(count, &flbc->count);
		count = 0;
	}
	*pcount = count;
	put_cpu();
}
EXPORT_SYMBOL(__percpu_lcounter_add);
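
For anyone who wants to play with the batching logic outside the kernel, here is a minimal userspace sketch of the same idea using C11 atomics. It is an illustration only, not the patch: struct lcounter, lcounter_init(), lcounter_add(), lcounter_sum() and NR_SLOTS are hypothetical names, a slot index stands in for the CPU number, and the code is single-threaded, so it models neither get_cpu()/put_cpu() nor concurrent readers.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct lcounter {
	atomic_long count;	/* shared, approximate global value */
	long local[NR_SLOTS];	/* one plain (non-atomic) delta per slot */
};

static void lcounter_init(struct lcounter *c)
{
	int i;

	atomic_init(&c->count, 0);
	for (i = 0; i < NR_SLOTS; i++)
		c->local[i] = 0;
}

/*
 * Fast path: plain arithmetic on the local slot. The shared atomic is
 * touched only when the local delta reaches +batch or -batch.
 */
static void lcounter_add(struct lcounter *c, int slot, long amount, long batch)
{
	long v;

	assert(slot >= 0 && slot < NR_SLOTS);
	v = c->local[slot] + amount;
	if (v >= batch || v <= -batch) {
		atomic_fetch_add(&c->count, v);
		v = 0;
	}
	c->local[slot] = v;
}

/* Read-only sum: reads every slot, never resets one. */
static long lcounter_sum(struct lcounter *c)
{
	long acc = atomic_load(&c->count);
	int i;

	for (i = 0; i < NR_SLOTS; i++)
		acc += c->local[i];
	return acc;
}
```

With batch = 8, adding 3 and then 5 on the same slot folds 8 into the shared count and leaves the slot at 0, while smaller deltas stay local and show up only in lcounter_sum().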
Also, with the upcoming NR_CPUS=4096, it may be time to design a hierarchical percpu_counter,
to avoid hitting one shared "fbc->count" every time a local counter overflows.
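
To make the hierarchical idea concrete, here is one possible shape, again as a single-threaded userspace sketch with C11 atomics. Nothing here is an existing kernel API or a worked-out design: struct hcounter, hcounter_init(), hcounter_add(), hcounter_sum(), NR_SLOTS, GROUP_SIZE and the group_batch parameter are all assumptions made for illustration. Local deltas fold into a per-group counter (think one per node), and a group folds into the top-level count only when its own delta reaches group_batch.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_SLOTS	8	/* hypothetical number of simulated CPUs */
#define GROUP_SIZE	4	/* hypothetical CPUs per group, e.g. per node */
#define NR_GROUPS	(NR_SLOTS / GROUP_SIZE)

struct hcounter {
	atomic_long count;		/* top level, shared by all groups */
	atomic_long group[NR_GROUPS];	/* middle level, shared within a group */
	long local[NR_SLOTS];		/* leaf level, one plain delta per slot */
};

static void hcounter_init(struct hcounter *c)
{
	int i;

	atomic_init(&c->count, 0);
	for (i = 0; i < NR_GROUPS; i++)
		atomic_init(&c->group[i], 0);
	for (i = 0; i < NR_SLOTS; i++)
		c->local[i] = 0;
}

static void hcounter_add(struct hcounter *c, int slot, long amount,
			 long batch, long group_batch)
{
	atomic_long *grp;
	long v, g;

	assert(slot >= 0 && slot < NR_SLOTS);
	v = c->local[slot] + amount;
	if (v >= batch || v <= -batch) {
		grp = &c->group[slot / GROUP_SIZE];
		/* atomic_fetch_add() returns the old value, so add v back */
		g = atomic_fetch_add(grp, v) + v;
		if (g >= group_batch || g <= -group_batch) {
			/* move the observed group delta up one level */
			atomic_fetch_sub(grp, g);
			atomic_fetch_add(&c->count, g);
		}
		v = 0;
	}
	c->local[slot] = v;
}

/* Read-only sum over all three levels. */
static long hcounter_sum(struct hcounter *c)
{
	long acc = atomic_load(&c->count);
	int i;

	for (i = 0; i < NR_GROUPS; i++)
		acc += atomic_load(&c->group[i]);
	for (i = 0; i < NR_SLOTS; i++)
		acc += c->local[i];
	return acc;
}
```

The trade-off in this sketch is that an exact sum has to walk one more level, while the top-level cache line is written once per group_batch worth of change per group rather than once per batch per CPU.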