[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <170fa0d20803111239u4d25b7c8p915e63a43cb389c6@mail.gmail.com>
Date: Tue, 11 Mar 2008 15:39:00 -0400
From: "Mike Snitzer" <snitzer@...il.com>
To: "Eric Dumazet" <dada1@...mosbay.com>
Cc: "David S. Miller" <davem@...emloft.net>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"linux kernel" <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, "Christoph Lameter" <clameter@....com>,
"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
"Peter Zijlstra" <a.p.zijlstra@...llo.nl>, stable@...nel.org
Subject: Re: [PATCH] alloc_percpu() fails to allocate percpu data
On 3/11/08, Eric Dumazet <dada1@...mosbay.com> wrote:
> Mike Snitzer a écrit :
>
> > On 2/21/08, Eric Dumazet <dada1@...mosbay.com> wrote:
> >
> >> Some oprofile results obtained while using tbench on a 2x2 cpu machine
> >> were very surprising.
> >>
> >> For example, loopback_xmit() function was using high number of cpu
> >> cycles to perform
> >> the statistic updates, supposed to be real cheap since they use percpu data
> >>
> >> pcpu_lstats = netdev_priv(dev);
> >> lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
> >> lb_stats->packets++; /* HERE : serious contention */
> >> lb_stats->bytes += skb->len;
> >>
> >>
> >> struct pcpu_lstats is a small structure containing two longs. It appears
> >> that on my 32bits platform,
> >> alloc_percpu(8) allocates a single cache line, instead of giving to
> >> each cpu a separate
> >> cache line.
> >>
> >> Using the following patch gave me impressive boost in various benchmarks
> >> ( 6 % in tbench)
> >> (all percpu_counters hit this bug too)
> >>
> >> Long term fix (ie >= 2.6.26) would be to let each CPU allocate their own
> >> block of memory, so that we
> >> dont need to roudup sizes to L1_CACHE_BYTES, or merging the SGI stuffof
> >> course...
> >>
> >> Note : SLUB vs SLAB is important here to *show* the improvement, since
> >> they dont have the same minimum
> >> allocation sizes (8 bytes vs 32 bytes).
> >> This could very well explain regressions some guys reported when they
> >> switched to SLUB.
> >>
> >
> >
> > I see that this fix was committed to mainline as commit
> > be852795e1c8d3829ddf3cb1ce806113611fa555
> >
> > The commit didn't "Cc: <stable@...nel.org>", and it doesn't appear to
> > be queued for 2.6.24.x. Should it be?
> >
> >
>
> Yes, it should be queued fo 2.6.24.x
That means both of the following commits need to be cherry-picked into 2.6.24.x:
b3242151906372f30f57feaa43b4cac96a23edb1
be852795e1c8d3829ddf3cb1ce806113611fa555
> > If I understand you correctly, SLAB doesn't create this particular
> > cache thrashing on 32bit systems? Is SLAB ok on other architectures
> > too? Can you (or others) comment on the importance of this fix
> > relative to x86_64 (64byte cacheline) and SLAB?
> >
> >
>
>
> Fix is important both for 32 and 64 bits kernels, SLAB or SLUB.
>
> SLAB does have this problem, but less prevalent than SLUB, because these
> allocators dont have the same minimal size allocation (32 vs 8)
>
> So with SLUB, it is possible that 8 CPUS share the same 64 bytes
> cacheline to store their percpu counters, while only 2 cpus can share
> this same cache line with SLAB allocator.
Thanks for the clarification.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists