Message-ID: <1289830636.2607.70.camel@edumazet-laptop>
Date: Mon, 15 Nov 2010 15:17:16 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Christoph Lameter <cl@...ux.com>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Ingo Molnar <mingo@...e.hu>, Andi Kleen <andi@...stfloor.org>,
Nick Piggin <npiggin@...nel.dk>
Subject: Re: [PATCH] atomic: add atomic_inc_not_zero_hint()
On Monday, 15 November 2010 at 07:57 -0600, Christoph Lameter wrote:
> On Sat, 13 Nov 2010, Paul E. McKenney wrote:
>
> > On Fri, Nov 12, 2010 at 01:14:12PM -0600, Christoph Lameter wrote:
> > >
> > > prefetchw() would be too much overhead?
> >
> > No idea. Where do you believe that prefetchw() should be added?
>
> It is another way to get an exclusive cache line
> for situations like this. No need to give a hint.
>
Exclusive access? As soon as another CPU takes the line again, you lose it.
It's not really the same thing... Maybe you are missing the intent of the
'hint' altogether: we know the probable value of the counter, and we
don't want to read it.

In fact, prefetchw() is useful when you can issue it many cycles before
the memory read you are going to perform [before the write]. On
contended cache lines it's a waste, because by the time your CPU
reads memory and then performs the atomic compare_and_exchange(),
another CPU might have dirtied the location again. This is what we noticed
during Netfilter Workshop 2010: a high performance cost at both
atomic_read() and atomic_cmpxchg(). We tried prefetchw() and it caused a
performance drop. That was with only 16 CPUs contending on the neighbour
refcnt, at 5 million frames per second (5 million atomic increments,
5 million atomic decrements).

prefetchw() should be used in very specific spots, when a CPU is going
to write into a private area (not potentially accessed by other CPUs).
We use it, for example, in __alloc_skb(), a bit before memset().
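To illustrate the "private area" case, here is a rough userspace sketch, not the actual __alloc_skb() code. It assumes the GCC/Clang builtin __builtin_prefetch() with a write-intent argument as a stand-in for the kernel's prefetchw(); struct frame_hdr, alloc_frame() and free_frame() are hypothetical names invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header, standing in for a freshly allocated sk_buff. */
struct frame_hdr {
	unsigned char *data;
	size_t len;
	unsigned int flags;
};

static struct frame_hdr *alloc_frame(size_t payload_len)
{
	struct frame_hdr *h = malloc(sizeof(*h));
	unsigned char *data;

	if (!h)
		return NULL;

	/*
	 * Write-intent prefetch (second argument 1 means "write").
	 * The header is private to this CPU until it is published,
	 * so nobody else can steal the line before we use it.
	 */
	__builtin_prefetch(h, 1, 3);

	/* Independent work that gives the prefetch time to complete. */
	data = malloc(payload_len);
	if (!data) {
		free(h);
		return NULL;
	}

	/* The write the prefetch was issued for. */
	memset(h, 0, sizeof(*h));
	h->data = data;
	h->len = payload_len;
	return h;
}

static void free_frame(struct frame_hdr *h)
{
	if (h) {
		free(h->data);
		free(h);
	}
}
```

The point of the sketch is only the ordering: prefetch, then unrelated work, then the write. On a shared, contended counter there is no such quiet window, which is why the same trick does not help there.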

By the way, atomic_inc_not_zero_hint() is less code than
[prefetchw(), atomic_inc_not_zero()]. Using one instruction [cmpxchg]
with the memory pointer is better than three [prefetchw(), read(),
cmpxchg()], particularly if you have high contention on the cache line.
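For readers who want to see the difference spelled out, here is a minimal userspace sketch using C11 <stdatomic.h> instead of the kernel's atomic_t API. It is an approximation of the idea under that assumption, not the patch itself; the names inc_not_zero() and inc_not_zero_hint() are made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Classic form: read the counter first, then compare-and-exchange. */
static bool inc_not_zero(atomic_int *v)
{
	int c = atomic_load(v);

	while (c != 0) {
		/* On failure, c is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &c, c + 1))
			return true;
	}
	return false;
}

/*
 * Hinted form: skip the initial read and start the compare-and-exchange
 * from the value the caller expects. A wrong hint only costs one extra
 * iteration, because the failed exchange returns the real value.
 */
static bool inc_not_zero_hint(atomic_int *v, int hint)
{
	int c = hint;

	/* A hint of 0 is useless; fall back to the classic form. */
	if (hint == 0)
		return inc_not_zero(v);

	do {
		if (atomic_compare_exchange_strong(v, &c, c + 1))
			return true;
	} while (c != 0);

	return false;
}
```

With a correct hint, the fast path touches the cache line once, with a single locked instruction on x86, instead of a read followed by a cmpxchg.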