Message-ID: <20090317044454.GA28245@Krystal>
Date: Tue, 17 Mar 2009 00:44:54 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: David Miller <davem@...emloft.net>
Cc: paulmck@...ux.vnet.ibm.com, mingo@...e.hu,
jwboyer@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
ltt-dev@...ts.casi.polymtl.ca
Subject: Re: cli/sti vs local_cmpxchg and local_add_return
* David Miller (davem@...emloft.net) wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
> Date: Tue, 17 Mar 2009 00:10:16 -0400
>
> > Thanks for running those tests. Actually, I did not expect good results
> > for sparc64 because the local_t primitives map to atomic_t. Looking at
> > sparc atomic_64.h, I notice that all atomic operations except cmpxchg
> > are done through function calls even when those functions only contain
> > a few instructions. Is there any particular reason for that? These
> > function calls can be quite costly. We could easily inline those.
>
> With all the memory barriers, cpu bug workarounds, et al.
> it's way too much to expand inline.
>
> > And to "unleash" the full power of local_t, we should see if there are
> > variants of the atomic operations which are safe only on UP and if there
> > are some memory barriers currently embedded in the atomic_t ops we could
> > remove in a local_t version. Actually, all the
> > BACKOFF_SETUP/BACKOFF_SPIN is specific to SMP, and therefore the local_t
> > version probably does not need that because it touches specifically
> > per-cpu data. That could give very interesting results.
> >
> > The reason why the results show 0 cycles per loop is just because there
> > is less than a bus clock cycle per loop. But the total time (in bus
> > cycles) for the whole 20000 cycles gives us equivalent information.
>
> I don't think it's worth it. Rusty made similar tests not too long
> ago.
>
> IRQ disabling/enabling on sparc64 is 9 cycles (each) and the atomic
> operation on the other hand is at least 35 cycles.
OK, so sparc64 should probably implement local_t by disabling interrupts
on the local CPU and using two atomic, naturally aligned 64-bit memory
accesses (one read, one write), so that a remote CPU which simply reads
the variable never sees a corrupted (torn) value.
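As a rough illustration of that scheme (a user-space sketch, not actual
kernel code), something along these lines is what I have in mind. The
names sketch_local_t, sim_irq_save() and sim_irq_restore() are
hypothetical; the last two merely stand in for the kernel's
local_irq_save()/local_irq_restore() by toggling a flag so that the
example can be compiled and run outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore().
 * They only track a flag; real code would mask interrupts on this CPU. */
static int sim_irqs_off;

static inline unsigned long sim_irq_save(void)
{
	unsigned long flags = (unsigned long)sim_irqs_off;

	sim_irqs_off = 1;
	return flags;
}

static inline void sim_irq_restore(unsigned long flags)
{
	sim_irqs_off = (int)flags;
}

/* Hypothetical local_t layout: a single naturally aligned 64-bit word,
 * so each plain load or store is one access on a 64-bit CPU. */
typedef struct {
	volatile int64_t v __attribute__((aligned(8)));
} sketch_local_t;

/* Add i to the counter and return the new value. */
static inline int64_t sketch_local_add_return(int64_t i, sketch_local_t *l)
{
	unsigned long flags = sim_irq_save();
	int64_t val;

	assert(sim_irqs_off);	/* no local interrupt may run in here */
	val = l->v;		/* one aligned 64-bit read */
	val += i;
	l->v = val;		/* one aligned 64-bit write */
	sim_irq_restore(flags);
	return val;
}

/* Store new_val only if the counter equals old; return the prior value. */
static inline int64_t sketch_local_cmpxchg(sketch_local_t *l, int64_t old,
					   int64_t new_val)
{
	unsigned long flags = sim_irq_save();
	int64_t prev;

	prev = l->v;		/* one aligned 64-bit read */
	if (prev == old)
		l->v = new_val;	/* one aligned 64-bit write */
	sim_irq_restore(flags);
	return prev;
}
```

The point is only that the read-modify-write is protected against local
interrupts, while remote readers rely on each 64-bit access being
performed as a single aligned load or store.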
Note that any code doing "remote reads" of local_t variables, or writes
that are expected to be read from a remote CPU, must provide its own
memory barriers.
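To make the barrier requirement concrete, here is a small user-space
sketch using C11 atomics rather than the kernel primitives. All names
(sketch_counter, sketch_ready, sketch_publish(), sketch_remote_read())
are hypothetical, and the fences are only a rough analogue of the
smp_wmb()/smp_rmb() pairing that kernel code would supply itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static_assert(sizeof(int64_t) == 8, "the sketch assumes a 64-bit counter");

/* Stands in for a per-cpu local_t value that a remote CPU may read. */
static _Atomic int64_t sketch_counter;
/* Flag telling the remote reader that the counter has been published. */
static atomic_int sketch_ready;

/* Writer side, running on the CPU that owns the counter. */
static void sketch_publish(int64_t v)
{
	atomic_store_explicit(&sketch_counter, v, memory_order_relaxed);
	/* Order the counter store before the flag store (write barrier). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sketch_ready, 1, memory_order_relaxed);
}

/* Reader side, running on a remote CPU. Returns 1 if *out was filled. */
static int sketch_remote_read(int64_t *out)
{
	if (!atomic_load_explicit(&sketch_ready, memory_order_relaxed))
		return 0;
	/* Order the flag load before the counter load (read barrier). */
	atomic_thread_fence(memory_order_acquire);
	*out = atomic_load_explicit(&sketch_counter, memory_order_relaxed);
	return 1;
}
```

The local_t operation itself contributes no ordering here; the ordering
comes entirely from the barriers the caller adds around it.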
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68