Message-Id: <20090316.212717.233062381.davem@davemloft.net>
Date:	Mon, 16 Mar 2009 21:27:17 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	mathieu.desnoyers@...ymtl.ca
Cc:	paulmck@...ux.vnet.ibm.com, mingo@...e.hu,
	jwboyer@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
	ltt-dev@...ts.casi.polymtl.ca
Subject: Re: cli/sti vs local_cmpxchg and local_add_return

From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Date: Tue, 17 Mar 2009 00:10:16 -0400

> Thanks for running those tests. Actually, I did not expect good results
> for sparc64, because the local_t primitives map to atomic_t. Looking at
> sparc atomic_64.h, I notice that all atomic operations except cmpxchg
> are done through function calls, even when those functions contain only
> a few instructions. Is there any particular reason for that? These
> function calls can be quite costly; we could easily inline them.

With all the memory barriers, CPU bug workarounds, et al., it's way too
much to expand inline.
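
To make that concrete, here is a rough C sketch (hypothetical, for
illustration only; the real sparc64 implementation is hand-written
assembly in arch/sparc/lib/atomic_64.S) of what every call site would
have to expand to if we inlined a barrier-correct atomic_add_return:

	#include <asm/atomic.h>
	#include <asm/system.h>		/* smp_mb(), cmpxchg() */

	/* Hypothetical inline version: a CAS retry loop bracketed by the
	 * full memory barriers the sparc64 atomics guarantee.  The real
	 * assembly additionally carries the BACKOFF_SETUP/BACKOFF_SPIN
	 * contention-backoff machinery on top of this. */
	static inline int sketch_atomic_add_return(int i, atomic_t *v)
	{
		int old, new;

		smp_mb();	/* full barrier before the atomic op */
		do {
			old = v->counter;
			new = old + i;
		} while (cmpxchg(&v->counter, old, new) != old);
		smp_mb();	/* full barrier after the atomic op */
		return new;
	}

Duplicating that (plus the backoff path) at every atomic_* call site
costs far more than the call/ret it saves.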

> And to "unleash" the full power of local_t, we should see whether there
> are variants of the atomic operations that are safe only on UP, and
> whether some of the memory barriers currently embedded in the atomic_t
> ops could be removed in a local_t version. Actually, all the
> BACKOFF_SETUP/BACKOFF_SPIN code is SMP-specific, so the local_t version
> probably does not need it, since local_t specifically touches per-CPU
> data. That could give very interesting results.
> 
> The reason the results show 0 cycles per loop is simply that each loop
> takes less than one bus clock cycle, so the integer division truncates
> to zero. But the total time (in bus cycles) for the whole 20000 loops
> gives us equivalent information.

I don't think it's worth it.  Rusty ran similar tests not too long
ago.

IRQ disabling/enabling on sparc64 costs 9 cycles each, whereas the
atomic operation is at least 35 cycles.
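
For comparison, the cli/sti-style local_add_return being costed here
looks roughly like the sketch below (hypothetical types and names;
sparc64 actually just picks up the generic local.h, which maps local_t
onto atomic_long_t):

	#include <linux/irqflags.h>	/* local_irq_save()/restore() */

	/* sketch_local_t is a made-up name; the generic local.h wraps an
	 * atomic_long_t rather than a bare long. */
	typedef struct { long counter; } sketch_local_t;

	static inline long sketch_local_add_return(long i, sketch_local_t *l)
	{
		unsigned long flags;
		long ret;

		local_irq_save(flags);		/* ~9 cycles on sparc64 */
		l->counter += i;		/* plain load/add/store is
						 * safe against this CPU's
						 * own interrupts */
		ret = l->counter;
		local_irq_restore(flags);	/* ~9 cycles on sparc64 */
		return ret;
	}

That is roughly 18 cycles of synchronization overhead per operation
versus 35+ for the CAS-based version, which is why the atomic-backed
local_t doesn't look worth optimizing here.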
