Message-ID: <20090323165009.GC22501@Krystal>
Date:	Mon, 23 Mar 2009 12:50:09 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	"Alan D. Brunelle" <Alan.Brunelle@...com>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Josh Boyer <jwboyer@...ux.vnet.ibm.com>,
	linux-kernel@...r.kernel.org, ltt-dev@...ts.casi.polymtl.ca
Subject: Re: cli/sti vs local_cmpxchg and local_add_return

* Alan D. Brunelle (Alan.Brunelle@...com) wrote:
> Here are the results for:
> 
> processor  : 31
> vendor     : GenuineIntel
> arch       : IA-64
> family     : 32
> model      : 0
> model name : Dual-Core Intel(R) Itanium(R) 2 Processor 9050
> revision   : 7
> archrev    : 0
> features   : branchlong, 16-byte atomic ops
> cpu number : 0
> cpu regs   : 4
> cpu MHz    : 1598.002
> itc MHz    : 400.000000
> BogoMIPS   : 3186.68
> siblings   : 2
> physical id: 196865
> core id    : 1
> thread id  : 0
> 
> test init
> test results: time for baseline
> number of loops: 20000
> total time: 5002
> -> baseline takes 0 cycles
> test end
> test results: time for locked cmpxchg
> number of loops: 20000
> total time: 60083
> -> locked cmpxchg takes 3 cycles
> test end
> test results: time for non locked cmpxchg
> number of loops: 20000
> total time: 60002
> -> non locked cmpxchg takes 3 cycles
> test end
> test results: time for locked add return
> number of loops: 20000
> total time: 155007
> -> locked add return takes 7 cycles
> test end
> test results: time for non locked add return
> number of loops: 20000
> total time: 155004
> -> non locked add return takes 7 cycles
> test end
> test results: time for enabling interrupts (STI)
> number of loops: 20000
> total time: 45003
> -> enabling interrupts (STI) takes 2 cycles
> test end
> test results: time for disabling interrupts (CLI)
> number of loops: 20000
> total time: 59998
> -> disabling interrupts (CLI) takes 2 cycles
> test end
> test results: time for disabling/enabling interrupts (STI/CLI)
> number of loops: 20000
> total time: 107274
> -> enabling/disabling interrupts (STI/CLI) takes 5 cycles
> test end

Hi Alan,

Wow, disabling interrupts is incredibly cheap on ia64, and
local_add_return is especially costly. I think that is because it is
implemented on top of an underlying cmpxchg rather than being supported
directly by the architecture (except for fetchadd, which is limited to
a few specific increment values).

Given that some ia64 code refers to NMIs, I guess this architecture
supports them. So in the end, the choice between disabling interrupts
(faster here) and the local atomic operations (which stay atomic with
respect to NMIs) comes down to a robustness vs speed tradeoff. But given
the time it takes to write data to memory, I think 5 cycles vs 10 cycles
won't make a big difference overall.

Thanks for those results!

Mathieu


-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68