Message-ID: <alpine.DEB.2.00.1201041046290.29775@router.home>
Date:	Wed, 4 Jan 2012 11:00:34 -0600 (CST)
From:	Christoph Lameter <cl@...ux.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Tejun Heo <tj@...nel.org>, Pekka Enberg <penberg@...nel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [GIT PULL] slab fixes for 3.2-rc4

On Wed, 4 Jan 2012, Linus Torvalds wrote:

> On Wed, Jan 4, 2012 at 7:30 AM, Christoph Lameter <cl@...ux.com> wrote:
> >
> > As mentioned before the main point of the use of these operations (in the
> > form of __this_cpu_op_return) when the cpu is pinned is to reduce the
> > number of instructions. __this_cpu_add_return allows replacing ~ 5
> > instructions with one.
>
> And that's fine if it's something really core, and something *so*
> important that you can tell the difference between one instruction and
> three.
>
> Which isn't the case here. In fact, on many (most?) x86
> microarchitectures xadd is actually slower than a regular
> add-from-memory-and-store - the big advantage of it is that with the
> "lock" prefix you do get special atomicity guarantees, and some
> algorithms (is semaphores) do want to know the value of the add
> atomically in order to know if there were other things going on.

xadd is 3 cycles. add is one cycle.

What we are also doing here is using a segment override to relocate the
per cpu address to the current cpu. So we are already getting two
additions for the price of one xadd. If we manually calculate the address
then we need another memory reference to fetch the per cpu offset for this
processor (otherwise we get it from the segment register), and then we
need to store the result, use registers, etc etc.

I cannot imagine that this would be the same speed.

> The thing is, I care about maintainability and not having
> cross-architecture problems etc. And right now many of the cpulocal
> things are *much* more of a maintainability headache than they are
> worth.

The cpu local things and xadd support have been around for a pretty long
time in various forms and they work reliably. I have tried to build on
this by adding the cmpxchg/cmpxchg_double functionality, which caused some
issues because of the fallback stuff. That seems to have been addressed
though, since we are now willing to make the preempt/irq tradeoff that we
could not get agreement on during the cleanup of the old APIs a year or
so ago.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
