lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 22 Dec 2011 08:05:01 -0800
From:	Tejun Heo <tj@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Christoph Lameter <cl@...ux.com>,
	Pekka Enberg <penberg@...nel.org>, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [GIT PULL] slab fixes for 3.2-rc4

Hello, Linus.

On Wed, Dec 21, 2011 at 06:19:34PM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2011 at 9:05 AM, Tejun Heo <tj@...nel.org> wrote:
> >
> > machines.  (cc'ing arch) Does anyone have better insight here?  How
> > much more expensive are local irq save/restore compared to inc/dec'ing
> > preempt count on various archs?
> 
> I think powerpc does sw irq disable, so it's pretty much the same.
> 
> However, on *most* architectures, if the stupid generic "disable
> interrupts and then do the op" is too expensive, they could easily
> just do an architecture-specific "safe" version. Especially if they
> only need to do cmpxchg and add.
> 
> Many architectures can do those safely with load-locked,
> store-conditional (ie atomic_add), and do so quickly. Sure, there are
> broken cases wher ethat is really slow (original alpha is an example
> of that), but I think it's fairly rare.
>
> So both arm (in v6+) and powerpc should be able to just use the
> "atomic_add" version, with no real downside.

Hmmm... right, if actual atomic ops are used we don't need to
guarantee that percpu address translation and operation happen on the
same CPU.  The end result would be correct regardless.  Atomic ops
wouldn't be as efficient as local single op as in x86 but it shouldn't
be too expensive when vast majority of operations are happening on
cachelines which are already exclusively owned.

> So I really suspect that we could just say: "make the irq-safe version
> be the *only* version", and no architecture will really care. Sure, it
> can be more expensive, but it usually isn't. Only when done badly and
> stupidly is it nasty.

Yeah, I don't think there's any cross-architecture way of making
this_cpu ops universally faster than explicit irq/preempt_disable -
get_addr - do one or more ops - irq/preempt_enable.  Many archs can't
even load address offset with single instruction after all.  Even on
x86, with enough number of ops, the segment prefix and full address on
each op would become more expensive than explicit sequence.  Given
that, I think just providing irqsafe version and let all other cases
use the explicit ones is a good tradeoff.

Alright, I'll do that for the 3.4 merge window.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ