linux-kernel - Re: [cpuops cmpxchg V2 3/5] irq_work: Use per cpu atomics instead of regular atomics

Message-ID: <1292433517.2708.41.camel@laptop>
Date:	Wed, 15 Dec 2010 18:18:37 +0100
From:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
To:	Christoph Lameter <cl@...ux.com>
Cc:	Tejun Heo <tj@...nel.org>, akpm@...ux-foundation.org,
	Pekka Enberg <penberg@...helsinki.fi>,
	linux-kernel@...r.kernel.org,
	Eric Dumazet <eric.dumazet@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [cpuops cmpxchg V2 3/5] irq_work: Use per cpu atomics instead
 of regular atomics

On Wed, 2010-12-15 at 11:04 -0600, Christoph Lameter wrote:

> Prefixes are faster than explicit address calculations. A prefix allows
> you to integrate the per cpu address calculation into an arithmetic
> operation.

Well, that depends on how often you need the address, I'd think. If you
have a per-cpu struct and need to frob lots of variables in that struct,
it might be cheaper to compute the struct address once and then use
relative addresses than to prefix everything with %fs.

> A prefix is one byte which is less than multiple arithmetic operations to
> calculate an address.

I thought you'd only need a single arithmetic op to calculate the
address. Anyway, at some point those 1-byte prefixes will add up to more
than the ops saved.

In the current code you add 2 bytes (although you save one by losing
the LOCK prefix, that could also have been achieved by using
cmpxchg_local()). These 2 bytes are probably less than the address
computation for head, and not needing the head pointer again saves on
register pressure, so it's probably a win here.

Still, none of this is really fast-path code, so I really wonder why
we're optimizing this over keeping the code obvious.

> I am not sure that the preempt_disable/enable is needed. They are just
> there because you had a get/put_cpu there.
> 
> If the code is run from hardirq context then preempt is already disabled.
> We can just drop those then.

AFAIK the current callers are all from IRQ/NMI context, but I don't want
to mandate that callers be in such contexts.

The problem is that we need to guarantee we raise the self-IPI on the
same cpu we queued the worklet on.