linux-kernel - Re: [cpuops cmpxchg double V1 4/4] Lockless (and preemptless) fastpaths for slub

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4D08F21B.3000706@kernel.org>
Date:	Wed, 15 Dec 2010 17:51:39 +0100
From:	Tejun Heo <tj@...nel.org>
To:	Christoph Lameter <cl@...ux.com>
CC:	akpm@...ux-foundation.org, Pekka Enberg <penberg@...helsinki.fi>,
	linux-kernel@...r.kernel.org,
	Eric Dumazet <eric.dumazet@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [cpuops cmpxchg double V1 4/4] Lockless (and preemptless) fastpaths
 for slub

On 12/14/2010 06:48 PM, Christoph Lameter wrote:
> Use the this_cpu_cmpxchg_double functionality to implement a lockless
> allocation algorithm on arches that support fast this_cpu_ops.
> 
> Each of the per cpu pointers is paired with a transaction id that ensures
> that updates of the per cpu information can only occur in sequence on
> a certain cpu.
> 
> A transaction id is a "long" integer that is comprised of an event number
> and the cpu number. The event number is incremented for every change to the
> per cpu state. This means that the cmpxchg instruction can verify for an
> update that nothing interfered and that we are updating the percpu structure
> for the processor where we picked up the information and that we are also
> currently on that processor when we update the information.
> 
> This results in a significant decrease of the overhead in the fastpaths. It
> also makes it easy to adopt the fast path for realtime kernels since this
> is lockless and does not require the use of the current per cpu area
> over the critical section. It is only important that the per cpu area is
> current at the beginning of the critical section and at the end.
> 
> So there is no need even to disable preemption.
> 
> Test results show that the fastpath cycle count is reduced by up to ~ 40%
> (alloc/free test goes from ~140 cycles down to ~80). The slowpath for kfree
> adds a few cycles.
> 
> Sadly this does nothing for the slowpath which is where the main issues with
> performance in slub are but the best case performance rises significantly.
> (For that see the more complex slub patches that require cmpxchg_double)

The first two look good to me but I frankly don't have much idea about
the latter two.  Pekka, can you please ack those?  Alternatively, you
can later pull the percpu tree in and apply the allocator bits in your
tree, which I actually prefer.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/