Message-ID: <4B7C7023.7060602@zytor.com>
Date: Wed, 17 Feb 2010 14:39:31 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Luca Barbieri <luca@...a-barbieri.com>
CC: mingo@...e.hu, a.p.zijlstra@...llo.nl, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 09/10] x86-32: use SSE for atomic64_read/set if available
On 02/17/2010 03:42 AM, Luca Barbieri wrote:
> This patch uses SSE movlps to perform 64-bit atomic reads and writes.
>
> According to Intel manuals, all aligned 64-bit reads and writes are
> atomic, which should include movlps.
>
> To do this, we need to disable preempt, clts if TS was set, and
> restore TS.
>
> If we don't need to change TS, using SSE is much faster.
>
> Otherwise, it should be essentially even, with the fastest method
> depending on the specific architecture.
>
> Another important point is that with SSE atomic64_read can keep the
> cacheline in shared state.
>
> If we could keep TS off and reenable it when returning to userspace,
> this would be even faster, but this is left for a later patch.
>
> We use SSE because we can just save the low part of %xmm0, whereas
> using the FPU or MMX requires at least saving the environment, and
> seems impossible to do fast.
>
> Signed-off-by: Luca Barbieri <luca@...a-barbieri.com>

I'm a bit unhappy about this patch. It seems to violate the assumption
that we only ever use the FPU state guarded by
kernel_fpu_begin()..kernel_fpu_end() and instead it uses a local hack,
which seems like a recipe for all kinds of very subtle problems down the
line.

Unless the performance advantage is provably very compelling, I'm
inclined to say that this is not worth it.

-hpa
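
For readers following the thread, here is a minimal userspace sketch of the
64-bit load/store that the quoted description relies on. It is not the code
from the patch. The names sketch_atomic64_t, sketch_read64 and sketch_set64
are hypothetical, and the intrinsics _mm_loadl_pi/_mm_storel_pi are used only
because they correspond to movlps; the compiler may well pick a different
instruction. The kernel-side steps under discussion (disabling preemption,
clts, restoring TS, or kernel_fpu_begin()..kernel_fpu_end()) have no userspace
equivalent and are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical stand-in for the kernel's atomic64_t. The 8-byte
 * alignment is what the atomicity claim in the quoted text depends on. */
typedef struct {
    _Alignas(8) uint64_t counter;
} sketch_atomic64_t;

static_assert(_Alignof(sketch_atomic64_t) >= 8,
              "64-bit accesses must be naturally aligned");

/* Read all 64 bits with a single SSE load into the low half of an
 * XMM register, then copy them out to an ordinary integer. */
static uint64_t sketch_read64(const sketch_atomic64_t *v)
{
    uint64_t out;
#if defined(__SSE__)
    __m128 x = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)&v->counter);
    _mm_storel_pi((__m64 *)&out, x);
#else
    /* Fallback for builds without SSE: plain copy, no atomicity claim. */
    memcpy(&out, &v->counter, sizeof out);
#endif
    return out;
}

/* Write all 64 bits with a single SSE store from the low half of an
 * XMM register. */
static void sketch_set64(sketch_atomic64_t *v, uint64_t val)
{
#if defined(__SSE__)
    __m128 x = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)&val);
    _mm_storel_pi((__m64 *)&v->counter, x);
#else
    /* Fallback for builds without SSE: plain copy, no atomicity claim. */
    memcpy(&v->counter, &val, sizeof val);
#endif
}
```

In userspace the XMM registers belong to the calling thread, so none of the
state-management concerns raised above apply; in the kernel they hold the
user task's FPU state, which is exactly why the guard around such an access
matters.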